GitHub - lsdefine/GenericAgent: Self-evolving agent: grows skill tree from 3.3K-line seed, achieving full system control with 6x less token consumption

English | 中文 | 📄 Technical Report | 📘 Tutorial | Sophub

📌 Official channel: This GitHub repository is the sole official source for GenericAgent. We have no affiliation with any third-party website using the GenericAgent name.

🌟 Overview

GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is just ~3K lines of code. Through 9 atomic tools + a ~100-line Agent Loop, it grants any LLM system-level control over a local computer — covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).

Its design philosophy: don't preload skills — evolve them.

Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a skill for direct reuse later. The longer you use it, the more skills accumulate, forming a skill tree that belongs entirely to you, grown from 3K lines of seed code.

🤖 Self-Bootstrap Proof — Everything in this repository, from installing Git and running git init to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.

📋 Core Features

Self-Evolving: Automatically crystallizes each task into a skill. Capabilities grow with every use, forming your personal skill tree.
Minimal Architecture: ~3K lines of core code. Agent Loop is ~100 lines. No complex dependencies, zero deployment overhead.
Strong Execution: Injects into a real browser (preserving login sessions). 9 atomic tools take direct control of the system.
High Compatibility: Supports Claude / Gemini / Kimi / MiniMax and other major models. Cross-platform.
Token Efficient: <30K context window — a fraction of the 200K–1M other agents consume. Layered memory ensures the right knowledge is always in scope. Less noise, fewer hallucinations, higher success rate — at a fraction of the cost.

🧬 Self-Evolution Mechanism

This is what fundamentally distinguishes GenericAgent from every other agent framework.

[New Task] --> [Autonomous Exploration] (install deps, write scripts, debug & verify) -->
[Crystallize Execution Path into skill] --> [Write to Memory Layer] --> [Direct Recall on Next Similar Task]

| What you say | What the agent does the first time | Every time after |
| --- | --- | --- |
| "Read my WeChat messages" | Install deps → reverse DB → write read script → save skill | one-line invoke |
| "Monitor stocks and alert me" | Install mootdx → build selection flow → configure cron → save skill | one-line start |
| "Send this file via Gmail" | Configure OAuth → write send script → save skill | ready to use |

After a few weeks, your agent instance will have a skill tree no one else in the world has — all grown from 3K lines of seed code.
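The explore-once, recall-afterwards flow above can be sketched roughly as follows. This is a minimal illustration, not GenericAgent's actual implementation: `SkillStore`, `run`, and `explore_gmail` are hypothetical names, and a real skill would be a saved script or SOP rather than an in-memory function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SkillStore:
    """Hypothetical in-memory store mapping a normalized task to a saved skill."""
    skills: Dict[str, Callable[[], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def run(self, task: str, explore: Callable[[], Callable[[], str]]) -> str:
        key = task.strip().lower()
        if key not in self.skills:
            # First encounter: explore (the expensive path), then keep the result.
            self.log.append(f"explore:{key}")
            self.skills[key] = explore()
        else:
            # Later encounters: reuse the crystallized skill directly.
            self.log.append(f"recall:{key}")
        return self.skills[key]()


def explore_gmail() -> Callable[[], str]:
    # Stand-in for "configure OAuth, write send script"; returns the reusable skill.
    return lambda: "file sent"


store = SkillStore()
first = store.run("Send this file via Gmail", explore_gmail)
second = store.run("send this file via gmail", explore_gmail)
print(store.log)
```

The second call skips exploration entirely, which is the source of the token savings the table describes.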

🎯 Demo Showcase

| Demo | What it does |
| --- | --- |
| 🧋 Food Delivery Order | "Order me a milk tea": navigates the delivery app, selects items, and completes checkout automatically. |
| 📈 Quantitative Stock Screening | "Find GEM stocks with EXPMA golden cross, turnover > 5%": screens stocks with quantitative conditions. |
| 🌐 Autonomous Web Exploration | Autonomously browses and periodically summarizes web content. |
| 💰 Expense Tracking | "Find expenses over ¥2K in the last 3 months": drives Alipay via ADB. |

📅 Latest News

2026-04-21: 📄 Technical Report released on arXiv — GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization
2026-04-11: Introduced L4 session archive memory and scheduler cron integration
2026-03-23: Support personal WeChat as a bot frontend
2026-03-10: Released million-scale Skill Library
2026-03-08: Released "Dintal Claw" — a GenericAgent-powered government affairs bot
2026-03-01: GenericAgent featured by Jiqizhixin (机器之心)
2026-01-16: GenericAgent V1.0 public release

🚀 Quick Start

Method 1: One-line install (recommended)

This installs GenericAgent with an isolated Python environment and Git, then downloads a ready-to-run package.

Windows PowerShell

powershell -ExecutionPolicy Bypass -c "$env:GLOBAL=1; irm http://fudankw.cn:9000/files/ga_install.ps1 | iex"

Linux / macOS

GLOBAL=1 bash -c "$(curl -fsSL http://fudankw.cn:9000/files/ga_install.sh)"

After installation, launch the desktop app from:

frontends/GenericAgent.exe

Method 2: Python install (for developers)

git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
uv venv
uv pip install -e ".[ui]"        # Core + UI dependencies
cp mykey_template.py mykey.py     # Fill in your LLM API key
python launch.pyw

GenericAgent is meant to grow its environment through the Agent itself, not by pre-installing every possible package.

Full guide: GETTING_STARTED.md

🖥️ Frontends

Desktop App

For one-line installs on Windows, double-click:

frontends/GenericAgent.exe

Terminal UI

A lightweight, keyboard-driven interface built on Textual. Supports multiple concurrent sessions and real-time streaming.

python frontends/tuiapp_v2.py

Streamlit UI

python launch.pyw

💬 Bot Interface (IM)

GenericAgent also supports IM frontends such as Telegram, WeChat, QQ, Feishu / Lark, WeCom, and DingTalk.

Typical usage:

python frontends/tgapp.py        # Telegram
python frontends/wechatapp.py    # WeChat
python frontends/qqapp.py        # QQ
python frontends/fsapp.py        # Feishu / Lark
python frontends/wecomapp.py     # WeCom
python frontends/dingtalkapp.py  # DingTalk

For detailed setup, ask GenericAgent itself.

Common chat commands:

/new - start a fresh conversation and clear the current context
/continue - list recoverable conversation snapshots
/continue N - restore the Nth recoverable conversation
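As an illustration of how a frontend might route these commands, here is a small parser sketch. The function name `parse_command` and the returned action labels are assumptions made for this example and are not taken from the GenericAgent codebase.

```python
from typing import Optional, Tuple


def parse_command(text: str) -> Tuple[str, Optional[int]]:
    """Map a chat line to (action, snapshot index); plain text falls through as a message."""
    parts = text.strip().split()
    if not parts:
        return ("message", None)
    if parts[0] == "/new" and len(parts) == 1:
        return ("new", None)
    if parts[0] == "/continue":
        if len(parts) == 1:
            return ("list_snapshots", None)
        if len(parts) == 2 and parts[1].isdecimal():
            return ("restore", int(parts[1]))
    # Anything else, including malformed commands, is treated as ordinary input.
    return ("message", None)


print(parse_command("/continue 2"))
```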

📊 Comparison with Similar Tools

| Feature | GenericAgent | OpenClaw | Claude Code |
| --- | --- | --- | --- |
| Codebase | ~3K lines | ~530,000 lines | Open-sourced (large) |
| Deployment | `pip install` + API Key | Multi-service orchestration | CLI + subscription |
| Browser Control | Real browser (session preserved) | Sandbox / headless browser | Via MCP plugin |
| OS Control | Mouse/kbd, vision, ADB | Multi-agent delegation | File + terminal |
| Self-Evolution | Autonomous skill growth | Plugin ecosystem | Stateless between sessions |
| Out of the Box | A few core files + starter skills | Hundreds of modules | Rich CLI toolset |

📈 Evaluation — Five Dimensions

📂 Full evaluation datasets and results: https://github.com/JinyiHan99/GA-Technical-Report/tree/main

| Dimension | Question | Benchmarks used |
| --- | --- | --- |
| 1. Task Completion & Token Efficiency | Can GA complete hard tasks more cheaply than leading agents? | SOP-Bench, Lifelong AgentBench, RealFin-Benchmark |
| 2. Tool-Use Efficiency | Can a minimal atomic toolset solve what specialized toolsets solve, with less overhead? | Tool Efficiency Benchmark (11 simple + 5 long-horizon tasks) |
| 3. Memory System Effectiveness | Does condensed hierarchical memory beat full/redundant memory and embedding-based retrievers? | SOP-Bench (dangerous goods), LoCoMo, 20-skill stress test |
| 4. Self-Evolution Capability | Can the agent distill experience into reusable SOPs and code, without intervention? | 9-round LangChain longitudinal study, 8-task cross-task web benchmark |
| 5. Web Browsing Capability | Does density-driven design survive the open web? | WebCanvas, BrowseComp-ZH, Custom Tasks (22) |

Baselines across these dimensions include Claude Code, OpenAI CodeX, and OpenClaw, evaluated under Claude Sonnet 4.6, Claude Opus 4.6, GPT-5.4, and MiniMax M2.7 backbones.

Figure: Tool-use efficiency radar. GA dominates token, request, and tool-call axes while preserving quality across four task dimensions.

Figure: Cross-task self-evolution. Second- and third-run GA executions converge to a stable low-cost regime across eight web tasks, while OpenClaw shows no such convergence.

🧠 How It Works

GenericAgent accomplishes complex tasks through Layered Memory × Minimal Toolset × Autonomous Execution Loop, continuously accumulating experience during execution.

1️⃣ Layered Memory System

Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time.

L0 — Meta Rules: Core behavioral rules and system constraints of the agent
L1 — Insight Index: Minimal memory index for fast routing and recall
L2 — Global Facts: Stable knowledge accumulated over long-term operation
L3 — Task Skills / SOPs: Reusable workflows for completing specific task types
L4 — Session Archive: Archived task records distilled from finished sessions for long-horizon recall
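One way to picture the five layers is as a single structure from which a small prompt context is assembled per task. The sketch below is an assumption-laden illustration: the class `LayeredMemory`, the keyword-based routing in `build_context`, and the sample data are all hypothetical, and the real system's storage format and recall logic may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LayeredMemory:
    meta_rules: List[str] = field(default_factory=list)           # L0
    insight_index: Dict[str, str] = field(default_factory=dict)   # L1: keyword -> skill name
    global_facts: List[str] = field(default_factory=list)         # L2
    skills: Dict[str, str] = field(default_factory=dict)          # L3: skill name -> SOP text
    archive: List[str] = field(default_factory=list)              # L4 (not loaded by default)

    def build_context(self, task: str) -> str:
        """Include L0 and L2, plus only the L3 skills whose L1 keyword appears in the task."""
        words = set(task.lower().split())
        matched = sorted({name for kw, name in self.insight_index.items() if kw in words})
        parts = list(self.meta_rules) + list(self.global_facts)
        parts += [self.skills[name] for name in matched if name in self.skills]
        return "\n".join(parts)


# Illustrative sample data only.
mem = LayeredMemory(
    meta_rules=["Confirm before destructive actions."],
    insight_index={"gmail": "send_gmail", "stocks": "monitor_stocks"},
    global_facts=["OS: Windows 11"],
    skills={
        "send_gmail": "SOP: run send_mail.py <file>",
        "monitor_stocks": "SOP: start cron job",
    },
)
context = mem.build_context("Send this file via Gmail")
print(context)
```

Only the matching skill is pulled into scope, which is the intuition behind keeping the context window small while the skill tree keeps growing.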

2️⃣ Autonomous Execution Loop

Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop

The entire core loop is just ~100 lines of code (agent_loop.py).
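The loop described above can be sketched in a few lines. This is not the content of agent_loop.py; the names `agent_loop`, `decide`, and `scripted_decide` are hypothetical, and the LLM call is replaced by a scripted policy so the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (tool name, argument); the name "done" ends the loop


def agent_loop(
    task: str,
    decide: Callable[[str, List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    memory: List[str],
    max_steps: int = 10,
) -> str:
    observation = task
    for _ in range(max_steps):
        tool, arg = decide(observation, memory)   # task reasoning (an LLM call in a real agent)
        if tool == "done":
            return arg
        observation = tools[tool](arg)            # execute the tool and perceive the result
        memory.append(f"{tool}({arg}) -> {observation}")  # write experience to memory
    return "step limit reached"


def scripted_decide(observation: str, memory: List[str]) -> Action:
    # Stand-in policy: call one tool, then finish with the last observation.
    return ("echo", "hello") if not memory else ("done", observation)


memory: List[str] = []
result = agent_loop("say hello", scripted_decide, {"echo": lambda s: s.upper()}, memory)
print(result, memory)
```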

3️⃣ Minimal Toolset

GenericAgent provides only 9 atomic tools, forming the foundational capabilities for interacting with the outside world.

| Tool | Function |
| --- | --- |
| `code_run` | Execute arbitrary code |
| `file_read` | Read files |
| `file_write` | Write files |
| `file_patch` | Patch / modify files |
| `web_scan` | Perceive web content |
| `web_execute_js` | Control browser behavior |
| `ask_user` | Human-in-the-loop confirmation |

Additionally, 2 memory management tools (update_working_checkpoint, start_long_term_update) allow the agent to persist context and accumulate experience across sessions.
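To illustrate how a small set of atomic tools can be exposed to a model, the sketch below registers three file tools by name and dispatches calls to them. Only the tool names come from the table above; the function signatures, the `dispatch` helper, and the exactly-once rule in `file_patch` are assumptions for this example.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict


def file_write(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars"


def file_read(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def file_patch(path: str, old: str, new: str) -> str:
    # Assumed semantics: replace a snippet that must occur exactly once.
    text = file_read(path)
    if text.count(old) != 1:
        return "patch rejected: target text must occur exactly once"
    file_write(path, text.replace(old, new))
    return "patched"


TOOLS: Dict[str, Callable[..., str]] = {
    "file_read": file_read,
    "file_write": file_write,
    "file_patch": file_patch,
}


def dispatch(name: str, **kwargs: str) -> str:
    """Look up a tool by name and call it with keyword arguments."""
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](**kwargs)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "note.txt")
    dispatch("file_write", path=target, content="hello world")
    patch_status = dispatch("file_patch", path=target, old="world", new="agent")
    final_text = dispatch("file_read", path=target)

print(patch_status, final_text)
```

A flat name-to-function registry like this keeps the tool schema sent to the model short, which fits the document's emphasis on a minimal toolset.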

4️⃣ Capability Extension Mechanism

GenericAgent can create new tools dynamically.

Via code_run, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime — crystallizing temporary abilities into permanent tools.
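A tool like `code_run` could be approximated by running source code in a child interpreter and returning its output to the agent. The sketch below is an assumption about the general technique, not the project's implementation; the return shape and the `timeout` parameter are hypothetical.

```python
import subprocess
import sys


def code_run(source: str, timeout: float = 30.0) -> dict:
    """Run Python source in a child interpreter and capture its output."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {"returncode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


ok = code_run("print(6 * 7)")
bad = code_run("raise SystemExit(3)")
print(ok["stdout"].strip(), bad["returncode"])
```

Because the child process inherits the user's permissions, arbitrary code execution of this kind carries real risk; pairing it with a confirmation step such as `ask_user` is one way to keep a human in the loop.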

GenericAgent Workflow Diagram

⭐ Support

If this project helped you, please consider leaving a Star! 🙏

You're also welcome to join our GenericAgent Community Group for discussion, feedback, and co-building 👏


🚩 Friendly Links

Thanks for the support from the LinuxDo community!

📄 License

MIT License — see LICENSE

Disclaimer: This project does not build or operate any commercial website. Apart from DintalClaw, no institution, organization, or individual is currently officially authorized to conduct commercial activities under the GenericAgent name.


📖 Beginner's guide (illustrated): Feishu document

📘 Full introductory tutorial (by Datawhale): Hello GenericAgent · GitHub


