📌 Official channel: This GitHub repository is the sole official source for GenericAgent. We have no affiliation with any third-party website using the GenericAgent name.
GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is just ~3K lines of code. Through 9 atomic tools + a ~100-line Agent Loop, it grants any LLM system-level control over a local computer — covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).
Its design philosophy: don't preload skills — evolve them.
Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a skill for direct reuse later. The longer you use it, the more skills accumulate, forming a skill tree that belongs entirely to you, grown from 3K lines of seed code.
🤖 Self-Bootstrap Proof: Everything in this repository, from installing Git and running `git init` to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.
- Self-Evolving: Automatically crystallizes each task into a skill. Capabilities grow with every use, forming your personal skill tree.
- Minimal Architecture: ~3K lines of core code. Agent Loop is ~100 lines. No complex dependencies, zero deployment overhead.
- Strong Execution: Injects into a real browser (preserving login sessions). 9 atomic tools take direct control of the system.
- High Compatibility: Supports Claude / Gemini / Kimi / MiniMax and other major models. Cross-platform.
- Token Efficient: <30K context window — a fraction of the 200K–1M other agents consume. Layered memory ensures the right knowledge is always in scope. Less noise, fewer hallucinations, higher success rate — at a fraction of the cost.
This is what fundamentally distinguishes GenericAgent from every other agent framework.
[New Task] --> [Autonomous Exploration] (install deps, write scripts, debug & verify) -->
[Crystallize Execution Path into skill] --> [Write to Memory Layer] --> [Direct Recall on Next Similar Task]
| What you say | What the agent does the first time | Every time after |
|---|---|---|
| "Read my WeChat messages" | Install deps → reverse DB → write read script → save skill | one-line invoke |
| "Monitor stocks and alert me" | Install mootdx → build selection flow → configure cron → save skill | one-line start |
| "Send this file via Gmail" | Configure OAuth → write send script → save skill | ready to use |
After a few weeks, your agent instance will have a skill tree no one else in the world has — all grown from 3K lines of seed code.
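The explore-once, recall-afterwards flow above can be sketched in a few lines. This is only an illustration under stated assumptions: `SkillStore`, `solve`, the one-JSON-file-per-skill layout, and the `explore` callback are hypothetical names invented for this example and are not GenericAgent's actual storage format or API.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, Optional


class SkillStore:
    """Hypothetical on-disk store: one JSON file per crystallized skill."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        return self.root / f"{name}.json"

    def recall(self, name: str) -> Optional[dict]:
        """Return a previously saved skill, or None if the task is new."""
        path = self._path(name)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        return None

    def crystallize(self, name: str, steps: list[str]) -> dict:
        """Persist the execution path so the next similar task can reuse it."""
        skill = {"name": name, "steps": steps}
        self._path(name).write_text(
            json.dumps(skill, ensure_ascii=False), encoding="utf-8"
        )
        return skill


def solve(
    task: str, store: SkillStore, explore: Callable[[str], list[str]]
) -> tuple[dict, bool]:
    """Return (skill, reused). Exploration runs only when no skill exists yet."""
    skill = store.recall(task)
    if skill is not None:
        return skill, True
    steps = explore(task)  # stand-in for the expensive autonomous exploration
    return store.crystallize(task, steps), False
```

In this sketch the second call to `solve` with the same task name skips `explore` entirely, which mirrors the "first time" versus "every time after" columns of the table.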
- 2026-04-21: 📄 Technical Report released on arXiv — GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization
- 2026-04-11: Introduced L4 session archive memory and scheduler cron integration
- 2026-03-23: Support personal WeChat as a bot frontend
- 2026-03-10: Released million-scale Skill Library
- 2026-03-08: Released "Dintal Claw" — a GenericAgent-powered government affairs bot
- 2026-03-01: GenericAgent featured by Jiqizhixin (机器之心)
- 2026-01-16: GenericAgent V1.0 public release
This installs GenericAgent with an isolated Python environment and Git, then downloads a ready-to-run package.
Windows PowerShell
powershell -ExecutionPolicy Bypass -c "$env:GLOBAL=1; irm http://fudankw.cn:9000/files/ga_install.ps1 | iex"
Linux / macOS
GLOBAL=1 bash -c "$(curl -fsSL http://fudankw.cn:9000/files/ga_install.sh)"
After installation, launch the desktop app from:
frontends/GenericAgent.exe
git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
uv venv
uv pip install -e ".[ui]" # Core + UI dependencies
cp mykey_template.py mykey.py # Fill in your LLM API key
python launch.pyw
GenericAgent is meant to grow its environment through the Agent itself, not by pre-installing every possible package.
Full guide: GETTING_STARTED.md
For one-line installs on Windows, double-click:
frontends/GenericAgent.exe
A lightweight, keyboard-driven interface built on Textual. Supports multiple concurrent sessions and real-time streaming.
python frontends/tuiapp_v2.py
python launch.pyw
GenericAgent also supports IM frontends such as Telegram, WeChat, QQ, Feishu / Lark, WeCom, and DingTalk.
Typical usage:
python frontends/tgapp.py # Telegram
python frontends/wechatapp.py # WeChat
python frontends/qqapp.py # QQ
python frontends/fsapp.py # Feishu / Lark
python frontends/wecomapp.py # WeCom
python frontends/dingtalkapp.py # DingTalk
For detailed setup, ask GenericAgent itself.
Common chat commands:
- `/new` - start a fresh conversation and clear the current context
- `/continue` - list recoverable conversation snapshots
- `/continue N` - restore the `N`th recoverable conversation
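As a rough illustration of how a frontend might route these commands, here is a minimal parser. The `Command` dataclass, the `parse_command` function, and the action names are assumptions made for this sketch; the real frontends may handle commands differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    action: str                  # "new", "list_snapshots", "restore", or "message"
    index: Optional[int] = None  # snapshot number, only set for "restore"
    text: str = ""               # raw user text, only set for "message"


def parse_command(line: str) -> Command:
    """Map one chat line to a command; anything unrecognized is a plain message."""
    stripped = line.strip()
    parts = stripped.split()
    if parts and parts[0] == "/new" and len(parts) == 1:
        return Command("new")
    if parts and parts[0] == "/continue":
        if len(parts) == 1:
            return Command("list_snapshots")
        if len(parts) == 2 and parts[1].isdecimal():
            return Command("restore", index=int(parts[1]))
    return Command("message", text=stripped)
```

Treating malformed input such as `/continue abc` as an ordinary message is one possible design choice; a stricter frontend could reply with a usage hint instead.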
| Feature | GenericAgent | OpenClaw | Claude Code |
|---|---|---|---|
| Codebase | ~3K lines | ~530,000 lines | Open-sourced (large) |
| Deployment | pip install + API Key | Multi-service orchestration | CLI + subscription |
| Browser Control | Real browser (session preserved) | Sandbox / headless browser | Via MCP plugin |
| OS Control | Mouse/kbd, vision, ADB | Multi-agent delegation | File + terminal |
| Self-Evolution | Autonomous skill growth | Plugin ecosystem | Stateless between sessions |
| Out of the Box | A few core files + starter skills | Hundreds of modules | Rich CLI toolset |
📂 Full evaluation datasets and results: https://github.com/JinyiHan99/GA-Technical-Report/tree/main
| Dimension | Question | Benchmarks used |
|---|---|---|
| 1. Task Completion & Token Efficiency | Can GA complete hard tasks more cheaply than leading agents? | SOP-Bench, Lifelong AgentBench, RealFin-Benchmark |
| 2. Tool-Use Efficiency | Can a minimal atomic toolset solve what specialized toolsets solve, with less overhead? | Tool Efficiency Benchmark (11 simple + 5 long-horizon tasks) |
| 3. Memory System Effectiveness | Does condensed hierarchical memory beat full/redundant memory and embedding-based retrievers? | SOP-Bench (dangerous goods), LoCoMo, 20-skill stress test |
| 4. Self-Evolution Capability | Can the agent distill experience into reusable SOPs and code, without intervention? | 9-round LangChain longitudinal study, 8-task cross-task web benchmark |
| 5. Web Browsing Capability | Does density-driven design survive the open web? | WebCanvas, BrowseComp-ZH, Custom Tasks (22) |
Baselines across these dimensions include Claude Code, OpenAI CodeX, and OpenClaw, evaluated under Claude Sonnet 4.6, Claude Opus 4.6, GPT-5.4, and MiniMax M2.7 backbones.
GenericAgent accomplishes complex tasks through Layered Memory × Minimal Toolset × Autonomous Execution Loop, continuously accumulating experience during execution.
1️⃣ Layered Memory System
Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time.
- L0 — Meta Rules: Core behavioral rules and system constraints of the agent
- L1 — Insight Index: Minimal memory index for fast routing and recall
- L2 — Global Facts: Stable knowledge accumulated over long-term operation
- L3 — Task Skills / SOPs: Reusable workflows for completing specific task types
- L4 — Session Archive: Archived task records distilled from finished sessions for long-horizon recall
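One way to picture the five layers is as a small data structure from which the prompt context is assembled. The sketch below is an assumption-heavy illustration: the field names, the keyword-based routing through the L1 index, and the choice to leave L4 out of the default context are all invented for this example and are not taken from the GenericAgent source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LayeredMemory:
    meta_rules: list[str] = field(default_factory=list)          # L0: always in context
    insight_index: dict[str, str] = field(default_factory=dict)  # L1: keyword -> skill name
    global_facts: list[str] = field(default_factory=list)        # L2: stable knowledge
    skills: dict[str, str] = field(default_factory=dict)         # L3: skill name -> SOP text
    session_archive: list[str] = field(default_factory=list)     # L4: not loaded by default here

    def build_context(self, task: str) -> str:
        """Assemble context: L0 and L2 always, plus only the L3 skills that L1 routes to."""
        words = set(task.lower().split())
        routed = sorted(
            {name for keyword, name in self.insight_index.items() if keyword in words}
        )
        parts = list(self.meta_rules) + list(self.global_facts)
        parts += [self.skills[name] for name in routed if name in self.skills]
        return "\n".join(parts)
```

Because only routed skills are appended, unrelated SOPs never enter the prompt, which is the intuition behind keeping the context small.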
2️⃣ Autonomous Execution Loop
Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop
The entire core loop is just ~100 lines of code (agent_loop.py).
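The perceive, reason, execute, remember cycle can be sketched as follows. This is not the contents of `agent_loop.py`; `run_agent`, the `decide` callback (a stand-in for the LLM call), the action dictionary shape, and the list-based memory are hypothetical and chosen only to keep the example self-contained.

```python
from __future__ import annotations

from typing import Callable

Tool = Callable[[dict], str]
Decider = Callable[[str, list], dict]


def run_agent(
    task: str,
    decide: Decider,
    tools: dict[str, Tool],
    memory: list,
    max_turns: int = 10,
) -> str:
    """Loop until the decider emits a "done" action or the turn limit is hit."""
    observation = f"task: {task}"
    for _ in range(max_turns):
        # Reasoning step: in a real agent this would be an LLM call.
        action = decide(observation, memory)
        if action["tool"] == "done":
            return action.get("result", "")
        tool = tools.get(action["tool"])
        if tool is None:
            observation = f"error: unknown tool {action['tool']}"
        else:
            observation = tool(action.get("args", {}))
        # Write the experience back so later turns can use it.
        memory.append(f"{action['tool']} -> {observation}")
    return "stopped: turn limit reached"
```

The turn limit is a safety net assumed for this sketch so that a decider that never emits `done` cannot loop forever.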
3️⃣ Minimal Toolset
GenericAgent provides only 9 atomic tools, forming the foundational capabilities for interacting with the outside world.
| Tool | Function |
|---|---|
| `code_run` | Execute arbitrary code |
| `file_read` | Read files |
| `file_write` | Write files |
| `file_patch` | Patch / modify files |
| `web_scan` | Perceive web content |
| `web_execute_js` | Control browser behavior |
| `ask_user` | Human-in-the-loop confirmation |
Additionally, 2 memory management tools (`update_working_checkpoint`, `start_long_term_update`) allow the agent to persist context and accumulate experience across sessions.
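To show how a small set of atomic tools can sit behind a single dispatch point, here is a sketch of a name-to-function registry covering the three file tools. The tool names come from the table above, but the parameter names (`path`, `content`, `old`, `new`), the return strings, and the exactly-one-match rule for patching are assumptions for illustration, not the project's real signatures.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("file_read")
def file_read(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


@tool("file_write")
def file_write(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters"


@tool("file_patch")
def file_patch(path: str, old: str, new: str) -> str:
    # Assumed rule: refuse ambiguous patches so an edit never lands in the wrong place.
    text = Path(path).read_text(encoding="utf-8")
    if text.count(old) != 1:
        return "error: `old` must match exactly once"
    Path(path).write_text(text.replace(old, new), encoding="utf-8")
    return "patched"


def call_tool(name: str, **kwargs) -> str:
    """Single dispatch point: errors come back as text the model can read."""
    if name not in TOOLS:
        return f"error: unknown tool {name}"
    return TOOLS[name](**kwargs)
```

Returning errors as strings rather than raising keeps every outcome in a form that can be fed back to the model as the next observation.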
4️⃣ Capability Extension Mechanism
GenericAgent can create new tools dynamically.
Via code_run, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime — crystallizing temporary abilities into permanent tools.
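A minimal sketch of this idea, assuming a `code_run` that executes Python in a subprocess and a helper that saves a snippet as a script once it has run cleanly. The signatures, the result dictionary, and `crystallize_tool` are hypothetical; the real `code_run` may support other languages, sandboxing, or different return values.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def code_run(code: str, timeout: float = 30.0) -> dict:
    """Run a Python snippet in a fresh interpreter and capture its output.

    Raises subprocess.TimeoutExpired if the snippet exceeds the timeout.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {"returncode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


def crystallize_tool(name: str, code: str, tools_dir: Path) -> Path:
    """Persist a snippet as a reusable script, but only if it runs successfully."""
    result = code_run(code)
    if result["returncode"] != 0:
        raise RuntimeError(result["stderr"] or f"exit code {result['returncode']}")
    tools_dir.mkdir(parents=True, exist_ok=True)
    path = tools_dir / f"{name}.py"
    path.write_text(code, encoding="utf-8")
    return path
```

Note that validating by execution means the snippet's side effects happen once before it is saved, so in practice such a check would only suit snippets that are safe to run repeatedly.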
If this project helped you, please consider leaving a Star! 🙏
You're also welcome to join our GenericAgent Community Group for discussion, feedback, and co-building 👏
Thanks for the support from the LinuxDo community!
MIT License — see LICENSE
Disclaimer: This project does not build or operate any commercial website. Apart from DintalClaw, no institution, organization, or individual is currently officially authorized to conduct commercial activities under the GenericAgent name.
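| 🧋 Food Delivery Order | 📈 Quantitative Stock Screening |
|---|---|
| "Order me a milk tea": automatically navigates the delivery app, selects items, and completes checkout | "Find GEM stocks with EXPMA golden cross, turnover > 5%": screens stocks by quantitative criteria |
| 🌐 Autonomous Web Exploration | 💰 Expense Tracking |
| Autonomously browses the web and summarizes information on a schedule | "Find expenses over ¥2K in the last 3 months": drives Alipay via ADB |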
📖 Beginner's guide (illustrated): Feishu document
📘 Complete getting-started tutorial (by Datawhale): Hello GenericAgent · GitHub
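Figure: Tool-use efficiency radar chart. GA leads across the token, request-count, and tool-call axes while maintaining quality on all four task dimensions.
Figure: Cross-task self-evolution. GA's second and third runs converge to a stable low-cost range across the 8 web tasks.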








