![]() |
![]() |
bee coding agent harness. Pure Go, single static binary, requires Go 1.26+ to build.
# install (curl | sh)
curl -fsSL https://raw.githubusercontent.com/elhenro/bee/main/install.sh | sh
# or via go install
go install github.com/elhenro/bee/cmd/bee@latest
# or build to your local bin
go build -o ~/.local/bin/bee ./cmd/bee
# use
export OPENROUTER_API_KEY=<your-key>
beeSo far, I've used claude code, codex, hermes, opencode, openclaw and pi. Each one nails something. None of them felt configurable, minimal, or flexible enough for the way I actually work, and a few gaps kept biting me. bee is the harness that just does it.
Three gaps bee closes:
- Tiny-context friendly, tiny footprint. Caveman-compressed system prompt, three tools, top-k memory. Same harness scales from a 4k-context local Ollama up through small fine-tunes to million-token frontier models. Native omlx (Apple Silicon MLX server) and OpenRouter support out of the box. Shrinks itself when context gets tight.
- Skills are
bee <name>subcommands. Write a markdown file, get a command. No shell shims. No REPL incantations.bee criticize plan.mdjust works, from any directory, in any shell. - Skills are agent endpoints. A prompt, an external command, an MCP server, or an HTTP endpoint — all four are equally callable tools the model can invoke mid-task. Plug a personal-life agent in as a sub-agent (bundled
hermes.mdis a template — edit theexec:line). No IPC dance.
$ bee criticize plan.md # one binary, every skill a subcommand
$ bee run "lint cmd/" # headless, pipeable
$ bee swarm "migrate auth to jwt" # queen + workers
$ bee fan "audit internal/ for cleanup" # parallel fan-out~/.bee/skills/*.md is your library. Add one, it shows up. First run seeds defaults. Edit one, it lives.
~/.bee/config.toml, sane defaults, set an API key, change models.
bee runs against any OpenAI-compatible local server. Confirmed working:
- omlx (Apple Silicon MLX server,
localhost:8000/v1) with MLX-quantized coder models — strong tool-calling, low RAM footprint. - Ollama (
localhost:11434/v1) withllama3.1:8b,qwen2.5-coder:7b. - LM Studio (
localhost:1234/v1).
For sub-8k-context models, switch to the tiny profile. --profile is not a CLI flag — set it via env or ~/.bee/config.toml:
BEE_PROFILE=tiny bee run --provider omlx --model Qwen3.6-35B-A3B-4bit -- "..."
# or persist in ~/.bee/config.toml
profile = "tiny"
default_provider = "omlx"
default_model = "Qwen3.6-35B-A3B-4bit"
Token-compression rules injected into the system prompt. On by default. caveman = "auto" resolves per profile: full on tiny and normal, lite on large.
Force a level regardless of profile:
bee --caveman full # global, any subcommand
bee run --caveman full -- "..." # one-off
# or set caveman = "full" in ~/.bee/config.toml
Disable:
bee --caveman off
# or set caveman = "off" in ~/.bee/config.toml
Explicit value beats profile.
Hand bee an objective and a budget, walk away, wake up to a branch full of small individually-revertable commits. Each iteration runs one focused change and either commits or git reset --hard rolls back on failure — the working tree never stays dirty. A live TUI shows the iteration ledger (🐝 foraging · 🌼 committed · 🍃 noop · 🥀 reset · 💥 failed), token cost, and a sleeping bee at the bottom. Type to nudge the run mid-flight; /stop exits gracefully after the current iteration, /abort cancels immediately.
bee zzz "tighten error messages across internal/tools" --max-iterations 30
bee zzz --list # past runs
bee zzz --resume # pick up the most recent
bee zzz --worktree "..." # isolate in ~/.bee/zzz/worktrees/<id>
bee zzz --plain # disable the TUI (stdin steering still works)
bee zzz --max-consecutive-fails 5 # tolerance for transient agent stalls
bee zzz --hard-error-retries 5 # retries per iter on provider 5xx etc
bee zzz --notes-tail 5 # cap prior-iter sections echoed into prompts
bee zzz --gc --gc-worktrees # prune old runs and their worktrees
bee zzz --gc --gc-stale-running 168h # also reap runs stuck "running" >7d
Artifacts live in ~/.bee/zzz/runs/<id>/: notes.md per-iter summaries, events.jsonl raw timeline, meta.json run state, blocked-<iter>.patch for any iteration that emitted BLOCKED:. Inspired by gnhf.
Trust model. Commits land unsigned by default, use --sign to opt back in. Pre-commit hooks run unless --no-verify is set. Pushing (--push) is opt-in, and failures are recorded in meta.json so you can tell what reached the remote. --yes/--yolo auto-approves dangerous shell commands, so pair it with --worktree to contain the blast radius. The CLI prints a warning when --yes is used without --worktree.
Spawn many bees at once, each on its own git worktree. The overview reuses the chat input — every submitted message starts a fresh detached agent under ~/.bee/agents/worktrees/<id> on branch agents/<short>. Rows show initial prompt, elapsed, tokens up/down, last thought, locked model. j/k/arrows navigate, l/→/enter opens an agent fullscreen (existing session view), m retries a merge.
bee agents
When an agent ends its turn with DONE: <summary> the coordinator rebases its branch onto main and fast-forwards. Conflicts post a resolution prompt back to the agent via the inbox and flip its row to needs input; auto-retry every 10s or hit m to force a retry. Unmerged worktrees stay highlighted in red until they land.
/model <name> and /provider <name> set the model used for the next spawn (sticky until changed) — mix and match across agents. Killing bee agents does not kill the running children; relaunching reconstructs the overview from ~/.bee/sessions/bg/.
Mac M3 Max (64 GB) -> omlx with an MLX-quantized coder model. Runs fast, handles small tasks reliably, doesn't choke on context. Good enough for day-to-day. Local, private, free once the hardware is paid for.
- macOS / Linux — first-class. Static binaries published for
darwin/{amd64,arm64}andlinux/{amd64,arm64}. - Windows — best-effort. The native build runs; the sandbox layer is a stub that re-dispatches through WSL2. Run under WSL2 for production use.
The chatgpt provider lets you drive bee with a ChatGPT Plus/Pro/Team subscription via the chatgpt.com Codex backend instead of paying per-token API billing. This reuses a public client_id that targets a first-party OpenAI surface. OpenAI's terms restrict that surface to their own clients — usage may be rate-limited per plan tier and the path can be revoked at any time. Treat this provider as experimental. Use OPENROUTER_API_KEY (or any other provider) for anything you don't want to lose access to. Run /login chatgpt to drive the PKCE flow; the command surfaces this same warning.
Caveman prompt-compression rules adapted from JuliusBrussee/caveman.

