🛸 Orbit

Mission control for AI coding agents.

Orbit is the harness your coding agent actually needs — structured loops, real validation gates, rubric-based evaluation, checkpoint resumability, and a full audit trail. Define the mission. Orbit lands it.

↑ Orbit detects a failing test, selects the right task, runs the agent, validates the fix, and marks the orbit complete — no hand-holding required.

The problem with AI coding agents

They drift. They hallucinate changes. They skip validation. They leave no record of what they did or why. You give them a task and cross your fingers.

You:   "Fix the auth bug."
Agent: ✅ Done!
You:   "Did you run the tests?"
Agent: ✅ All passing!
You:   *runs tests* ❌❌❌

Sound familiar?

Orbit fixes that

Every agent run becomes a structured orbit — a bounded, validated, auditable loop. If the agent can't prove its work, the orbit doesn't close.

↑ The Orbit loop: task selection → agent execution → validation gate → rubric evaluation → review → next orbit.

What you get:

🎯 Mission files — human-owned goal, scope, constraints, definition of done
📋 Dependency-aware task queue — priority-sorted, one task per orbit, respects your DAG
🚦 Validation gate — real pytest/lint/typecheck runs; nothing advances without evidence
📐 Rubric evaluator — structured scoring on task focus, completion, change signal, and validation
🔁 Retry policy — auto-retry on failure or validation miss, configurable limits
💾 Checkpoint resumability — durable state; pick up mid-mission without losing progress
💰 Budget management — cap runs, failure count, and estimated cost per session
📡 Full telemetry — every orbit writes agent-result.json, evaluation.json, review.json, and a timestamped progress log

Works with any agent

Orbit is adapter-based. Swap in whatever coding agent you have:

Adapter	Agent
`adapters.claude_cli:ClaudeCliAdapter`	Claude Code (`claude --print`)
`adapters.codex_cli:CodexCliAdapter`	OpenAI Codex CLI
`adapters.cursor_cli:CursorCliAdapter`	Cursor
`adapters.cli_json:CliJsonAdapter`	Any agent that reads stdin and writes JSON
`adapters.mock:MockAgentAdapter`	Local testing / dry runs

What can you do with Orbit?

Use case	How Orbit helps
Nightly bug triage	Point Orbit at your failing CI jobs. It selects the highest-priority fix, validates the patch, and leaves a full artifact trail for review in the morning.
Incremental feature backlog	Break a feature into tasks in `backlog.json`. Orbit works through them in dependency order, one per orbit, never skipping validation.
Automated code health	Schedule Orbit to run overnight on your tech-debt backlog. Cap spend with `max_estimated_cost_usd`. Wake up to a progress log with diffs and scores.
Self-healing repos	Wire Orbit into CI as a bot that auto-fixes known flaky tests, restores broken migrations, or patches lint failures — with full review artifacts before merge.
Dependency upgrades	Model each version bump as a task. Orbit upgrades them in order, runs your full test suite after each one, and halts if validation breaks.
Multi-agent handoffs	Run one orbit with Claude for implementation, a second with Codex for review, and a third with your custom CLI for deployment checks — same harness, swappable adapters.

Quickstart

git clone https://github.com/human-again/orbit
cd orbit
python -m venv .venv && .venv/bin/pip install pytest pillow

Drop three files in your project:

mission.md        ← what to build, and done means done
agent-rules.md    ← safety rules, coding standards, escalation policy
backlog.json      ← tasks with acceptance criteria and dependencies

Then launch:

python orchestrator.py --config runtime/config.json

Orbit selects the next task, hands it to your agent, validates the output, evaluates the result, and loops — until the mission is complete or it needs you.

See it in action

Demo 1 — Auth Rescue

A real auth bug is introduced into the codebase. Orbit selects the right task, runs the agent, turns red tests green, and closes the orbit with a full evaluation artifact.

MOCK=1 ./replay.sh auth-rescue

Demo 2 — Issue Search

A new feature (scoped search) is added to a tiny issues dashboard. Orbit validates the feature with tests and a browser preview before marking it complete.

MOCK=1 ./replay.sh issue-search

Both demos use MOCK=1 — fully deterministic, no API key needed, runs in CI. Swap in a real adapter (DEMO_ADAPTER=claude or DEMO_ADAPTER=codex) to use your agent.

How it works

┌─────────────────────────────────────────────────────────┐
│                      Orbit Loop                         │
│                                                         │
│  mission.md + backlog.json                              │
│        │                                                │
│        ▼                                                │
│  choose_task()  ──── dependency-aware, priority-sorted  │
│        │                                                │
│        ▼                                                │
│  build_task_bundle()  ── mission + rules + context      │
│        │                                                │
│        ▼                                                │
│  adapter.run_agent()  ── your agent, any CLI            │
│        │                                                │
│        ▼                                                │
│  run_validation()  ── real tests, lint, typecheck       │
│        │                                                │
│        ├── FAIL ──► retry_policy ──► loop or stop       │
│        │                                                │
│        ▼                                                │
│  evaluate()  ── rubric scoring → verdict                │
│        │                                                │
│        ▼                                                │
│  review()  ── accept / iterate                          │
│        │                                                │
│        ▼                                                │
│  update backlog + progress + telemetry                  │
│        │                                                │
│        └────────────────────────────► next orbit        │
└─────────────────────────────────────────────────────────┘

Configuration

{
  "adapter": "adapters.claude_cli:ClaudeCliAdapter",
  "adapter_args": {
    "command": "claude",
    "args": ["--print", "--output-format", "json", "--dangerously-skip-permissions"]
  },
  "validation_commands": ["pytest tests/ -v"],
  "max_iterations": 10,
  "run_evaluator": true,
  "run_reviewer": true,
  "require_validation": true,
  "retry_policy": {
    "max_attempts": 2,
    "retry_on_statuses": ["failed"],
    "retry_on_validation_failure": true
  },
  "budgets": {
    "max_runs": 50,
    "max_failures": 5,
    "max_estimated_cost_usd": 10.0
  }
}

Adapter contract

Two methods. That's it.

class MyAdapter(BaseAgentAdapter):
    def prepare_prompt(self, task_bundle: str) -> str: ...
    def run_agent(self, prompt: str, cwd: str, timeout_s: int) -> AgentResult: ...

AgentResult fields: status, changed_files, notes, metadata, raw_output.

Repo layout

orchestrator.py        ← mission control — the main loop
adapters/              ← pluggable agent adapters
evaluator.py           ← rubric scoring
reviewer.py            ← accept/iterate recommendation
validation_runner.py   ← runs test/lint commands
retry_policy.py        ← retry on failure or validation miss
checkpoint_manager.py  ← durable resume state
budget_manager.py      ← run/failure/cost caps
diff_evaluator.py      ← git diff detection
risk_guard.py          ← command classification and gating
observability.py       ← structured telemetry log (events.jsonl)
mission.md             ← your goal, scope, definition of done
agent-rules.md         ← coding rules, escalation rules
backlog.json           ← tasks with acceptance criteria
progress.md            ← append-only mission log
runtime/               ← configs, run artifacts, checkpoints
research/              ← bounded research artifacts per task
examples/              ← end-to-end demos
docs/media/            ← demo GIFs

Advanced

Codex self-heal demo — Orbit uses its own Codex adapter to fix a real Orbit regression inside a temp copy of the repo, then records validation, evaluation, review, and backlog evidence for that orbit.

.venv/bin/python examples/codex-self-heal-demo/run_demo.py

Regenerate the demo GIFs:

scripts/make-demo-media.sh auth-rescue
scripts/make-demo-media.sh issue-search
scripts/make-demo-media.sh orchestrator-pseudocode

⭐ Star it

If Orbit has saved you from an agent that said "tests passing" when they weren't — star it. It helps others find it.

Credits

The agent loop design draws inspiration from Andrej Karpathy's thinking on agentic systems and long-running LLM workflows.

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🛸 Orbit

The problem with AI coding agents

Orbit fixes that

Works with any agent

What can you do with Orbit?

Quickstart

See it in action

Demo 1 — Auth Rescue

Demo 2 — Issue Search

How it works

Configuration

Adapter contract

Repo layout

Advanced

⭐ Star it

Credits

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
adapters		adapters
docs		docs
eval		eval
examples		examples
missions		missions
prompt_templates		prompt_templates
research		research
runtime		runtime
scripts		scripts
tasks		tasks
templates		templates
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
README.md		README.md
agent-rules.md		agent-rules.md
backlog.json		backlog.json
budget_manager.py		budget_manager.py
checkpoint_manager.py		checkpoint_manager.py
command_runner.py		command_runner.py
context_manager.py		context_manager.py
dashboard.py		dashboard.py
demo_fixtures.py		demo_fixtures.py
demo_media.py		demo_media.py
demo_replay.py		demo_replay.py
diff_evaluator.py		diff_evaluator.py
evaluator.py		evaluator.py
mission.md		mission.md
observability.py		observability.py
orchestrator.py		orchestrator.py
pr_helper.py		pr_helper.py
progress.md		progress.md
pyproject.toml		pyproject.toml
replay.sh		replay.sh
resume_manager.py		resume_manager.py
retry_policy.py		retry_policy.py
reviewer.py		reviewer.py
risk_guard.py		risk_guard.py
validation_runner.py		validation_runner.py
worktree_manager.py		worktree_manager.py

Folders and files

Latest commit

History

Repository files navigation

🛸 Orbit

The problem with AI coding agents

Orbit fixes that

Works with any agent

What can you do with Orbit?

Quickstart

See it in action

Demo 1 — Auth Rescue

Demo 2 — Issue Search

How it works

Configuration

Adapter contract

Repo layout

Advanced

⭐ Star it

Credits

License

About

Topics

Resources

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages