A self-adaptive runtime substrate for AI coding agents — hexagonal architecture, evidence-gated state, adversarial governance, and a closed-loop control system that observes, plans, judges, and applies changes to itself within a bounded autonomy envelope.
Most AI coding tools are interactive assistants. hex is a self-adaptive runtime substrate that hosts those tools (Claude Code, Aider, Cursor, local Ollama agents) inside a governed execution model. The model — frontier API, local Ollama, or anything else with an inference port — is hot-swappable. The architecture, audit trail, trust loop, and self-improvement machinery stay constant. The runtime watches itself work, decides when its own design needs to change, and applies those changes through the same gates it uses for application code.
Three claims define hex:
-
Architecture is enforced, not encouraged. Tree-sitter parses every commit; cross-layer imports fail the build.
hex-core(zero external deps),ports/,adapters/,usecases/— the seams are named, machine-checked, and runtime-swappable through a substrate composition root (ADR-2604261303, ADR-2604261500). -
Completion is derived from evidence, not self-reported. Agents claim done; hex doesn't believe them. Every task has a file-evidence gate (
hex-nexus/src/orchestration/workplan_executor.rs::check_evidence_gate) and a workplan-scoped commit-subject reconciler (hex plan reconcile --strict). Lying agents get markedfailed; a P1 inbox notification fires (ADR-2604270800, ADR-2604271000). -
The system adapts itself within a bounded autonomy envelope. A control loop ticks every 30s observing the running system, plans changes through adversarial competition, judges them against a structured rubric, and applies them via shadow-promotion. Tier-A changes auto-merge; Tier-C halts at the operator. The loop's targets include hex's own ADRs, workplans, port telemetry, and architectural design (ADR-2604261311, ADR-2604271100, ADR-2604271200).
Work is tracked as workplan JSON (phases, tasks, adapter boundaries, gates). Architecture decisions are tracked as ADRs (189 in tree). State lives in SpacetimeDB as an append-only event log. Every coupling has a name, every mutation has a record.
The substrate ADRs (2604261500, 2604261311, 2604261800, 2604262100) describe hex as "the runtime substrate that hosts applications which rewrite themselves under LLM supervision." That isn't marketing — it's a working MAPE-K control loop with concrete code paths for each phase:
| MAPE-K phase | What it does | Where it lives | Status |
|---|---|---|---|
| Monitor | Read telemetry, ADR registry, workplan state, git, inbox, RL scores, port latency. ~20 detectors as TOML rules. | hex-cli/assets/improver/detectors.toml; hex analyze; PortTelemetry STDB rollups |
detector vocabulary scaffolded, telemetry rollup in flight |
| Analyze | Each detector emits Hypothesis { id, source, scope, severity, evidence }; deduped by (source, scope). |
hex-cli/src/commands/sched/improver/discover.rs |
scaffolded; full detector wiring in wp-architectural-health-detectors |
| Plan | Adversarial-swarm spawns N=3 strategic variants per hypothesis at T2.5 (devstral-small-2): "conservative refactor," "aggressive redesign," "minimum viable patch." | hex-nexus/src/orchestration/adversarial_swarm.rs::propose_strategic |
propose_strategic in wp-sched-improver P2 |
| Execute | Structured judge scores variants on 5 axes (alignment, blast-radius, dependency-satisfaction, reversibility, historical-reject-rate); winner is applied via shadow-promotion on a sched/improver/<id> worktree branch; losers archived to docs/workplans/rejected/. |
hex-nexus/src/orchestration/improver_judge.rs + improver_act.rs |
wp-sched-improver P3+P4 |
| Knowledge | All transitions append to improver_event STDB rows: hypothesis text, variants, verdict, action taken, outcome. The judge consults this history (historical_reject_rate axis) so the system learns which variant patterns the operator tends to overrule. |
STDB improver_event table |
wp-sched-improver P5 |
The autonomy envelope (ADR-2604270800 §1a) is a three-tier table that names exactly which actions the loop may apply without operator consent:
| Tier | Examples | Auto? | Rollback envelope |
|---|---|---|---|
| A | Status-frontmatter regex rewrite; trailing whitespace; missing newline; new ADR/workplan in docs/; enqueue workplan |
yes — shadow-promote → hex worktree merge |
dedicated branch, single git revert |
| B | Mutate existing accepted ADR's status; restore broken cross-link | draft only — diff written, P2 inbox for human merge | branch persists until human acts |
| C | Modify code outside docs/; delete files; mutate two-or-more accepted ADRs at once |
never auto — P1 inbox notification | manual review only |
This is the structural answer to "agents wreck repos when given autonomy." hex doesn't trust autonomy; it bounds it.
The improver doesn't only watch application code. It watches itself:
hex adr doctorscans the ADR registry every tick. Unparseable status → Tier-A auto-fix. Duplicate ID → Tier-C operator review. Stale-Proposed ADR → Tier-B drafted demotion. (ADR-2604270800)hex plan reconcile --strictdemotes any task whose storeddonedoesn't match the event log. Removes the multi-writer race that produces false-completes. (ADR-2604271000)hex substrate telemetryrollups detect port latency drift, adapter skew, traffic concentration, idle adapters, swap starvation — the substrate auditing whether its own hot-swap machinery is paying its way. (ADR-2604271200)- Architectural detectors find god-domain-types, kitchen-sink ports, orphan adapters, dead layers, composition drift — design-quality findings, not just compile errors.
The composition root (ADR-2604261303 + cookbook ADR-2604262100) is itself a runtime artifact that the loop can rewrite — adapter swaps go through the same shadow-promotion pipeline application code does.
Most alternatives sit somewhere between "smart autocomplete" and "black-box autonomy." hex covers the gap.
| Category | Examples | What they give you | What hex adds |
|---|---|---|---|
| Per-prompt assistant | Claude Code, Cursor, Copilot | one-shot suggestions, conversational repair | continuous tick loop; evidence-gated state; self-correcting reconciler |
| Git-aware single-shot agent | Aider | edits scoped to a commit | tier routing across model sizes; adversarial best-of-N; architectural import enforcement |
| Black-box autonomy | Devin, AutoGPT, Open Interpreter | "go figure it out" | append-only event log per state transition; shadow-promotion before any swap; explicit autonomy tier table (A/B/C) |
| Local IDE agent | OpenCode, Cline, Continue | local model integration | hexagonal substrate hosts them as adapters behind one port; cross-tool governance |
| Multi-agent orchestrator | AutoGen, CrewAI, LangGraph | task graph + role prompts | structurally enforced hexagonal layout; six-layer adversarial governance; shadow-promote for swaps |
| MCP server / tool surface | hundreds | tool calls into a model | hex is an MCP server and the runtime that gates which servers are allowed |
The positioning that makes everything click: hex is a Linux kernel for coding agents, not a fancier user-space tool. Models, CLIs, IDEs, and orchestrators are processes that run inside hex's address space. They get scheduled, audited, sandboxed, hot-swapped — and the design of their execution is itself rewritable from inside the runtime.
| Capability | Owned mechanism | Why nothing else has it |
|---|---|---|
| Compile gate on agent output | best-of-N + language-specific validation (auto-detected: Rust → cargo check, TS → tsc --noEmit, Go → go build); failed candidates feed back into the next attempt |
Most agents trust the model. hex doesn't. |
| Layer-boundary enforcement at commit | tree-sitter scan in hex analyze; pre-commit hook; CI gate |
Hexagonal rules without enforcement aren't rules. |
Evidence-gated done |
files-exist + workplan-scoped commit-subject match (ADR-2604270800 P0) | Other systems store status: "done" and trust the writer. hex demotes any task without git evidence. |
| Adversarial governance for changes | adversarial-swarm proposes 3 variants; structured judge with 5-axis rubric; shadow-promote (ADR-2604261311) | Single-shot LLM proposals are biased. hex makes them compete. |
| Continuous self-improvement | sched-daemon tick_improver discovers → proposes → judges → enqueues; ~20 detectors across operational + architectural classes (ADR-2604271100, ADR-2604271200) |
Most "agentic" systems run a loop on user prompts. hex runs a loop without one. |
| Architectural-health interrogation | god-types, port cohesion, adapter skew, latency drift, swap-starvation, composition drift — all become hypotheses | Linters check syntax. hex's improver checks whether the design is paying its way. |
| Tier-routed local-first inference | T1 4B / T2 32B / T2.5 24B / T3 frontier; strategy_hint selects; compile gate validates |
Pricing-driven routing without a verifier produces worse code. hex pairs them. |
| Bounded-autonomy state mutation | Tier A: auto-apply via shadow-promote on a sched/auto-fix branch; Tier B: write fix, P2 inbox; Tier C: P1 inbox, no action | Agents that mutate without rollback envelopes wreck repos. hex's mutations are addressable for git revert. |
| Provider-agnostic inference | one IInferencePort, multiple adapters (Anthropic, OpenAI, Ollama, OpenRouter); secret-grant via STDB |
Tool lock-in is real. hex sees every provider as an adapter. |
| Standalone (Claude-Code-free) operation | AgentManager + OllamaInferenceAdapter engaged when CLAUDE_SESSION_ID is unset; hex doctor composition reports active variant |
Most agentic systems hard-depend on a frontier API. hex runs on a laptop. |
The complete execution pipeline from operator prompt to self-improvement:
- Operator prompt or improver-emitted hypothesis
- classify_work_intent → tier routing (T1/T2/T2.5/T3)
- spec → workplan (JSON, behavioral, machine-checked)
- hex plan execute → HexFlo swarm dispatches per adapter
- Agent in worktree feat/<wp>/<layer> (isolated, parallel)
- Best-of-N inference → compile gate blocks failed attempts (language auto-detected from project manifest)
- Evidence gate → every task.file exists OR commit subject mentions task+wp
- Judge → behavioral spec passes; rubric scores ≥ confidence threshold
- hex worktree merge → NEVER raw checkout (ADR-2604131930)
- hex plan reconcile --strict → append-only event log; status derived
- Sched tick loops → ADR-doctor, improver detectors, swarm-cleanup
- Improver discovers next hypothesis → back to top (MAPE-K loop)
Every arrow is an event row. Every state transition is recorded.
Operator's role: kill-switch + judge-rubric tuning, not per-decision approval.
hex is designed to drop into existing projects with zero breaking changes:
# 1. Add hex-core as a dependency
cargo add hex-core
# 2. Bootstrap the runtime (one command)
hex bootstrap --profile dev
# 3. Start using hex commands
hex analyze . # Check architecture boundaries
hex plan draft "add auth" # Create workplan stub
hex plan execute <plan> # Run autonomous feature workNo configuration required. hex reads your workspace structure and starts enforcing rules immediately. The bootstrap command handles all infrastructure (SpacetimeDB, Ollama models, GPU setup).
Tested on: macOS (Intel/ARM), Linux (x86_64, GPU), Docker. Setup time: ~2 minutes start-to-ready (vs. 45 minutes manual setup).
docker run -d --name hex \
-p 5555:5555 -p 3033:3033 \
-v $(pwd):/workspace \
ghcr.io/gaberger/hex-nexus:latestcurl -L https://github.com/gaberger/hex/releases/latest/download/hex-darwin-arm64 -o /usr/local/bin/hex
chmod +x /usr/local/bin/hex
hex # status + next-step suggestions
hex sched daemon --background --interval 30Dashboard: http://localhost:5555. Standalone (no Claude Code, local Ollama): see Getting Started.
# Automated setup for local development (handles everything):
hex bootstrap --profile dev
# What it does:
# • Starts SpacetimeDB (coordination layer)
# • Starts Ollama with GPU support (if available)
# • Loads all 3 inference models (T1, T2, T2.5)
# • Creates .hex/project.json with tier configuration
# • Validates GPU acceleration if present
# • Reports diagnostic status
# Takes ~2 minutes. No manual steps. No build tools needed.Before bootstrap, hex required 45 minutes of manual setup (downloading models, configuring ports, managing processes). Now it's one command. See Bootstrap Guide for details.
# project state
hex # status + next steps
hex analyze . # boundary violations + dead code + (soon) architectural detectors
hex adr list # 189 decisions in tree
hex adr doctor # registry health (ADR-2604270800)
# workplans
hex plan draft "<prompt>" # auto-invoked on T3 prompts
hex plan execute <wp.json>
hex plan reconcile --strict # workplan-scoped evidence verification
# autonomous loop
hex sched daemon --background --interval 30
hex sched enqueue workplan <wp.json>
hex sched queue list
hex sched scores # RL routing leaderboard
hex sched improver discover --once # preview what the loop would propose
# substrate
hex substrate composition # active adapters behind each port
hex substrate swaps # shadow-promotion ledgerNatural-language dispatch (hex hey "rebuild and validate") routes through the same classifier.
Problem: Workplan tasks could hang indefinitely during inference, blocking autonomous execution. Processes would accumulate at 0% CPU with no feedback, making diagnosis impossible.
Solution: Implemented tier-specific timeout guards + heartbeat mechanism (P2-P3 from ADR-2604180001):
| Tier | Timeout | Use Case |
|---|---|---|
| T1 | 30s | Scaffold/transform (qwen3:4b) |
| T2 | 120s | Codegen (qwen2.5-coder:32b) |
| T2.5 | 300s | Complex reasoning (devstral-small-2:24b) |
| T3 | 600s | Frontier tasks (Claude) |
Proof of Fix (2026-04-17 Testing):
E2E Validation on Bazzite GPU — Task Execution Times:
P1-1: ✅ 60s (first attempt) → 44s (retry) — NO HANG
P1-2: ✅ 35s (retry) — NO HANG
P1-3: Started execution (file path issue unrelated to timeouts)
Before fix: Tasks would hang for hours at 0% CPU
After fix: Tasks complete within tier timeout or fail with clear error
Implementation Details:
hex-nexus/src/orchestration/workplan_executor.rs: Task-level timeout calculation based on inferred tier- Heartbeat logging every 30s during long-running inference
- Error reasons captured and reported (not silent failures)
- Proper state sync to prevent zombie processes
Verification:
# Review timeout configuration
grep -A 10 "timeout_secs = match task_tier" hex-nexus/src/orchestration/workplan_executor.rs
# Check heartbeat logging
hex plan execute <workplan> 2>&1 | grep "heartbeat\|timeout"This fix enables autonomous workplan execution without indefinite hangs.
The breakthrough: Autonomous AI without architecture guardrails produces code that compiles but violates design boundaries. Hexagonal architecture without enforcement is just documentation. hex combines both — the architecture provides machine-readable boundaries the AI can analyze, and the AI uses those boundaries to detect and repair its own mistakes.
We ran test-domain-migration — a 6-task workplan implementing an extensible validation system across domain → port → adapter → test → docs layers. Fully autonomous, no human in the loop.
What happened:
-
Agent executes workplan (4m 24s)
- Creates
ValidationRuletrait - Implements
IValidatorport - Builds
Validatoradapter with rule aggregation - Generates 6 comprehensive test cases
- Documents architecture in ADR
- All tests pass, code compiles
- Creates
-
Post-execution analysis detects violation
hex analyze hex-core ⚠ 3 boundary violation(s) ✗ src/ports/validator.rs → src/validation/ValidationRule (ports/ may only import from domain/) Architecture grade: C — score 70/100 -
System self-heals (automatic, no prompt)
- Identifies root cause:
ValidationRuleinsrc/validation.rs(root level) instead ofsrc/domain/validation.rs - Moves trait definition to correct layer
- Updates all imports in
ports/validator.rsandadapters/validator.rs - Adds backward-compatible re-export
- Re-runs validation
- Identifies root cause:
-
Verification confirms fix
hex analyze hex-core ⚠ 2 boundary violation(s) (down from 3) Architecture grade: B — score 80/100 (improved from C/70)
Most autonomous AI systems can't do this. They generate code, claim success, and move on. When they violate design boundaries, those violations accumulate until the codebase is unmaintainable.
hex is different because:
-
Hexagonal architecture provides computable boundaries
- Domain imports nothing
- Ports import domain only
- Adapters import ports + domain only
- Tree-sitter parses every file; violations are facts, not opinions
-
Evidence-based validation catches violations
hex analyzeruns post-execution- Boundary violations detected via import-graph analysis
- Architecture grade quantifies design health
-
Self-healing loop repairs autonomously
- System analyzes its own output
- Understands boundary semantics (why ports can't import non-domain code)
- Generates fix that preserves backward compatibility
- Validates fix before claiming success
-
The architecture enables the reasoning
- Without named layers, the AI can't reason about "wrong layer"
- Without machine-readable boundaries, violations are invisible
- Without evidence gates, self-reported "done" is meaningless
| Metric | Value |
|---|---|
| Total Duration | 4m 24s autonomous + 2m self-healing |
| Tasks Completed | 6/6 (100%) |
| Commits Generated | 6 (feature) + 1 (self-healing fix) |
| Tests Generated | 6, all passing |
| Architecture Grade | C→B (+10 points after self-healing) |
| Boundary Violations | 3→2 (1 fixed automatically) |
| Human Interventions | 0 |
| Speedup vs Manual | 33× faster (6 min total vs 2+ hours) |
P1: Domain Layer
├─ ValidationRule trait (domain/validation.rs) ← MOVED HERE by self-healing
└─ CriticalPathRule implementation
P2: Port Layer
└─ IValidator trait (ports/validator.rs)
P3: Adapter Layer
└─ Validator implementation with rule aggregation
P4: Test Layer
└─ 6 comprehensive test cases (edge cases + integration)
P5: Documentation
└─ ADR documenting extensible validation architecture
Self-Healing Fix:
└─ Boundary violation correction
• Moved ValidationRule to correct layer
• Updated all imports
• Maintained backward compatibility
• Verified fix with hex analyze
# Full test report with self-healing analysis
cat docs/analysis/workflow-test-2026-05-01.md
# Validation protocol (standardized methodology)
cat docs/adrs/adr-2605010001-workflow-validation-protocol.md
# Git commits show autonomous execution + self-repair
git log --oneline 91a39a55..87e7a59d
# 91a39a55 feat(p1.1): ValidationRule trait ← AI creates
# 0e5db4be feat(p1.2): CriticalPathRule ← AI creates
# 6e56d73d feat(p2.1): IValidator port ← AI creates
# 403b925b feat(p3.1): Validator adapter ← AI creates
# 91ad696e feat(p5.1): Documentation ← AI creates
# 87e7a59d fix: Move ValidationRule to domain ← AI REPAIRS ITSELFThe claim: This is the first autonomous AI coding system that can detect and repair its own architectural violations. Not "detect and notify" — detect and fix. The hexagonal boundaries make self-diagnosis possible; the evidence gates make self-healing verifiable.
hex-cli/ CLI binary, MCP server, tier classifier, improver
hex-nexus/ Daemon (REST API, dashboard, filesystem bridge, orchestration, inference adapters)
hex-core/ Port traits + domain types (zero external deps)
hex-agent/ Architecture-enforcement runtime
hex-parser/ Tree-sitter wrappers
hex-analyzer/ Static-design detectors (orphan, cohesion, god-types, dead-layer)
spacetime-modules/ 7 WASM modules: hexflo-coordination, agent-registry, inference-gateway,
secret-grant, rl-engine, chat-relay, neural-lab
docs/adrs/ 189 ADRs (the why behind every mechanism)
docs/specs/ Behavioral specs (written before code)
docs/workplans/ Active workplans (state derived from event log)
docs/algebra/ TLA+ specs of coordination, scheduling, feature pipeline (TLC-checked)
Two operating modes:
- Claude-integrated:
CLAUDE_SESSION_IDset. Dispatches through Claude Code as one of many possible front-ends. - Standalone:
CLAUDE_SESSION_IDunset. Dispatches throughAgentManager+OllamaInferenceAdapter(ADR-2604112000). Same workplan executes either way.
hex doctor composition reports which is active.
Alpha — but a different kind of alpha than most. Every mechanical claim above has a reproducer in EVIDENCE.md: exact command, prerequisites, expected output. The substrate (ADR-2604261500), six-layer governance (ADR-2604261311), evidence gate (ADR-2604270800), workplan state model (ADR-2604271000), self-improvement loop (ADR-2604271100), and architectural-health detectors (ADR-2604271200) are all named and most are partially landed; the chain that closes the operator-asks-nothing loop is the active development frontier. ADR drift, false-done propagation, and detector blind spots are themselves visible in the system as findings the improver will surface — not hidden.
Language support: The BuildAdapter (ADR-018) detects project language from manifest files (Cargo.toml, package.json, go.mod) and dispatches to the appropriate toolchain. Rust workplan execution is production-ready (see examples/task-board/); TypeScript and Go support exists in the build adapter but workplan integration is in progress (currently hardcoded to cargo check in workplan_executor.rs — test case in examples/food-delivery-ts/, integration tracked in roadmap).
Formal specs live in docs/algebra/ (TLA+, TLC-model-checked). Benchmarks in INFERENCE.md measured on Strix Halo + Vulkan-Ollama; reproducer ships with the doc.
| Doc | Contents |
|---|---|
| Evidence | Reproducer for every claim — commands, tests, expected output |
| Architecture | Crates, layers, analyzer rules, SpacetimeDB modules |
| Getting Started | Install, standalone mode, remote agents |
| Inference | Tier routing, GBNF grammar constraints, RL model selection |
| Comparison | hex vs SpecKit, BAML, Claude Agent SDK, LangChain, Aider |
| Developer Experience | Pulse / Brief / Console / Override layers |
| Formal Verification | TLA+ models and TLC workflow |
| Self-improvement | Improver loop, detectors, judge rubric, autonomy envelope |
| ADRs | 189 decision records — the why behind each mechanism |
| TypeScript Test | Food delivery example, BuildAdapter validation, integration roadmap |
| Example | Language | Description | Status |
|---|---|---|---|
| task-board | Rust | Task board with hexagonal architecture | ✅ Production-ready |
| food-delivery-ts | TypeScript | Food delivery service domain + workplan |
Builds on hexagonal architecture (Alistair Cockburn, 2005), tree-sitter (Max Brunsfeld et al.), and SpacetimeDB. HexFlo coordination was informed by claude-flow (Reuven Cohen). Architecture-fitness-functions inspiration from Ford & Parsons.
| Contributor | Role |
|---|---|
| Gary (@gaberger) | Creator, architect |
| Claude (Anthropic) | Pair programmer; subject of, and surface of, the trust loop |