devt (short for development team) — a multi-agent development workflow plugin for Claude Code.
A Claude Code plugin that orchestrates a coordinated multi-agent development workflow: implement → test → review → document → learn. Instead of relying on a single monolithic prompt, devt decomposes work across specialized agents — a programmer writes code, a tester verifies it, a code-reviewer audits it, a docs-writer updates documentation, and a retro agent extracts lessons for future sessions. Each agent is focused, stateless, and replaceable.
The plugin is language-agnostic — Python, Go, TypeScript, Vue, or anything else. Project-specific coding standards, testing patterns, quality gates, and architecture rules live in your repository under .devt/rules/, not baked into the plugin.
What you get out of the box:
- Auto-complexity detection — analyzes your task and selects the right pipeline (TRIVIAL through COMPLEX)
- 10 specialized agents — programmer, tester, code-reviewer, docs-writer, architect, retro, curator, verifier, researcher, debugger — plus the opt-in devt-coordinator main-thread router (see Main-thread coordinator)
- Closed learning loop — lessons extracted from each workflow feed back into future sessions
- Permanent memory layer — ADR/CON/FLOW/REJ/LES knowledge graph that survives session boundaries
- Topic Pre-Flight Brief — every workflow surfaces governing decisions, rejected approaches, related lessons, and blast radius before touching code
- Autonomous chaining — implement → test → review → ship without manual /devt:next invocations
- Test-driven flag — --tdd reverses implement/test phase order
- Architecture health scanning — detect drift across sessions with baseline diffing
- Adversarial council — 5-advisor pressure-test for high-stakes decisions
- Deferred-task tracker — capture mid-work TODOs without derailing current focus
Via Claude Code plugin marketplace (recommended):
/plugin marketplace add emrecdr/devt
/plugin install devt
The marketplace lives at the repo root (.claude-plugin/marketplace.json). Updates via /plugin update devt. All commands become available as /devt:command-name.
Via git clone (development / pre-marketplace):
git clone https://github.com/emrecdr/devt.git ~/.devt
claude --plugin-dir ~/.devt
To avoid --plugin-dir every time:
echo 'alias devt="claude --plugin-dir ~/.devt"' >> ~/.zshrc # or ~/.bashrc
Plugin agents (devt:programmer, devt:retro, etc.) only register when devt is loaded via the marketplace install or --plugin-dir. Running claude from a project directory without either path discovers commands and skills via cwd auto-discovery, but agents won't appear in claude agents.
devt itself is zero-npm-dependency — Node 22+, bash, and Claude Code are the only required tools. Three optional integrations plug into specific pipelines and add measurable benefits. Each is auto-detected on PATH; if absent, devt falls back to grep / scratchpad markers.
Repo: github.com/safishamsi/graphify
uv tool install graphifyy[mcp] # recommended — works for both CLI and MCP server
# alternatives:
# pipx install graphifyy[mcp]
# pip install graphifyy[mcp] # CLI works; MCP server still requires uv on PATH
Tree-sitter multi-language parser that binds memory docs to actual functions/classes. Used by 6 of 10 dev agents (programmer, debugger, researcher, code-reviewer, verifier, architect), Pre-Flight Brief Lane C, blast-radius queries, and memory validate stale-symbol detection. See Features in detail → Graphify for the surface-by-surface benefit comparison.
curl -LsSf https://astral.sh/uv/install.sh | sh
Required for Graphify v0.7.10+ — its MCP server launches via uv run --with graphifyy --with mcp -m graphify.serve. Without uv, devt scaffolds a python3 -m graphify.serve fallback only when graphifyy is importable from your system Python.
Repo: github.com/thedotmack/claude-mem — see their docs for install.
Captures ⚖️ (decisions) and 🔵 (insights) tags during sessions. devt's discovery harvester (bin/modules/discovery.cjs::harvest()) mines these into curator-reviewable proposals at .devt/memory/_suggestions.md. The curator agent gates each candidate via AskUserQuestion before promoting to permanent memory. Without claude-mem, discovery still works on #KNOWLEDGE-CANDIDATE scratchpad markers and DEC-xxx entries — you get fewer auto-surfaced ADR/CON candidates but nothing breaks.
/devt:init
Scaffolds .devt/rules/ with project-specific conventions and creates .devt/config.json. devt auto-detects your stack and selects the matching template (python-fastapi, go, typescript-node, vue-bootstrap, or blank). The wizard pitches optional integrations and confirms the detected primary branch. Declining still produces a fully working install.
Your first workflow:
/devt:workflow "add a health check endpoint at GET /health returning 200"
/devt:workflow is the primary entry. devt analyzes the task, picks a complexity tier (TRIVIAL / SIMPLE / STANDARD / COMPLEX), and runs the matching pipeline. If you don't know which command to use, /devt:do "describe what you want" routes for you.
Task format: imperative verb + specific outcome.
- ✓ "add health check endpoint at GET /health returning 200 with status ok"
- ✓ "fix login validation that accepts empty passwords"
- ✗ "make it better" — too vague
- ✗ "refactor everything" — too broad
.devt/config.json (project root) configures plugin behavior. Only set what you want to override — defaults handle the rest. Five keys cover most needs:
{
"model_profile": "quality",
"memory": { "preflight_mode": "block" },
"graphify": { "enabled": true },
"scope_mode": "surgical",
"git": { "primary_branch": "main", "contributors": ["alice", "bob"] }
}
| Key | What it controls | Default |
|---|---|---|
| model_profile | Per-agent model tier — quality / balanced / budget / inherit | quality |
| memory.preflight_mode | Pre-flight guard strictness — off / warn / block | block |
| graphify.enabled | Enable AST-anchored code search | false (auto-set to true if graphify is on PATH at first setup) |
| scope_mode | How agents handle unrelated findings — surgical (ask first) / boyscout (small mechanical fixes ok) | surgical |
| rubrics.<workflow_type> | Pinned verifier rubric filename per workflow_type. Two workflows dispatch the verifier today: dev (default dev.v1.md) and code_review (default code_review.v1.md). Bump to a newer version after testing — devt ships old rubrics alongside new ones, so projects can pin or roll back independently. | {dev: "dev.v1.md", code_review: "code_review.v1.md"} |
| git.primary_branch / git.contributors | Used by /devt:ship and reports | auto-detected |
For the full schema (model_overrides, agent_skills, multi-root memory, arch_scanner, workflow toggles), see Configuration reference.
The 80% of devt usage centers on a small set of commands. Each handles a distinct intent:
/devt:workflow "add OAuth login flow" # auto-tier, full pipeline
/devt:workflow --autonomous "<task>" # implement → test → review → ship without prompts
/devt:workflow --tdd "<task>" # test-first: write tests, watch them fail, then implement
/devt:workflow --dry-run "<task>" # preview the pipeline without executing
devt picks a tier based on task analysis. You never need to choose. Override only if needed.
/devt:do "fix the failing auth tests"
Routes the freeform description to the right command (workflow, debug, review, plan, etc.). Useful when you'd otherwise stare at the command list.
If you'd rather not type /devt:do on every prompt, devt ships an opt-in main-thread coordinator that runs in front of your session and does the same routing automatically — but only when the prompt is devt-shaped. Casual questions and conversation pass through to a normal Claude session.
Opt in by adding one line to your project's .claude/settings.json:
{
"agent": "devt-coordinator"
}
Or invoke ad-hoc for a single session:
claude --agent devt-coordinator
After opting in, every prompt is classified:
- Devt-shaped task (e.g. "fix the 405 on POST /admin", "review my changes", "ship this") → routed to the matching /devt:* command via Skill tool. Same routing table as /devt:do.
- Casual / general (e.g. "explain quicksort", "thanks", "what's a closure?") → answered directly. No routing nag.
- Ambiguous → asked once, with an "answer directly" bail-out option.
The agent body lives at agents/devt-coordinator.md. Read it before opting in if you want to see the exact classification protocol.
Caveat (Claude Code plugin agent security restriction): plugin agents cannot define their own hooks, mcpServers, or permissionMode frontmatter. If you need any of those per-coordinator, copy agents/devt-coordinator.md into your project's .claude/agents/ and use that copy — the personal copy is unrestricted. Devt's plugin-level hooks and the devt-memory MCP server still fire normally either way.
/devt:debug "login form silently fails on Safari mobile"
Four-phase investigation (Symptom → Hypothesis → Test → Fix) in an isolated context, so root-cause work doesn't pollute your main session. Persists state across context resets.
/devt:next
Reads .devt/state/, detects what just happened (workflow paused, review found issues, deferred queue has items, etc.), and runs the appropriate next step. The "I forgot what I was doing" command.
/devt:ship
Runs after a workflow completes. Reads impl-summary.md, test-summary.md, review.md, and creates a PR with auto-generated title + body. Handles uncommitted changes, branch detection, and CI status.
Build something end-to-end (most common):
/devt:workflow "add password reset endpoint with email verification"
# devt does: scan → implement → test → review → verify → docs → retro → autoskill
/devt:ship
TDD a small change:
/devt:workflow --tdd "add validation rejecting passwords shorter than 12 chars"
# tests get written first and watched to fail before implementation runs
Fix a bug:
/devt:debug "users report 500 on /api/profile after upload"
# isolated investigation; produces debug-summary.md with root cause + fix
Pause mid-work, resume next session:
/devt:pause
# next session in same project:
/devt:next # reads handoff.json, resumes
Capture a TODO without derailing:
/devt:defer "rate-limit /api/login — Redis backend, see SEC-007"
# survives /devt:cancel-workflow; surfaces in /devt:next when idle
High-stakes architectural decision:
/devt:council "should we move from REST to GraphQL for the public API?"
# 5 advisors in parallel + peer review + chairman synthesis → verdict + next step
For detailed walkthroughs, see Workflows & use cases in detail.
Standard AI coding has three concrete failure modes that compound over time.
A monolithic prompt forgets every architectural decision the moment the context window rolls over. You end up re-explaining "we use Argon2id for hashing, never bcrypt" in every session, and the AI silently re-proposes rejected approaches.
devt fixes this with a permanent memory layer at .devt/memory/ — markdown docs with strict frontmatter, FTS5-indexed, queried at every workflow start. REJ tombstones suppress re-proposals across all agents.
A single prompt either over-engineers a one-line fix or under-thinks a refactor. There's no orchestration that matches effort to complexity.
devt fixes this with auto-tier selection. TRIVIAL tasks run inline; STANDARD tasks add scan/test/review; COMPLEX tasks add research, plan, architecture review, verification, and curated lesson capture. You never pick a tier — devt detects it.
Even within a single project, lessons are lost the moment a session ends. The team's hard-won "the integration tests fail when fixture seed order changes" insight gets re-discovered three weeks later.
devt fixes this with a closed learning loop: retro extracts → curator gates approval → LES-NNNN docs land in .devt/memory/lessons/ → Pre-Flight Brief surfaces them at the next workflow start. Knowledge accumulates instead of evaporating.
User → Command (thin) → Workflow (orchestration) → Agent (worker)
↓
.devt/state/ (artifacts)
.devt/memory/ (permanent knowledge)
The execution model follows a Command → Workflow → Agent architecture:
- Commands (32 files): thin entry points. Parse arguments, delegate to a workflow. No business logic.
- Workflows (31 files): orchestration. Determine tier, coordinate agents, manage state transitions.
- Agents (11 files): 10 focused workers (programmer/tester/code-reviewer/docs-writer/architect/retro/curator/verifier/researcher/debugger) plus the opt-in devt-coordinator main-thread router.
- Skills (16 directories): technique libraries injected into agents (codebase scanning, complexity assessment, TDD patterns, verification patterns, memory curation, Graphify helpers, …).
- Hooks (7 lifecycle events): SessionStart, Stop, SubagentStart, SubagentStop, PostToolUse, PreToolUse, UserPromptSubmit. Profile-controlled (DEVT_HOOK_PROFILE=minimal|standard|full).
/devt:workflow auto-selects a tier based on task complexity:
| Tier | Pipeline | Auto-detected when |
|---|---|---|
| TRIVIAL | execute inline → validate gates | ≤3 files, no decisions needed |
| SIMPLE | implement → test → review | Single file, known pattern |
| STANDARD | scan → implement → test → review → verify → docs → retro → autoskill | Multiple files, existing patterns |
| COMPLEX | auto-research → auto-plan → scan → architect → implement → test → review → verify → docs → retro → curate → autoskill | New patterns, architectural decisions |
You never need to pick a tier. Override the auto-detection if needed.
Workflow-routing artifacts come in pairs: a human-readable .md (narrative) and a machine-readable .json sidecar (authoritative for status routing). Three artifacts use this pattern today: impl-summary, test-summary, and verification. Before dispatching the LLM verifier, the workflow runs a zero-dep deterministic grader (bin/modules/grader.cjs) against the test-summary and impl-summary sidecars. The grader walks the ## Deterministic Gates JSON block in references/rubrics/dev.v1.md and returns one of three envelope shapes — ok:false (I/O failure → BLOCKED), ok:true, pass:false (constraint violation → RETRY or PRUNE under workflow.max_iterations), ok:true, pass:true (greens → LLM verifier dispatches). The LLM verifier is skipped entirely on red-test cycles, saving ~5–15K input tokens per failed iteration. Projects can ship lenient rubrics at .devt/rubrics/<file>.md to override gate strictness per workflow_type.
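The three envelope shapes map onto routing outcomes roughly as in the sketch below. The function name routeGraderEnvelope and the returned labels are hypothetical illustrations, not the contents of bin/modules/grader.cjs. How the workflow chooses RETRY over PRUNE under workflow.max_iterations is not described here, so the sketch keeps them as one combined outcome.

```javascript
// Hypothetical sketch: map a grader envelope to the next workflow step.
// The envelope fields (ok, pass) follow the description above; the
// function name and return labels are assumptions made for illustration.
function routeGraderEnvelope(envelope) {
  if (!envelope.ok) {
    // I/O failure: a sidecar or the rubric could not be read.
    return "BLOCKED";
  }
  if (!envelope.pass) {
    // Constraint violation: the workflow retries or prunes, bounded by
    // workflow.max_iterations (that policy is not modeled here).
    return "RETRY_OR_PRUNE";
  }
  // All deterministic gates are green: hand off to the LLM verifier.
  return "DISPATCH_VERIFIER";
}

console.log(routeGraderEnvelope({ ok: true, pass: false }));
```

The ordering matters: ok is checked before pass, because a failed read says nothing about whether the gates would have passed.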
Skills inject into agents at dispatch time based on skill-index.yaml (or .devt/config.json overrides):
| Agent | Default Skills |
|---|---|
| programmer | codebase-scan, scratchpad, api-docs-fetcher, strategic-analysis, tdd-patterns, verification-patterns |
| tester | scratchpad, tdd-patterns |
| code-reviewer | code-review-guide, codebase-scan, scratchpad |
| docs-writer | scratchpad |
| architect | codebase-scan, architecture-health-scanner, api-docs-fetcher, strategic-analysis, complexity-assessment |
| verifier | codebase-scan, verification-patterns |
| researcher | codebase-scan, strategic-analysis |
| debugger | codebase-scan |
| retro | lesson-extraction, autoskill |
| curator | memory-curation, autoskill |
Every devt-configured project gets a .devt/rules/ directory containing project-specific rules that agents read at execution time. This keeps the plugin generic while giving agents deep project knowledge.
Required files:
| File | Purpose |
|---|---|
| coding-standards.md | Language conventions, naming, formatting, import rules |
| testing-patterns.md | Test framework, patterns, coverage expectations |
| quality-gates.md | Lint, typecheck, test commands and pass criteria |
| architecture.md | Layer structure, dependency rules, module boundaries |
Optional: review-checklist.md, api-changelog.md, documentation.md, git-workflow.md, golden-rules.md, patterns/common-smells.md.
Run /devt:init to generate these from a template matched to your stack.
Full schema for .devt/config.json (project root). Global ~/.devt/defaults.json sets user-wide defaults that project config overrides. Merge order: hardcoded defaults → ~/.devt/defaults.json → .devt/config.json (later overrides earlier).
{
"model_profile": "quality",
"model_overrides": { "tester": "opus" },
"git": {
"provider": "github", "workspace": "my-team", "slug": "my-repo",
"primary_branch": "main", "contributors": ["alice", "bob"]
},
"agent_skills": { "programmer": ["codebase-scan", "scratchpad", "api-docs-fetcher"] },
"memory": {
"paths": ["../engineering-adrs", ".devt/memory"],
"preflight_mode": "block",
"enabled": true,
"auto_index_on_change": true
},
"graphify": { "enabled": true, "command": "graphify" },
"arch_scanner": { "command": "make arch-scan", "report_dir": "docs/reports" },
"scope_mode": "surgical",
"workflow": {
"docs": true, "retro": true, "verification": true,
"autoskill": true, "regression_baseline": true
}
}
| Key | Values | Default |
|---|---|---|
| model_profile | quality / balanced / budget / inherit | quality |
| model_overrides | Per-agent model tier (opus / sonnet / haiku / inherit) | from model_profile |
| git.provider | github / gitlab / bitbucket | auto-detect from remote |
| git.workspace / git.slug | Repo identifiers (used by /devt:ship) | auto-detect |
| git.primary_branch | Integration branch | 4-step fallback chain (origin/HEAD → init.defaultBranch → common-name heuristic → current branch) |
| git.contributors | Display names for /devt:weekly-report | git log scan |
| agent_skills | Per-agent skill list overrides | see skill-index.yaml |
| memory.paths | Multi-root memory roots — last-wins precedence | project-local only |
| memory.preflight_mode | Pre-flight guard hook strictness — off / warn / block | block |
| memory.enabled | Master switch — disables Pre-Flight Brief, discovery harvester, auto-index hook, pre-flight guard hook | true |
| memory.auto_index_on_change | PostToolUse hook rebuilds FTS5 index when memory docs touched | true |
| graphify.enabled | Boolean | false (auto-set to true by setup.cjs when graphify is on PATH at first setup) |
| graphify.command | Binary name | graphify |
| arch_scanner.command | Architecture scanner invocation | null (manual analysis) |
| arch_scanner.report_dir | Where scan output lands | docs/reports |
| scope_mode | surgical / boyscout — see below | surgical |
| workflow.docs / .retro / .verification / .autoskill / .regression_baseline | Toggle pipeline steps | all true |
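The merge order for configuration sources (hardcoded defaults, then ~/.devt/defaults.json, then .devt/config.json) can be sketched as below. This is an illustration only: the helper names are hypothetical, the sample values are made up, and whether devt merges nested objects key by key or replaces them wholesale is an assumption (the sketch merges key by key and replaces arrays and scalars).

```javascript
// Sketch of the documented merge order; later sources override earlier ones.
// Assumption: nested objects merge key by key, arrays and scalars are replaced.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function mergeConfig(base, override) {
  const result = { ...base };
  for (const [key, value] of Object.entries(override)) {
    result[key] =
      isPlainObject(value) && isPlainObject(base[key])
        ? mergeConfig(base[key], value)
        : value;
  }
  return result;
}

// Illustrative stand-ins for the three sources (values are made up).
const hardcodedDefaults = {
  model_profile: "quality",
  memory: { preflight_mode: "block", enabled: true },
};
const userDefaults = { model_profile: "balanced" };           // ~/.devt/defaults.json
const projectConfig = { memory: { preflight_mode: "warn" } }; // .devt/config.json

const resolved = [userDefaults, projectConfig].reduce(mergeConfig, hardcodedDefaults);
console.log(resolved);
```

Under these assumptions the project file changes only memory.preflight_mode, while memory.enabled still comes from the hardcoded defaults and model_profile from the user-wide file.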
scope_mode controls how agents handle unrelated findings discovered while doing the requested task — dead imports, lint warnings, cosmetic issues in files they're touching anyway.
| Mode | Behavior | When to pick |
|---|---|---|
| surgical (default) | Find-Surface-Decide protocol per golden-rules.md Rule 5: agent finds the unrelated issue, surfaces it (in impl-summary or defer add), and does NOT fix it without explicit approval. Keeps PRs reviewable. | Production codebases, regulated environments, anywhere PR diff hygiene matters, code-review handoffs |
| boyscout | Blanket authority for small mechanical in-file cleanups (dead imports, formatter fixes, removing console.log) without asking — only in files the agent is already touching, and only for behavior-preserving changes. Bigger findings still go through Find-Surface-Decide. | Personal projects, prototypes, fast-moving codebases, individual contributors who own the diff |
The setting is declarative — no enforcement code reads it. Agents self-regulate based on the rule body and the resolved value in their context.
DEVT_HOOK_PROFILE=minimal|standard|full (env var, default standard) controls which hooks fire:
| Hook | minimal | standard | full |
|---|---|---|---|
| session-start.sh | ✓ | ✓ | ✓ |
| stop.sh | ✓ | ✓ | ✓ |
| workflow-context-injector.sh | – | ✓ | ✓ |
| subagent-status.sh | – | ✓ | ✓ |
| read-before-edit-guard.sh | – | ✓ | ✓ |
| context-monitor.sh | – | – | ✓ |
| prompt-guard.sh | – | – | ✓ |
Disable specific hooks: DEVT_DISABLED_HOOKS=hook1.sh,hook2.sh.
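As a rough model of how the two environment variables combine, the sketch below decides whether a given hook script should fire. The profile membership is copied from the table above. The function name shouldRunHook, the fallback to standard for an unknown profile value, and the disable list taking precedence over the profile are assumptions made for illustration.

```javascript
// Profile membership copied from the hook table above.
const MINIMAL = ["session-start.sh", "stop.sh"];
const STANDARD = [
  ...MINIMAL,
  "workflow-context-injector.sh",
  "subagent-status.sh",
  "read-before-edit-guard.sh",
];
const FULL = [...STANDARD, "context-monitor.sh", "prompt-guard.sh"];
const HOOK_PROFILES = { minimal: MINIMAL, standard: STANDARD, full: FULL };

// Hypothetical helper: returns true when the hook should fire.
function shouldRunHook(hook, env) {
  const profile = env.DEVT_HOOK_PROFILE || "standard";
  // Assumption: an unrecognized profile name behaves like "standard".
  const enabled = HOOK_PROFILES[profile] || HOOK_PROFILES.standard;
  const disabled = (env.DEVT_DISABLED_HOOKS || "")
    .split(",")
    .map((name) => name.trim())
    .filter(Boolean);
  // Assumption: the disable list wins over the profile.
  return enabled.includes(hook) && !disabled.includes(hook);
}

console.log(shouldRunHook("stop.sh", process.env));
```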
A self-evolving knowledge graph that joins:
- The code that exists — what functions, classes, modules actually live in the repo (Graphify AST). When the graph is built, graphify-out/GRAPH_REPORT.md god-nodes also seed concept (CON-*) candidates and feed the Pre-Flight Brief's Cross-Cutting Concerns section so structural couplings surface before any change starts.
- The conversation happening now — ephemeral observations captured mid-session (claude-mem ⚖️ decisions / 🔵 discoveries)
- The permanent rules of the project — what we always do and what we said no to (Markdown + SQLite FTS5)
The layer is ground truth: every dev workflow consults it before touching code, and curator-gated promotion ensures only validated knowledge lands.
.devt/state/ LAYER 1 — ephemeral (per-workflow)
├── decisions.md DEC-xxx — clarify/specify/research scratch
├── lessons.yaml retro draft hand-off → curator promotes to LES-NNNN
├── deferred.md DEF-NNN cross-workflow TODO queue (reset-exempted)
├── preflight-brief.md Topic Pre-Flight Brief (auto-fired)
├── scratchpad.md cross-agent handoff (#KNOWLEDGE-CANDIDATE)
└── … reset on /devt:cancel-workflow
.devt/memory/ LAYER 2 — permanent (canonical knowledge)
├── decisions/ ADR-xxx — constitutional decisions
├── concepts/ CON-xxx — durable mental models
├── flows/ FLOW-xxx — named sequences (auth, deploy, …)
├── rejected/ REJ-xxx — tombstones (we said no, here's why)
├── lessons/ LES-xxx — operational lessons ("when X, do Y")
├── _suggestions.md discovery proposals (curator-only writes)
└── index.db FTS5 unified index (gitignored, regenerable)
Each doc is markdown with strict YAML frontmatter — id, doc_type, status, confidence, title, summary, affects_paths, affects_symbols, links, created_at. ID prefixes enforced: ADR-001, CON-042, FLOW-007, REJ-013, LES-001.
| Type | Use for | Example |
|---|---|---|
| ADR (decision) | Constitutional rules — "we always do X, never Y" | "Auth uses HMAC-SHA256, never plain JWT" |
| CON (concept) | Durable mental models — "this is what X means here" | "A 'session' here is a request chain bound by trace_id" |
| FLOW (sequence) | Named multi-step processes | "Production deploy: PR→smoke→canary→staged rollout→pagerduty hold" |
| REJ (rejected) | Tombstones — "we considered X, here's why it's a no" | "Server-Sent Events: rejected (cors_workarounds, mobile_battery_drain)" |
| LES (lesson) | Operational tactics — "when X happens, do Y" | "When integration tests flake on first run, check fixture seed order" |
Confidence: verified > explicit > inferred > observed > speculative. Status: candidate → active → superseded → rejected.
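To make the ID-prefix rule and the confidence ordering concrete, here is a minimal sketch of the two checks. It is not devt's validator: the helper names are hypothetical, the doc_type strings other than lesson are guessed from the table's labels and directory names, and the "at least three digits" pattern is inferred from examples such as ADR-001 and LES-NNNN.

```javascript
// Prefix per doc_type. Only doc_type "lesson" appears verbatim in this
// README; the other doc_type strings are assumptions for illustration.
const ID_PREFIX = {
  decision: "ADR",
  concept: "CON",
  flow: "FLOW",
  rejected: "REJ",
  lesson: "LES",
};

// Lowest to highest, per "verified > explicit > inferred > observed > speculative".
const CONFIDENCE_ORDER = ["speculative", "observed", "inferred", "explicit", "verified"];

// Hypothetical check: does the id carry the prefix required by its doc_type?
function idMatchesDocType(id, docType) {
  const prefix = ID_PREFIX[docType];
  if (!prefix) return false;
  // Assumption: numeric part has at least three digits (ADR-001, LES-0042).
  return new RegExp(`^${prefix}-\\d{3,}$`).test(id);
}

// True when confidence level a ranks strictly above level b.
function moreConfident(a, b) {
  return CONFIDENCE_ORDER.indexOf(a) > CONFIDENCE_ORDER.indexOf(b);
}

console.log(idMatchesDocType("ADR-001", "decision"), moreConfident("verified", "inferred"));
```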
node bin/devt-tools.cjs memory init # scaffold .devt/memory/{decisions,concepts,flows,rejected,lessons}/
node bin/devt-tools.cjs memory index # rebuild FTS5 index from markdown
node bin/devt-tools.cjs memory query <terms> [--doc-type=…] # full-text search
node bin/devt-tools.cjs memory get <id> # fetch by id (e.g. ADR-007)
node bin/devt-tools.cjs memory list [--doc-type=… --status=…] # filtered listing
node bin/devt-tools.cjs memory active [--domain=…] # active docs only
node bin/devt-tools.cjs memory affects <glob> # docs governing path
node bin/devt-tools.cjs memory affects-symbol <symbol> # docs governing symbol
node bin/devt-tools.cjs memory links <id> [--depth=N] # transitive link traversal
node bin/devt-tools.cjs memory orphans # docs with no inbound links
node bin/devt-tools.cjs memory stale-links # broken wiki-link targets
node bin/devt-tools.cjs memory rejected-keywords # all REJ search_keywords (used for AI suppression)
node bin/devt-tools.cjs memory validate # frontmatter + path + symbol checks
node bin/devt-tools.cjs memory paths [--validate] # multi-root path config inspection
node bin/devt-tools.cjs memory diff <root-a> <root-b> # cross-root diff
node bin/devt-tools.cjs memory bundle export --out=… --filter=… # portable JSON export
node bin/devt-tools.cjs memory bundle import <file> [--prefix=…] # import with optional ID remap
- Tier 1 — Topic Brief (automatic): every dev workflow auto-fires /devt:preflight "<task>" at context_init. The 6-lane orchestrator (bin/modules/preflight.cjs) writes .devt/state/preflight-brief.md:
  - Lane A — affects_paths glob match
  - Lane B — FTS5 keyword expansion
  - Lane C — affects_symbols AST match (Graphify-anchored when enabled)
  - Lane D — wiki-link transitive closure (depth 2) from A∪B∪C seeds
  - Lane E — REJ tombstone overlap on search_keywords
  - Lane F — filters governing docs for doc_type='lesson' to render LES-NNNN entries
  All 8 dev agents preload devt:memory-pre-flight and read the Brief first.
- Tier 2 — File guard (PreToolUse): agents append PREFLIGHT <ts> edit <path> :: <governing IDs> to scratchpad before each Edit/Write. hooks/pre-flight-guard.sh checks the line. memory.preflight_mode: off / warn / block (default block).
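The scratchpad line used by the Tier 2 file guard has a fixed shape, which the sketch below parses. This is an illustration in JavaScript, not the logic of hooks/pre-flight-guard.sh (a shell script). The function name is hypothetical, and two details are assumptions: the timestamp is a single whitespace-free token, and governing IDs are separated by commas and/or spaces.

```javascript
// Hypothetical parser for a line of the form:
//   PREFLIGHT <ts> edit <path> :: <governing IDs>
// Assumptions: <ts> contains no whitespace; IDs are comma- or space-separated.
function parsePreflightLine(line) {
  const match = /^PREFLIGHT (\S+) edit (.+?) :: (.*)$/.exec(line.trim());
  if (!match) return null; // not a PREFLIGHT marker
  const [, ts, path, ids] = match;
  return { ts, path, ids: ids.split(/[\s,]+/).filter(Boolean) };
}

const parsed = parsePreflightLine(
  "PREFLIGHT 2025-01-01T00:00:00Z edit src/auth/login.ts :: ADR-007, REJ-013"
);
console.log(parsed);
```

A guard built on this shape would compare the parsed path with the file about to be edited and then act according to memory.preflight_mode.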
The PostToolUse hooks/memory-auto-index.sh rebuilds the FTS5 index whenever .devt/memory/**.md is touched (debounced; collapses curator batch-promotions into a single rebuild).
bin/devt-memory-mcp.cjs ships with the plugin and is registered via the plugin-root .mcp.json — Claude Code resolves ${CLAUDE_PLUGIN_ROOT} at MCP-server launch and starts the server automatically whenever the devt plugin is loaded (no per-project scaffolding). JSON-RPC 2.0 stdio, zero external dependencies, three layers of defense (OPEN_READONLY + SELECT-only validator + multi-statement guard) on the query_index SQL escape hatch. Tools: get_context_for_path, get_context_for_symbol, query_fts, get_doc, list_active, list_rejected_keywords, list_links, preflight, blast_radius, query_index.
Per-call telemetry lands in .devt/memory/_mcp-trace.jsonl (privacy-safe — sizes + 12-char fingerprints, no raw args). Aggregate via node bin/devt-tools.cjs mcp-stats.
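Two of the three query_index defenses, the SELECT-only validator and the multi-statement guard, can be approximated with a few lines of string checking. The sketch below is a deliberately coarse illustration and not the validator shipped in bin/devt-memory-mcp.cjs: the function name is hypothetical, the table name docs in the usage is made up, and the OPEN_READONLY layer (opening the database read-only) is not modeled at all.

```javascript
// Coarse sketch of a read-only SQL gate. Assumptions: one optional trailing
// semicolon is tolerated; any other semicolon is treated as a second
// statement (this also rejects semicolons inside string literals).
function isSafeSelect(sql) {
  const trimmed = sql.trim().replace(/;\s*$/, "");
  if (trimmed.includes(";")) return false; // multi-statement guard
  return /^select\b/i.test(trimmed);       // SELECT-only check
}

console.log(isSafeSelect("SELECT id FROM docs"));            // docs is a made-up table name
console.log(isSafeSelect("SELECT 1; DROP TABLE docs"));
```

A string-level check like this is easy to reason about but conservative, which is why a read-only database handle remains the more important layer.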
Set memory.paths in .devt/config.json to index company-wide ADRs alongside project-local ones:
{ "memory": { "paths": ["../engineering-adrs", ".devt/memory"] } }
Last-wins precedence: project-local overrides shared on ID collision. source_root column tracks provenance. Conflicts are explicit (memory index returns a conflicts[] array) — never silent.
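Last-wins precedence with explicit conflict reporting can be sketched as below. The function name, the in-memory input shape, and the fields of each conflict record are hypothetical. Only the ideas of a source_root provenance field and a conflicts array come from the description above.

```javascript
// Sketch: merge docs from several roots in memory.paths order.
// Later roots win on ID collision; every collision is recorded.
function mergeRoots(roots) {
  const byId = new Map();
  const conflicts = [];
  for (const { root, docs } of roots) {
    for (const doc of docs) {
      const previous = byId.get(doc.id);
      if (previous) {
        // Hypothetical conflict record shape.
        conflicts.push({ id: doc.id, overridden: previous.source_root, winner: root });
      }
      byId.set(doc.id, { ...doc, source_root: root });
    }
  }
  return { docs: [...byId.values()], conflicts };
}

const merged = mergeRoots([
  { root: "../engineering-adrs", docs: [{ id: "ADR-001", title: "Shared hashing policy" }] },
  { root: ".devt/memory", docs: [{ id: "ADR-001", title: "Project hashing policy" }] },
]);
console.log(merged.docs[0].source_root, merged.conflicts.length);
```

Because .devt/memory is listed last, its ADR-001 wins, and the collision still shows up in conflicts rather than being dropped silently.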
node bin/devt-tools.cjs memory bundle export --out=acme-memory.json --filter=domain:auth
node bin/devt-tools.cjs memory bundle import acme-memory.json --prefix=ACME-
Round-trip-safe portable JSON with optional ID prefix remapping for cross-org sharing.
/devt:workflow accepts flags that modify pipeline shape:
| Flag | What it does |
|---|---|
| --autonomous | Skip the human-in-the-loop checkpoints between phases. After implement → auto-runs test → review → ship if review passes. State key autonomous_chain persists the choice across /devt:next resumes. Stop manually with /devt:cancel-workflow. |
| --tdd | Reverses implement/test phase order. Tester writes failing tests first against the spec, programmer then implements until tests pass. Best for tasks where the test contract is clearer than the implementation path. |
| --dry-run | Preview the tier + pipeline + agents that would run, without dispatching any. Useful for understanding what /devt:workflow will do on a fragile task before committing. |
devt captures and reuses knowledge across sessions:
- Extract — retro agent distills lessons (4-filter quality gate) → .devt/state/lessons.yaml
- Curate — curator agent applies the 5-filter (Specificity, Durability, Non-obviousness, Evidence, Actionability) and presents AskUserQuestion per candidate → on approval, writes .devt/memory/lessons/LES-NNNN-slug.md
- Index — memory index rebuilds the unified FTS5 database (auto-triggered by PostToolUse hook on memory-doc changes)
- Query — Pre-Flight Brief queries index.db across all 5 doc types at workflow start
- Inject — Brief's "Related Operational Lessons" section is lifted into <learning_context> for programmer/tester/code-reviewer dispatches
The loop is fully closed — lessons flow from completed work back into future agents.
/devt:arch-health runs the project's architecture scanner (configured via arch_scanner.command) and detects structural drift across sessions:
- Baseline mode — first run captures the current state to .devt/state/arch-baseline.json
- Delta mode — subsequent runs compare against baseline, surfacing only NEW violations (no noise from pre-existing debt)
- Triage mode — interactive review of findings via AskUserQuestion: fix now, defer (/devt:defer), or accept-as-baseline
The python-fastapi reference template ships an arch-scan.py that detects 6 layer-violation patterns (LAYER-IMPORT-DOMAIN, LAYER-IMPORT-API, DB-IN-APPLICATION, INLINE-IMPORT, GOD-FILE, …). Other templates can wire any scanner — output must be JSON with a findings array.
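Delta mode boils down to a set difference between the baseline findings and the current findings. The sketch below illustrates that idea under stated assumptions: scanner output is JSON with a findings array (as required above), but the per-finding fields rule and file, and the choice to identify a finding by that pair, are hypothetical. A real scanner may need line numbers or symbols to tell findings apart.

```javascript
// Assumption: a finding is identified by its rule and file.
function findingKey(finding) {
  return `${finding.rule}::${finding.file}`;
}

// Return only the findings that are absent from the baseline.
function newViolations(baseline, current) {
  const known = new Set(baseline.findings.map(findingKey));
  return current.findings.filter((finding) => !known.has(findingKey(finding)));
}

// Illustrative data; file paths are made up.
const baseline = { findings: [{ rule: "GOD-FILE", file: "app/services/orders.py" }] };
const current = {
  findings: [
    { rule: "GOD-FILE", file: "app/services/orders.py" },
    { rule: "INLINE-IMPORT", file: "app/api/users.py" },
  ],
};

console.log(newViolations(baseline, current));
```

Pre-existing debt (the GOD-FILE entry) is filtered out, so only the INLINE-IMPORT finding would be surfaced for triage.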
/devt:quality runs lint, typecheck, and tests as defined in .devt/rules/quality-gates.md. The rules file specifies the exact commands and pass criteria for your stack — devt has no opinion. Agents read this file before reporting "tests passing" so the claim is grounded in your project's actual gates, not assumptions.
.devt/state/deferred.md with DEF-NNN ids holds cross-workflow TODOs ("things we said we'd do later"). Captured via /devt:defer "<title>" from any workflow. Exempted from state reset so items survive /devt:cancel-workflow. Surfaces in /devt:status (count) and /devt:next (idle pickup via AskUserQuestion). Distinct from the memory layer — deferred items are transient TODOs, not curator-gated, not in Pre-Flight Brief noise.
/devt:thread creates persistent investigation contexts that survive session boundaries. Useful for multi-day debugging or research where the trail can't fit in one session. Subcommands: create, list, resume, update. Each thread has its own scratch + decision log; reading a thread restores the full context cheaply.
/devt:note "<thought>" saves a freeform note without derailing your current workflow. Notes can later be promoted to deferred items, memory candidates, or just deleted. The "I'll forget this if I keep coding" mechanism.
/devt:forensics analyzes a stuck or failed workflow's artifacts (.devt/state/, git history, recent commits) and diagnoses what went wrong. Useful when /devt:next hits a wall and you can't figure out why.
/devt:autoskill runs after retro and analyzes the session for patterns: skills that should have been preloaded but weren't, commands that took too many tries, friction points. Proposes additions to .devt/state/autoskill-proposals.md. Curator decides what to merge into skill-index.yaml. Meta-feature most users won't touch directly.
/devt:weekly-report generates a markdown summary of git activity for the configured git.contributors. Runs against any time window. /devt:session-report generates a session summary (work done, commits, decisions, outcomes) without git dependency.
references/questioning-guide.md defines how /devt:clarify and /devt:specify interview users. Key principles:
- Before You Ask — codebase-first: grep/Read/memory query before any question; only ask about decisions requiring user judgment
- Walk the Decision Tree — resolve roots before dependents, cut subtrees on root answers
- One at a Time — AskUserQuestion supports up to 4 questions per call but discipline says use 1; each answer reframes the next
- Recommendation Required — every option carries validated reasoning; mark recommended option (Recommended) and place first
/devt:council "<question>" convenes 5 advisors in parallel, each with a distinct thinking style designed to create three natural tensions: Contrarian ⇄ Generalizer (downside vs upside), First Principles ⇄ Pragmatist (rethink vs ship), with the Newcomer keeping everyone honest by reading the question fresh and asking obvious questions.
| Advisor | Lens | Asks |
|---|---|---|
| Contrarian | What's the worst case? | "What breaks under load? What's the on-call cost when this fails? Have we hit this class of bug before?" |
| First Principles Thinker | What does the problem actually require? | "Strip away the current solution — what are we really trying to do? Is there a simpler primitive?" |
| Generalizer | What latent value or pattern fits? | "Have other teams solved this? Can this become a reusable pattern? What does the broader literature say?" |
| Newcomer | What's obvious that everyone missed? | "Why does this even need to exist? What would a junior dev assume? What's the simplest thing that could work?" |
| Pragmatist | What's the smallest concrete next step? | "Even if the plan is brilliant, what do we actually do tomorrow morning? What's the 30-min experiment that de-risks this?" |
After advisors respond in parallel, responses are anonymized and peer-reviewed (no advisor knows who said what). A Chairman then synthesizes the round into a verdict: consensus, conflicts, blind spots, a recommendation, and one concrete next step. The full transcript saves to .devt/state/council-{slug}-{timestamp}.md for later reference.
When the council fires:
- Manually: invoke `/devt:council "should we use Postgres or Mongo for this workload?"` whenever you suspect your first instinct is biased.
- Automatically (off-ramp): `references/council-offramp.md` defines the escalation criteria. `/devt:clarify` and `/devt:specify` route to council when an open question is high-stakes (architecture-shaping, expensive-to-reverse, or has 3+ defensible options with no clear winner). The off-ramp sequence is: clarify → if council-worthy → council → resume clarify with the council verdict as decision input.
- `--mixed-models` flag: dispatches advisors across opus/sonnet/haiku for higher reasoning diversity at extra token cost. Default is single-model dispatch.
The council deliberately does NOT fire for trivial questions (factual lookups, single-line fixes, syntax). The skill description's trigger boundary keeps it from being a hammer for every nail.
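The escalation criteria above can be read as a small predicate. The following is a minimal sketch under stated assumptions, not the contents of references/council-offramp.md: the field names (`architectureShaping`, `expensiveToReverse`, `defensibleOptions`, `hasClearWinner`) and both function names are hypothetical and exist only for illustration.

```javascript
// Hypothetical shape for an open question; these field names are
// illustrative and are not taken from references/council-offramp.md.
function isCouncilWorthy(question) {
  const {
    architectureShaping = false,
    expensiveToReverse = false,
    defensibleOptions = 0,
    hasClearWinner = true,
  } = question;

  // "3+ defensible options" only counts when no option clearly wins.
  const contested = defensibleOptions >= 3 && !hasClearWinner;
  return architectureShaping || expensiveToReverse || contested;
}

// Off-ramp routing from the text: stay in clarify unless the question
// qualifies, in which case council runs first and clarify resumes after.
function nextStep(question) {
  return isCouncilWorthy(question) ? "council" : "clarify";
}

console.log(nextStep({ expensiveToReverse: true }));
console.log(nextStep({ defensibleOptions: 3, hasClearWinner: true }));
```

Any one criterion is enough to escalate, which matches the "or" in the description; a question with several options but an obvious winner stays in clarify.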
For users who installed Graphify, here's the surface-by-surface benefit comparison vs grep fallback:
| Surface | Without Graphify | With Graphify |
|---|---|---|
| Pre-Flight Brief Lane C (symbol resolution) | grep across `affects_symbols` text fields | tree-sitter AST resolution; knows that `User` the class differs from `User` the type alias |
| `blast_radius()` MCP tool | Falls back to filename matching | Walks actual import graph via `getNeighbors()` for true impact analysis |
| `memory validate` stale-symbol check | Cannot detect; silently keeps `affects_symbols: [renamedFn]` after refactor | Flags symbols in memory docs that no longer exist in the codebase |
| Code-search agents (programmer/debugger/researcher/code-reviewer/verifier/architect) | grep + path patterns | Graphify-first protocol → ~200–400 tokens per query vs ~3–5K with grep |
Concrete savings vary by codebase size; the ~10× claim is conservative for medium codebases (500+ files).
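To make the difference between filename matching and walking an import graph concrete, here is a minimal sketch of a reverse-dependency traversal. It is not Graphify's algorithm or the `blast_radius()` implementation: the table names `getNeighbors()` but does not show its signature, so this sketch uses a plain `Map` instead, and the `blastRadius` function, the `importers` input shape, and the file names are all hypothetical.

```javascript
// Reverse-dependency BFS: which files are transitively affected when `start` changes?
// `importers` maps a file to the files that import it (hypothetical input shape).
function blastRadius(importers, start) {
  const affected = new Set();
  const queue = [start];
  while (queue.length > 0) {
    const file = queue.shift();
    for (const dependent of importers.get(file) ?? []) {
      // Skip files already seen so import cycles terminate.
      if (dependent !== start && !affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return [...affected].sort();
}

const importers = new Map([
  ["user.ts", ["auth.ts", "profile.ts"]],
  ["auth.ts", ["routes.ts"]],
  ["profile.ts", ["routes.ts"]],
]);
console.log(blastRadius(importers, "user.ts"));
// prints [ 'auth.ts', 'profile.ts', 'routes.ts' ]
```

Filename matching would miss `routes.ts` here, since it never mentions `user.ts` directly; the graph walk reaches it through two intermediate importers.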
devt uses Claude Code hooks for lifecycle events (see Configuration reference → Hook profile for the matrix).
Guardrails (guardrails/): contamination guidelines, generative-debt checklist, golden rules (15 numbered rules including Pre-Flight Protocol and scope_mode protocol), incident runbook, skill-update guidelines.
/devt:workflow "add password reset endpoint with email verification, rate-limited at 3/hour per email"

What runs (STANDARD or COMPLEX tier auto-detected):
1. Pre-Flight Brief: surfaces governing ADR/CON/FLOW for auth + email + rate-limiting domains; flags REJ tombstones (e.g., "we said no to bcrypt"); injects related lessons
2. Scan (architect or scan agent, COMPLEX only): maps the affected layers
3. Implement (programmer): writes code following `.devt/rules/coding-standards.md`
4. Test (tester): writes tests per `.devt/rules/testing-patterns.md`; runs `.devt/rules/quality-gates.md` commands
5. Review (code-reviewer): read-only audit per `.devt/rules/review-checklist.md`
6. Verify (verifier): checks the implementation actually meets the original task description
7. Docs (docs-writer): updates README/CHANGELOG/API docs as needed
8. Retro: distills lessons → `lessons.yaml` (curator promotes to LES-NNNN later)
9. Autoskill: analyzes the session for skill-index improvements
/devt:ship   # creates PR with auto-generated body from impl-summary + test-summary + review

/devt:workflow --tdd "add validation rejecting passwords shorter than 12 chars or with no special character"

The --tdd flag swaps phases 3 and 4. The tester writes failing tests first and runs them to confirm they fail, then the programmer implements until the tests pass. Best when the contract is clear (validation rules, parsing logic, pure functions).
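As an illustration of why a clear contract suits the test-first order, here is a rough sketch of what the password task could reduce to. It is not output from devt's agents: `validatePassword`, the error codes, and the sample inputs are hypothetical, and "special character" is assumed to mean any character that is not an ASCII letter or digit.

```javascript
// Contract from the example task: reject passwords shorter than 12 characters
// or with no special character. "Special" is assumed here to mean any
// character that is not an ASCII letter or digit.
function validatePassword(password) {
  const errors = [];
  if (password.length < 12) errors.push("too_short");
  if (!/[^A-Za-z0-9]/.test(password)) errors.push("no_special_char");
  return { valid: errors.length === 0, errors };
}

// Cases of the kind a tester could write before any implementation exists.
const cases = [
  ["short!", false],
  ["longenoughbutplain1", false],
  ["long-enough-password", true],
];
for (const [input, expected] of cases) {
  const { valid } = validatePassword(input);
  console.log(`${JSON.stringify(input)} -> ${valid === expected ? "pass" : "FAIL"}`);
}
```

Each case pins down one clause of the contract, so a failing test points directly at the rule that is missing.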
/devt:workflow --autonomous "rename internal helper getUser → fetchUserById across the codebase"

After implement, the chain auto-runs test → review → ship. If review returns NEEDS_WORK, the chain pauses for human input. Stop manually with /devt:cancel-workflow. Best for mechanical changes where you trust the agents.
/devt:debug "users report 500 on /api/profile after avatar upload — only Safari mobile, only on second upload"

Four-phase investigation in an isolated context (the debugger agent has its own conversation lane so root-cause exploration doesn't pollute your main session):
- Symptom — formalize the failure mode, reproduce if possible
- Hypothesis — generate 2–4 candidate causes with falsifiability
- Test — design minimal experiments to distinguish hypotheses
- Fix — apply the fix, verify the symptom is gone, update relevant memory
The debugger persists state across context resets via `memory: project` agent persistence at `.claude/agent-memory/devt-debugger/`.
/devt:pause   # captures current phase, decisions so far, and next action in handoff.json + continue-here.md

Then in a future session in the same project:

/devt:next    # reads handoff.json, resumes the workflow, deletes the handoff

Useful at end of day or when a workflow blocks on an external decision (waiting for a stakeholder, blocked by another team).
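The pause/resume handshake can be sketched as a write followed by a read-then-delete. This is an assumption-laden illustration, not devt's implementation: the document does not show the handoff.json schema or its location, so the `pause` and `resume` functions, the `stateDir` parameter, and the snapshot fields (`phase`, `decisions`, `nextAction`) are hypothetical, and the companion continue-here.md file is omitted.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Field names in `snapshot` are hypothetical; the text only says the handoff
// captures the current phase, decisions so far, and the next action.
function pause(stateDir, snapshot) {
  fs.mkdirSync(stateDir, { recursive: true });
  fs.writeFileSync(
    path.join(stateDir, "handoff.json"),
    JSON.stringify(snapshot, null, 2)
  );
}

// Read the handoff, then delete it so a later call cannot resume the same work twice.
function resume(stateDir) {
  const file = path.join(stateDir, "handoff.json");
  if (!fs.existsSync(file)) return null;
  const snapshot = JSON.parse(fs.readFileSync(file, "utf8"));
  fs.unlinkSync(file);
  return snapshot;
}
```

Deleting the file on read is what makes the handoff single-use: a second resume finds nothing and can fall through to other work.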
/devt:defer "rate-limit /api/login — Redis backend, see SEC-007"
/devt:defer list
/devt:defer close DEF-003

/devt:next surfaces an idle deferred queue via AskUserQuestion when no other work is resumable: "5 deferred items waiting. Pick one to start?" Items survive /devt:cancel-workflow (the only state-reset exemption).
/devt:council "should we move from REST to GraphQL for the public API? Current REST has ~40 endpoints, mostly CRUD with 5 complex aggregation queries. We have 3 client teams (web, iOS, Android) and concerns about mobile bandwidth."

5 advisors respond in parallel (Contrarian, First Principles, Generalizer, Newcomer, Pragmatist), peer-review each other anonymously, and the Chairman synthesizes a verdict. The full transcript saves to .devt/state/council-rest-vs-graphql-{timestamp}.md for later reference. Add --mixed-models for opus/sonnet/haiku diversity at extra token cost.
/devt:arch-health # first run: captures baseline
# … weeks pass, code evolves …
/devt:arch-health # subsequent run: shows DELTA only (new violations)
/devt:arch-health --triage   # interactive: fix / defer / accept-as-baseline per finding

The baseline lives in .devt/state/arch-baseline.json so the team can ratchet quality forward without drowning in pre-existing debt noise.
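The ratchet idea, reporting only what is new relative to a stored baseline, can be sketched as a set difference. This is a minimal illustration under assumptions: the real format of arch-baseline.json is not shown in this document, so the finding shape (`rule`, `file`), the `keyOf` identity, and both function names are hypothetical.

```javascript
// A finding is identified by rule + file (hypothetical key; the real baseline
// format in .devt/state/arch-baseline.json is not shown in this document).
const keyOf = (finding) => `${finding.rule}::${finding.file}`;

// Delta view: keep only findings that are absent from the baseline.
function newViolations(baseline, current) {
  const known = new Set(baseline.map(keyOf));
  return current.filter((finding) => !known.has(keyOf(finding)));
}

// "accept-as-baseline": fold accepted findings into the baseline so later
// runs stop reporting them.
function acceptIntoBaseline(baseline, accepted) {
  return [...baseline, ...newViolations(baseline, accepted)];
}

const baseline = [{ rule: "layering", file: "api/users.ts" }];
const current = [
  { rule: "layering", file: "api/users.ts" },
  { rule: "import-cycle", file: "core/session.ts" },
];
console.log(newViolations(baseline, current));
```

Only the import-cycle finding is reported, because the layering violation was already present when the baseline was captured.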
/devt:uninstall

Interactive workflow that asks which level of reset you want and confirms before any destructive op. Always creates a .devt.bak.YYYYMMDD-HHMMSS/ backup for project-reset and full-reset modes.
| Mode | What it does | Keeps |
|---|---|---|
| Reinit | Re-scaffold `.devt/rules/` + `.devt/config.json` from template | Memory, lessons, deferred queue |
| Project reset | Wipe all `.devt/` in this project | Files outside `.devt/` |
| Full reset | Wipe `.devt/` + scattered devt files at repo root | Optional Graphify / claude-mem caches |
| Plugin uninstall | Remove the plugin itself (advisory — auto-detects install type and instructs) | All project .devt/ directories |
devt scatters a few files outside .devt/ (.mcp.json, .claude/agent-memory/devt-debugger/, .gitignore entries, .git/hooks/post-commit) because Claude Code and git specs require those paths. The full-reset mode handles all of them.
Primary (start here):
| Command | Description |
|---|---|
| `/devt:do` | Don't know which command? Describe what you want and devt routes to the right one |
| `/devt:workflow` | Build, fix, or improve anything. Supports --autonomous, --tdd, --dry-run |
| `/devt:specify` | Define a feature through interview and codebase analysis; produces a validated PRD |
| `/devt:debug` | Investigate and fix a bug with 4-phase systematic debugging |
| `/devt:ship` | Create PR with auto-generated description from workflow artifacts |
| `/devt:next` | Auto-detect where you are and run the next logical step |
Setup & help: /devt:init, /devt:uninstall, /devt:help.
Utilities: /devt:status, /devt:pause, /devt:forensics, /devt:cancel-workflow, /devt:note, /devt:defer, /devt:health, /devt:session-report, /devt:update, /devt:thread, /devt:weekly-report, /devt:council.
Internal (called by workflows, available to power users): /devt:plan, /devt:research, /devt:clarify, /devt:implement, /devt:fast, /devt:review, /devt:quality, /devt:retro, /devt:arch-health, /devt:autoskill, /devt:memory, /devt:preflight.
bin/devt-tools.cjs is a zero-dependency Node.js CLI for state management and diagnostics:
# State + config + setup + models
node bin/devt-tools.cjs state read|update|reset|validate|sync|prune
node bin/devt-tools.cjs config get|set
node bin/devt-tools.cjs models get|resolve|list|table <profile>
node bin/devt-tools.cjs setup --template <name> [--mode create|update|reinit]
# Memory layer (full subcommand reference in Features → memory CLI)
node bin/devt-tools.cjs memory <subcommand>
# Pre-Flight Brief
node bin/devt-tools.cjs preflight "<topic>"
# Deferred TODO tracker
node bin/devt-tools.cjs deferred add "<title>" [--context=… --tags=a,b --by=<agent>]
node bin/devt-tools.cjs deferred list|get|close|reopen|count
# Diagnostics + reports + telemetry
node bin/devt-tools.cjs health [--repair]
node bin/devt-tools.cjs report window|generate
node bin/devt-tools.cjs token-report [--sessions=N --baseline=PATH --compare=PATH --regression --fail-on-regression]
node bin/devt-tools.cjs mcp-stats [--since=DATE --tool=NAME --workflow-id=ID --workflow-type=TYPE --phase=PHASE]
# Updates
node bin/devt-tools.cjs update check|status|local-version|install-type|dirty|clear-cache|changelog

| Event | What it does |
|---|---|
| `SessionStart` | Registers commands, checks for updates, loads context |
| `Stop` | Cleans up workflow state |
| `SubagentStart` | Tracks agent dispatch |
| `SubagentStop` | Tracks agent completion |
| `PostToolUse` | Context monitoring + memory auto-index (debounced) |
| `PreToolUse` | Prompt guard (Write/Edit), pre-flight guard, bash safety guard (destructive rm, --no-verify, force-push, mass-discard) |
| `UserPromptSubmit` | Injects workflow context and statusline |
Profile control: see Configuration reference → Hook profile.
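To give a feel for what a bash safety guard has to decide, here is a rough sketch of a command classifier covering the four categories named in the PreToolUse row. The patterns are illustrative assumptions, not devt's actual rules: `RULES`, `checkBashCommand`, and the choice of which git commands count as "mass-discard" are hypothetical, and the wiring into a Claude Code hook is left out.

```javascript
// Illustrative patterns only; devt's actual guard rules are not shown here.
const RULES = [
  // rm with combined recursive+force flags, e.g. "rm -rf" or "rm -fr".
  { name: "destructive-rm", pattern: /\brm\s+-\w*(rf|fr)\w*/i },
  // Skipping git hooks.
  { name: "no-verify", pattern: /(^|\s)--no-verify(\s|$)/ },
  // "git push --force" or "git push -f"; --force-with-lease is not matched.
  { name: "force-push", pattern: /\bgit\s+push\b.*(\s--force(\s|$)|\s-f(\s|$))/ },
  // Commands assumed to discard local work in bulk.
  { name: "mass-discard", pattern: /\bgit\s+(reset\s+--hard|clean\s+-\w*f|checkout\s+--\s+\.)/ },
];

// Return the first matching rule, or allow the command when none match.
function checkBashCommand(command) {
  const hit = RULES.find((rule) => rule.pattern.test(command));
  return hit
    ? { allowed: false, rule: hit.name }
    : { allowed: true, rule: null };
}

console.log(checkBashCommand("git push origin main --force"));
console.log(checkBashCommand("git push --force-with-lease"));
```

Regex matching of this kind is deliberately coarse: it misses variants such as `rm -r -f`, so a production guard would need to parse arguments rather than match text.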
devt/
.claude-plugin/ Plugin manifest
bin/
devt-tools.cjs CLI entry point
devt-memory-mcp.cjs Vendored read-only MCP server (10 tools, JSON-RPC stdio)
modules/ init, state, config, model-profiles, setup, memory, preflight,
discovery, graphify, deferred, mcp-stats, token-report,
security, health, weekly-report, update, cli-args, io
commands/ Slash command entry points (32 files)
workflows/ Orchestration files (31 files)
agents/ Agent definitions (10 files; 3 agents bundle sub-skill subdirectories)
skills/ Skill libraries (16 directories)
hooks/ Lifecycle hook scripts + hooks.json
guardrails/ Protective guidelines
references/ Technique libraries (questioning guide, domain probes, council offramp)
scripts/ smoke-test.sh, test-locking.cjs, extract-changelog.sh
templates/ Project templates (python-fastapi, go, typescript-node, vue-bootstrap, blank)
+ memory/ (ADR/CON/FLOW/REJ/LES frontmatter scaffolds)
.github/workflows/ CI: smoke-test on Node 22/24, version coherence,
CHANGELOG coverage, tag-driven GitHub releases
skill-index.yaml Agent-to-skill mapping
Workflow fails or gets stuck:
- `/devt:status`: see current state
- `/devt:forensics`: post-mortem investigation
- `/devt:cancel-workflow`: reset and start over
- Check `.devt/state/` for artifact details

Plugin health issues:
- `/devt:health`: diagnose (21 checks across config, state, hooks, memory)
- `/devt:health --repair`: auto-fix safe issues

Missing .devt/rules/:
- `/devt:init`: set up project conventions

Agent returns BLOCKED:
- Read agent's output in `.devt/state/<phase>-summary.md`; the task may need to be broken down or clarified

Memory layer not surfacing expected docs:
- `node bin/devt-tools.cjs memory validate`: check frontmatter / stale paths / broken links
- `node bin/devt-tools.cjs memory index`: rebuild the FTS5 index
- `/devt:health`: surfaces `MEM_INDEX_STALE`, `MEM_PATH_UNREACHABLE`, `MEM_VALIDATE_ERRORS`, `MEM_CONFLICT_HIGH`

MCP server warnings (Missing environment variables: CLAUDE_PLUGIN_ROOT, unknown command 'mcp'):
- Already fixed in current versions. Update via `/devt:update`.
- `docs/MEMORY.md`: comprehensive memory-layer guide (frontmatter reference, authoring conventions, troubleshooting)
- `docs/COMMANDS.md`: full command reference
- `guardrails/golden-rules.md`: Rules 14 (Pre-Flight Protocol) and 15 (Memory Maintenance)
- `skills/memory-pre-flight/SKILL.md`: the protocol skill loaded by all 8 dev agents
- `skills/memory-curation/SKILL.md`: the curator's promotion gate
- `templates/memory/`: ADR/CON/FLOW/REJ/LES scaffolds for new docs
- CHANGELOG.md: full version history
GitHub Actions runs scripts/smoke-test.sh (260+ assertions across all CLI subcommands) and scripts/test-locking.cjs (20-worker concurrent state-write test) on every push. Version coherence, CHANGELOG coverage, and workflow_type registry coverage are enforced. Releases are tag-driven — push vX.Y.Z to fire .github/workflows/release.yml which extracts the matching CHANGELOG section into the GitHub release notes.
/devt:update

devt checks for new versions on GitHub at each session start. The /devt:update command auto-detects how devt was installed (plugin system or git clone) and runs the right update command. Restart your Claude Code session after updating.
Manual update: cd ~/.devt && git pull origin main.
Releases are published at emrecdr/devt/releases. Each version follows Semantic Versioning and has a matching ## [X.Y.Z] section in CHANGELOG.md, formatted per Keep a Changelog.
The release flow is tag-driven: pushing a vX.Y.Z tag triggers .github/workflows/release.yml, which extracts the changelog section via scripts/extract-changelog.sh and creates the GitHub release automatically. CI enforces that VERSION, plugin.json version, and the changelog all stay in lock-step — a version bump without a matching changelog entry fails the build.
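The lock-step rule can be sketched as two small checks: the versions must agree, and the changelog must contain a matching `## [X.Y.Z]` section that can be lifted into release notes. This is a JavaScript illustration under assumptions, not the project's CI or scripts/extract-changelog.sh (which is a shell script): `checkVersionCoherence` and `extractChangelogSection` are hypothetical names, and the inputs are assumed to be the raw text of VERSION, plugin.json, and CHANGELOG.md.

```javascript
// Return the body of the "## [X.Y.Z]" section, or null when it is missing.
function extractChangelogSection(changelogText, version) {
  const lines = changelogText.split("\n");
  const start = lines.findIndex((line) => line.startsWith(`## [${version}]`));
  if (start === -1) return null;
  const rest = lines.slice(start + 1);
  // The section ends at the next version heading, or at end of file.
  const end = rest.findIndex((line) => line.startsWith("## ["));
  return (end === -1 ? rest : rest.slice(0, end)).join("\n").trim();
}

// Inputs are the raw contents of VERSION, plugin.json, and CHANGELOG.md.
// Returns a list of problems; an empty list means everything is in lock-step.
function checkVersionCoherence(versionText, pluginJsonText, changelogText) {
  const version = versionText.trim();
  const problems = [];
  if (JSON.parse(pluginJsonText).version !== version) {
    problems.push("plugin.json version does not match VERSION");
  }
  if (extractChangelogSection(changelogText, version) === null) {
    problems.push(`CHANGELOG.md has no "## [${version}]" section`);
  }
  return problems;
}
```

Returning a list of problems rather than a boolean lets a build report every mismatch in one run, for example a version bump that forgot both plugin.json and the changelog entry.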
MIT