Stop burning cloud tokens on file reads. Offload to your local GPU.
Miser runs at localhost:7860 and intercepts the expensive parts of AI coding sessions:
- File ops, grep, tree → served instantly from disk, zero LLM, zero tokens
- Code gen, explain, fix, review → runs on your local Ollama model, zero API cost
Works with Claude Code, Codex CLI, Aider, or any agent that can call HTTP.
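That last point is literal: anything that can send an HTTP POST to localhost:7860 can use Miser, no SDK required. Below is a minimal sketch using only the standard library; the `/op` route and JSON payload shape are illustrative assumptions (only the port is documented here), so check `client.py` for the real request format.

```python
# Hypothetical raw call to Miser. The "/op" route and payload shape are
# illustrative assumptions; the documented interface is `from client import W`.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:7860/op",
    data=json.dumps({"op": "outline", "path": "~/project/app.py"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=60) as resp:
    print(resp.read().decode())
```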
Every time your AI assistant reads a file to understand it, maps its structure, or generates a test, it burns tokens:
| What your agent does | Tokens consumed |
|---|---|
| Read a 300-line file to find one function | ~7 800 tokens |
| Read a file to understand what it does | ~8 000 tokens |
| Generate a test suite for a module | ~3 000 tokens (output) |
| Read 5 files to map a project | ~35 000 tokens |
In a typical 2-hour Claude Code session, 40–60% of token spend is on these mechanical tasks.
Miser intercepts those calls and handles them locally:
| Expensive cloud call | Miser equivalent | Token cost |
|---|---|---|
| Read(large_file) (~8 000 tokens) | W.outline(path) | ~100 tokens |
| Read to find function | W.grep(path, "def fn") | ~50 tokens |
| Read to understand module | W.explain(path) | $0.00 (local LLM) |
| Generate tests yourself | W.test(path) | $0.00 (local LLM) |
| Analyze error + write fix | W.fix(error, code=...) | $0.00 (local LLM) |
| Read 10 files to map project | W.tree(root, depth=3) | ~200 tokens |
Tested with Claude Code and qwen3.5:4b via Ollama: saves ~16 000 tokens per session on a typical project.
git clone https://github.com/guyu-adam/miser.git
cd miser
bash install.sh    # installs deps, pulls qwen3.5:4b, starts service

Then in your Claude Code session (or any agent):
import sys; sys.path.insert(0, '/path/to/miser')
from client import W
W.outline("~/project/app.py") # function/class map — ~100 tokens
W.grep("~/project/app.py", "def auth") # find a function — ~50 tokens
W.explain("~/project/utils.py") # understand module — $0 (local)
W.fix("TypeError: NoneType", code="…") # debug — $0 (local)
W.test("~/project/utils.py")           # write tests — $0 (local)

Requirements: Python 3.10+, Ollama installed and running. GPU optional — works on CPU, just slower (4b model: ~40s on CPU, ~4s on GPU).
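Before wiring an agent to Miser, it is worth confirming that both services are actually listening. A minimal preflight sketch, assuming the default ports (7860 for Miser, 11434 for Ollama):

```python
# Quick preflight: check that Miser (7860) and Ollama (11434, its default
# port) are accepting connections before starting an agent session.
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in [("miser", 7860), ("ollama", 11434)]:
    status = "up" if port_open("localhost", port) else "DOWN"
    print(f"{name:7s} localhost:{port} -> {status}")
```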
W.outline(path) # → "def foo [L12]\nclass Bar [L34]\n..."
W.grep(path, pattern, ctx=2) # → matching lines with context
W.tree(path, depth=2) # → directory tree string
W.exists(path) # → {"exists": True, "size_kb": 12}
W.run("git diff --stat") # → shell output
W.read(path) # → file content (use sparingly)
W.write(path, content) # → write file
W.patch(path, old, new)       # → find-and-replace in file

W.explain(path_or_code)       # plain-English explanation
W.fix(error_msg, code="...") # error message → suggested fix
W.test(path, function="parse") # generate pytest tests
W.review(path) # bug + improvement review
W.codegen("write RSI indicator") # code generation
W.summarize(path, focus="errors") # compress file to bullets
W.git_summary(path, n=10) # recent commits summary
W.ask("any freeform task")    # general purpose

results = W.batch([
("outline", "~/project/app.py"),
("outline", "~/project/models.py"),
("run", "git status"),
("exists", "~/project/.env"),
])

Add to your CLAUDE.md:
## Miser — Local Token-Saver (ALWAYS USE THIS)
Miser runs at http://localhost:7860.
import sys; sys.path.insert(0, '/path/to/miser')
from client import W
| Task | Do NOT do this | Do THIS instead |
|------|---------------|-----------------|
| Map file structure | Read(large_file) | W.outline(path) |
| Find one function | Read(large_file) | W.grep(path, "def fn") |
| Understand a module | Read + reason | W.explain(path) |
| Write tests | Generate yourself | W.test(path) |
| Debug error | Reason yourself | W.fix(error, code=ctx) |
| Multiple lookups | Sequential Reads | W.batch([...]) |

Tested on a 1 500-line Python project, 90-minute coding session:
| Metric | Without Miser | With Miser | Savings |
|---|---|---|---|
| Tokens on file reads | ~42 000 | ~3 200 | -92% |
| Local LLM ops (tests, explain) | 0 (Claude generates) | 8 ops | -24 000 tokens |
| Total session tokens | ~68 000 | ~28 000 | -59% |
| Estimated cost (Claude Sonnet) | ~$0.20 | ~$0.08 | -$0.12/session |
Environment: Mac M2, qwen3.5:4b via Ollama. Results vary by project size and coding style.
ollama pull mistral:7b
MISER_MODEL=mistral:7b bash start.sh

Tested models: qwen3.5:4b (default, fast), qwen3.5:latest (8B, smarter), mistral:7b, llama3.1:8b, phi4, gemma3:4b, deepseek-coder:6.7b.
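If you swap models, make sure the tag is actually pulled before pointing MISER_MODEL at it. A small check along those lines, a sketch that uses Ollama's own /api/tags endpoint (the response shape is Ollama's, not Miser's), with the README's default model as the fallback:

```python
# List models already pulled into the local Ollama instance and check that
# the one you plan to pass via MISER_MODEL is among them.
# Uses Ollama's /api/tags endpoint on its default port, 11434.
import os
import requests

wanted = os.environ.get("MISER_MODEL", "qwen3.5:4b")
tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
installed = {m["name"] for m in tags.get("models", [])}

if wanted in installed:
    print(f"{wanted} is available")
else:
    print(f"{wanted} not found; run: ollama pull {wanted}")
```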
Claude Code / Aider / Codex
│ HTTP POST localhost:7860
▼
┌─────────────────────┐
│ miser.py │
│ ┌───────────────┐ │
│ │ Zero-LLM ops │──┼──► disk / shell (<50ms)
│ │ outline/grep/ │ │
│ │ tree/run/read │ │
│ └───────────────┘ │
│ ┌───────────────┐ │
│ │ Local-LLM ops │──┼──► Ollama API (4-40s, $0)
│ │ explain/fix/ │ │
│ │ test/codegen │ │
│ └───────────────┘ │
└─────────────────────┘
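The split in the diagram maps onto a small dispatcher: zero-LLM ops go straight to disk or a subprocess, everything else is forwarded to the local model. Below is a simplified sketch of that routing, not the actual miser.py (which covers more ops, prompt templates, and error handling); the forwarding uses Ollama's standard /api/generate endpoint on its default port.

```python
# Simplified sketch of the routing shown above; the real miser.py covers
# more ops (outline, grep, tree, patch, ...), prompt templates, and errors.
import subprocess
from pathlib import Path

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default generate endpoint

def handle(op: str, arg: str, model: str = "qwen3.5:4b") -> str:
    if op == "read":   # zero-LLM op: served straight from disk, <50ms
        return Path(arg).expanduser().read_text()
    if op == "run":    # zero-LLM op: shell out locally
        return subprocess.run(arg, shell=True, capture_output=True, text=True).stdout
    # local-LLM ops (explain/fix/test/codegen): forward to Ollama, $0 in API cost
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": f"{op}:\n{arg}", "stream": False},
        timeout=120,
    )
    return resp.json()["response"]
```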
MIT