BenchLocal

Test LLMs on real tasks. Compare models side-by-side.

Website · Download · Watch demo · Build a Bench Pack

BenchLocal is a local-first desktop app for running, comparing, and managing installable LLM Bench Packs against local or remote models.

Official Bench Packs today:

BenchLocal owns the shared desktop runtime:

provider configuration
model registry
Bench Pack install and update flow
per-tab sampling overrides
run execution and result history
verifier lifecycle management
persisted desktop UI state

Agent access

BenchLocal can expose a local agent surface so AI agents and automation tools can control benchmark workflows while the desktop UI stays live.

Enable it from Settings > Agent Access. The app will show:

a bearer token
the local Agent Guide URL
the OpenAPI URL
the MCP Streamable HTTP URL

The HTTP API uses JSON commands for actions such as listing Bench Packs, managing providers and models, creating tabs, selecting models, refreshing availability, starting runs, resuming runs, retrying results, and stopping active runs. Live progress is available through Server-Sent Events at /v1/events.
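
A minimal sketch of how a client might follow the live progress stream. It makes several assumptions: the base URL is a placeholder (use the values shown in Settings > Agent Access), the token is sent as an `Authorization: Bearer` header, and the event payload schema is not assumed, so events are passed through as raw `data` strings. The parsing follows the standard `text/event-stream` framing and uses the global `fetch` available in Node 18+.

```typescript
// Hypothetical base URL: copy the real one from Settings > Agent Access.
const BASE_URL = "http://127.0.0.1:8787";

interface SseEvent {
  event: string; // defaults to "message" per the SSE format
  data: string;  // raw payload; BenchLocal's event schema is not assumed here
}

// Split buffered text/event-stream data into complete events.
// Returns the parsed events plus any trailing partial frame to carry over.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // the last piece may be incomplete
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message";
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}

// Stream /v1/events and hand each event to a callback until the stream ends.
async function watchEvents(token: string, onEvent: (e: SseEvent) => void): Promise<void> {
  const res = await fetch(`${BASE_URL}/v1/events`, {
    headers: { Authorization: `Bearer ${token}`, Accept: "text/event-stream" },
  });
  if (!res.ok || !res.body) throw new Error(`event stream failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let pending = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    pending += decoder.decode(value, { stream: true });
    const { events, rest } = parseSse(pending);
    pending = rest;
    events.forEach(onEvent);
  }
}
```

Keeping the parser separate from the network loop lets it be tested without a running app. A caller would do something like `watchEvents(token, (e) => console.log(e.event, e.data))`.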

MCP-capable agents can connect to /mcp with the same bearer token and use standard benchlocal_* tools plus BenchLocal state resources. This is the preferred integration path for agents that support tool calls.
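
As a rough illustration of what an MCP client sends to `/mcp`, the sketch below builds the JSON-RPC messages and HTTP headers used by the MCP Streamable HTTP transport. The URL is a placeholder, the `Authorization: Bearer` header form and the client name are assumptions, and no specific `benchlocal_*` tool name is assumed: the standard `tools/list` request is how a client would discover them. In practice an MCP client library performs this handshake for you.

```typescript
// Hypothetical MCP URL: copy the real one from Settings > Agent Access.
const MCP_URL = "http://127.0.0.1:8787/mcp";

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface HttpRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

let nextId = 1;

// Build a JSON-RPC 2.0 request with a fresh numeric id.
function rpc(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method, ...(params ? { params } : {}) };
}

// Build fetch options for one MCP message over Streamable HTTP.
// The session id, if the server issues one, comes back on the initialize
// response in the Mcp-Session-Id header and is echoed on later requests.
function mcpRequestInit(token: string, message: JsonRpcRequest, sessionId?: string): HttpRequestInit {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`, // assumed header form for the bearer token
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return { method: "POST", headers, body: JSON.stringify(message) };
}

// First message of the MCP handshake; the client name is a placeholder.
const initialize = rpc("initialize", {
  protocolVersion: "2025-03-26",
  capabilities: {},
  clientInfo: { name: "example-agent", version: "0.0.1" },
});

// After initialization, list the tools the server exposes.
const listTools = rpc("tools/list");

// Usage (not executed here): await fetch(MCP_URL, mcpRequestInit(token, initialize));
```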

See docs/agent-control-api.md for endpoint details, MCP tools/resources, safety rules, and the extension checklist for adding future UI features to the agent surface.

Each Bench Pack owns its benchmark behavior:

scenario definitions
benchmark-specific prompts
scoring logic
verifier contracts where required
benchmark-specific traces and summaries
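
To make the split concrete, here is a purely illustrative sketch of the kind of logic a pack owns: a scenario, a scorer, and a summary. Every type and function name below is hypothetical and is not taken from packages/benchlocal-sdk; the real authoring helpers and contracts live in the repo.

```typescript
// All names here are hypothetical and for illustration only;
// they are not the benchlocal-sdk API.
interface Scenario {
  id: string;
  prompt: string;   // benchmark-specific prompt
  expected: string; // reference answer used by the scorer
}

interface ScenarioResult {
  scenarioId: string;
  score: number; // 0 or 1 in this sketch
  trace: string; // benchmark-specific trace text
}

// Scenario definitions belong to the pack.
const scenarios: Scenario[] = [
  { id: "sum-1", prompt: "What is 2 + 3? Reply with the number only.", expected: "5" },
];

// Scoring logic belongs to the pack: exact match after trimming whitespace.
function score(scenario: Scenario, modelOutput: string): ScenarioResult {
  const answer = modelOutput.trim();
  const pass = answer === scenario.expected;
  return {
    scenarioId: scenario.id,
    score: pass ? 1 : 0,
    trace: `expected=${scenario.expected} got=${answer}`,
  };
}

// Summary over a run: fraction of scenarios passed (0 for an empty run).
function summarize(results: ScenarioResult[]): number {
  if (results.length === 0) return 0;
  return results.reduce((sum, r) => sum + r.score, 0) / results.length;
}
```

In this picture the host would supply the model output and store the results, while the pack decides what counts as a pass.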

Repo layout

app/: Electron app shell, desktop UI, main process, preload, renderer
packages/benchlocal-core: shared protocol, config, workspace, and theme types
packages/benchlocal-sdk: authoring helpers for Bench Pack repos
packages/benchpack-host: host-side install, inspection, verifier, and run orchestration logic
themes/: built-in desktop themes
scripts/: local macOS release helpers
docs/: packaging and release docs

Developer references

Build commands

npm run build: compile the app and workspace packages for development
npm run pack: compile and package the production desktop app, including DMG and ZIP artifacts
npm run build:dir: compile and produce an unpacked local app bundle
npm run build:win: compile and package unsigned Windows NSIS and ZIP artifacts
npm run build:linux: compile and package Linux AppImage and tar.gz artifacts
npm run release:all: build the signed macOS release plus Windows and Linux desktop artifacts in one command

License

MIT
