Test LLMs on real tasks. Compare models side-by-side.
BenchLocal is a local-first desktop app for running, comparing, and managing installable LLM Bench Packs against local or remote models.
Official Bench Packs today:
- ToolCall-15
- BugFind-15
- DataExtract-15
- InstructFollow-15
- ReasonMath-15
- StructOutput-15
- CLI-40
- HermesAgent-20
BenchLocal owns the shared desktop runtime:
- provider configuration
- model registry
- Bench Pack install and update flow
- per-tab sampling overrides
- run execution and result history
- verifier lifecycle management
- persisted desktop UI state
BenchLocal can expose a local agent surface so AI agents and automation tools can control benchmark workflows while the desktop UI stays live.
Enable it from Settings > Agent Access. The app will show:
- a bearer token
- the local Agent Guide URL
- the OpenAPI URL
- the MCP Streamable HTTP URL
The HTTP API uses JSON commands for actions such as listing Bench Packs, managing providers and models, creating tabs, selecting models, refreshing availability, starting runs, resuming runs, retrying results, and stopping active runs. Live progress is available through Server-Sent Events at /v1/events.
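As a rough illustration of consuming the live progress stream, the sketch below reads /v1/events and splits it into Server-Sent Events frames. It rests on several assumptions that are not confirmed here: that the base URL is the one shown under Settings > Agent Access, that the event stream expects the bearer token in a standard `Authorization: Bearer` header, and that the event name `run.progress` used in the usage note exists (it is a hypothetical placeholder). The helper names are also illustrative. docs/agent-control-api.md remains the authority on actual event names and payloads.

```typescript
// Minimal sketch of reading Server-Sent Events from the agent surface.
// Assumptions: baseUrl and token come from Settings > Agent Access, and
// /v1/events accepts the token as a standard Bearer header.

interface AgentEvent {
  event: string; // SSE event name; "message" when the frame has no event: field
  data: string;  // raw data payload, left unparsed here
}

// Headers for an authenticated event-stream request.
function authHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}`, Accept: "text/event-stream" };
}

// Split a text buffer into complete SSE frames (separated by a blank line).
// Returns the parsed events plus the trailing, still-incomplete remainder.
function parseSseFrames(buffer: string): { events: AgentEvent[]; rest: string } {
  const parts = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = parts.pop() ?? "";
  const events: AgentEvent[] = [];
  for (const frame of parts) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    // Frames without any data lines (for example keep-alives) are skipped.
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return { events, rest };
}

// Stream events until the server closes the connection.
async function streamEvents(
  baseUrl: string,
  token: string,
  onEvent: (e: AgentEvent) => void,
): Promise<void> {
  const res = await fetch(new URL("/v1/events", baseUrl), { headers: authHeaders(token) });
  if (!res.ok || !res.body) throw new Error(`event stream failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = parseSseFrames(buffer);
    buffer = rest;
    events.forEach(onEvent);
  }
}
```

A caller might run `streamEvents(baseUrl, token, (e) => console.log(e.event, e.data))` and filter on whatever event names the API documents. Keeping the frame parser separate from the network loop makes it easy to test without a running app.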
MCP-capable agents can connect to /mcp with the same bearer token and use standard benchlocal_* tools plus BenchLocal state resources. This is the preferred integration path for agents that support tool calls.
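For agents that do not already ship an MCP client, the opening handshake over Streamable HTTP can be sketched by hand. The example below builds the JSON-RPC `initialize` and `tools/list` requests defined by the MCP specification and posts them to /mcp. The base URL, the client name, the chosen protocol version string, and the assumption that the token travels as a standard `Authorization: Bearer` header are all illustrative. The actual benchlocal_* tool names come from the server's `tools/list` response and are not guessed here. In practice an MCP client library would normally handle this exchange.

```typescript
// Minimal sketch of an MCP Streamable HTTP handshake against /mcp.
// Assumptions: baseUrl and token come from Settings > Agent Access;
// the protocol version below is one the server may renegotiate.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// First message of every MCP session.
function mcpInitializeRequest(id: number, clientName: string): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26",
      capabilities: {},
      clientInfo: { name: clientName, version: "0.0.1" },
    },
  };
}

// Asks the server which tools it exposes.
function mcpListToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Streamable HTTP clients must accept both JSON and SSE responses.
function mcpHeaders(token: string, sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  // Servers that issue a session id on initialize expect it on later requests.
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return headers;
}

// POST a single JSON-RPC message to the MCP endpoint.
async function postMcp(
  baseUrl: string,
  token: string,
  body: JsonRpcRequest,
  sessionId?: string,
): Promise<Response> {
  return fetch(new URL("/mcp", baseUrl), {
    method: "POST",
    headers: mcpHeaders(token, sessionId),
    body: JSON.stringify(body),
  });
}
```

A typical sequence would be to post `mcpInitializeRequest(1, "example-agent")`, read any `Mcp-Session-Id` response header, send the `notifications/initialized` notification required by the specification, and then post `mcpListToolsRequest(2)` to discover the available tools.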
See docs/agent-control-api.md for endpoint details, MCP tools/resources, safety rules, and the extension checklist for adding future UI features to the agent surface.
Each Bench Pack owns its benchmark behavior:
- scenario definitions
- benchmark-specific prompts
- scoring logic
- verifier contracts where required
- benchmark-specific traces and summaries
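To make the split of responsibilities more concrete, here is a deliberately simplified sketch of what pack-owned scenario definitions and scoring logic could look like. Every type and function name below (`Scenario`, `ScenarioResult`, `scoreScenario`, `summarize`) is hypothetical and is not taken from packages/benchlocal-sdk. Exact-match scoring is only one possible rule, and real packs may rely on verifiers instead. BENCH_PACK_AUTHORING.md and BENCH_PROTOCOL_V1.md describe the actual contracts.

```typescript
// Hypothetical shapes for illustration only; not the benchlocal-sdk API.

interface Scenario {
  id: string;
  prompt: string;   // benchmark-specific prompt sent to the model
  expected: string; // reference answer used by this toy scoring rule
}

interface ScenarioResult {
  scenarioId: string;
  passed: boolean;
}

// Toy scoring logic: exact match after trimming surrounding whitespace.
function scoreScenario(scenario: Scenario, modelOutput: string): ScenarioResult {
  return { scenarioId: scenario.id, passed: modelOutput.trim() === scenario.expected };
}

// Benchmark-specific summary: pass count and pass rate across all results.
function summarize(results: ScenarioResult[]): { passed: number; total: number; score: number } {
  const passed = results.filter((r) => r.passed).length;
  const total = results.length;
  return { passed, total, score: total === 0 ? 0 : passed / total };
}
```

In this picture, the desktop runtime would supply the model output and store the results, while the pack decides what counts as a pass.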
| Path | Purpose |
| --- | --- |
| app/ | Electron app shell, desktop UI, main process, preload, renderer |
| packages/benchlocal-core | shared protocol, config, workspace, and theme types |
| packages/benchlocal-sdk | authoring helpers for Bench Pack repos |
| packages/benchpack-host | host-side install, inspection, verifier, and run orchestration logic |
| themes/ | built-in desktop themes |
| scripts/ | local macOS release helpers |
| docs/ | packaging and release docs |

- ARCHITECTURE.md
- BENCH_PACK_AUTHORING.md
- BENCH_PROTOCOL_V1.md
- CONFIG_SCHEMA_V1.md
- BENCHLOCAL_REGISTRY_V1.md
- docs/agent-control-api.md
- docs/macos-release.md
- docs/windows-release.md
- docs/linux-release.md

| Command | Description |
| --- | --- |
| npm run build | compile the app and workspace packages for development |
| npm run pack | compile and package the production desktop app, including DMG and ZIP artifacts |
| npm run build:dir | compile and produce an unpacked local app bundle |
| npm run build:win | compile and package unsigned Windows NSIS and ZIP artifacts |
| npm run build:linux | compile and package Linux AppImage and tar.gz artifacts |
| npm run release:all | build the signed macOS release plus Windows and Linux desktop artifacts in one command |

MIT