Testbench

Kubernetes-native agent evaluation system that executes test datasets via the A2A protocol, scores responses with pluggable metrics (RAGAS by default), and publishes scores via OpenTelemetry.

📖 Documentation: https://docs.agentic-layer.ai/testbench/

Run standalone

To evaluate an agent without deploying to Kubernetes or Testkube:

pip install agentic-layer-testbench
testworkflow config.yaml

See config.example.yaml for the available configuration options.

Development

Prerequisites

Python
uv
Tilt and a local Kubernetes cluster (e.g. kind)
Testkube CLI
GOOGLE_API_KEY for LLM-as-a-judge evaluation via Gemini
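
Before starting the local stack, it can help to confirm that the listed tools are actually on PATH. The following is a minimal sketch, not part of the project: `missing_tools` is a hypothetical helper, and the binary names (`python3`, `uv`, `tilt`, `kind`, `kubectl`) are assumptions about how these tools are installed on your machine.

```shell
# Hypothetical helper: print the names of the given commands that are not on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # command -v exits non-zero when the command cannot be found
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space left by the concatenation above
  printf '%s\n' "${missing# }"
}

# Assumed binary names; adjust to match your installation.
result="$(missing_tools python3 uv tilt kind kubectl)"
if [ -n "$result" ]; then
  echo "Missing tools: $result"
else
  echo "All listed tools found on PATH"
fi
```

The check only tests that each command resolves; it does not verify versions or that a cluster is running.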

Build and run locally

# Install Python dependencies
uv sync
# Provide the LLM-as-a-judge API key
echo "GOOGLE_API_KEY=<key>" > .env
# Start the local stack (AI gateway, OTLP collector, sample agents, Testkube)
tilt up

Test

uv run poe ruff      # format and lint
uv run poe mypy      # static type checking
uv run poe bandit    # security scanning
uv run poe test      # unit tests
uv run poe check     # all of the above
uv run poe test_e2e  # E2E tests (requires `tilt up`)

The E2E defaults target the Tilt environment. Override them by setting E2E_DATASET_URL, E2E_AGENT_URL, or E2E_MODEL if needed.
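
As an illustration of a per-invocation override, the sketch below assumes the E2E_* names are read as environment variables, which the naming suggests but this README does not state. The URL is a placeholder, not a real endpoint, and `sh -c` stands in for the test command so the snippet runs anywhere.

```shell
# Assumption: E2E_AGENT_URL is read from the environment by the E2E suite.
# Prefixing a command with VAR=value sets the variable for that command only.
# The URL is a placeholder; "sh -c" stands in for "uv run poe test_e2e".
E2E_AGENT_URL="http://localhost:8080" sh -c 'echo "agent url: $E2E_AGENT_URL"'
```

Replacing the `sh -c '...'` part with `uv run poe test_e2e` would apply the same override to a single test run without changing your shell's environment.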

Verify the local deploy

Run the example workflow against the sample weather agent:

kubectl testkube run tw example-workflow --watch

The full walkthrough (defining experiments, configuring metrics, and viewing reports) is in the first-workflow how-to.

Contributing

See the Contribution Guide.

