Polyglot inference library for fully offline, text-only embedding and chat generation on CPU-only Linux, plus Windows and ARM64.
This repository hosts implementations in multiple languages. Java comes first; Go follows. Both implementations produce wire-compatible artifacts and the same observable behavior.
| Language | Status | Path |
|---|---|---|
| Java | in development (Phase 1) | java/ |
| Go | planned | go/ |
Phase 1 is library-only: embedding via ONNX Runtime + bge-small-en-v1.5; chat generation via a forked llama.cpp Java binding + Qwen 2.5-0.5B-Instruct (default). The HTTP/OpenAI-compatible layer is Phase 2.
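As a rough illustration of the embedding path, the sketch below shows the post-processing usually applied to bge-small-en-v1.5 output: take the first-token (CLS) vector from the model's last hidden state and L2-normalize it. This is not this repository's API. The class and method names (`EmbeddingPostProcess`, `clsPoolAndNormalize`, `cosine`) are hypothetical, and the input is assumed to be the `[seqLen][hidden]` token vectors for a single text, already read from an ONNX Runtime session.

```java
// Hypothetical helper, not part of this repository's API.
// Assumes tokenVectors holds one text's last hidden state: [seqLen][hidden].
final class EmbeddingPostProcess {
    private EmbeddingPostProcess() {}

    /** Returns the L2-normalized first-token (CLS) vector. */
    static float[] clsPoolAndNormalize(float[][] tokenVectors) {
        if (tokenVectors == null || tokenVectors.length == 0) {
            throw new IllegalArgumentException("expected at least one token vector");
        }
        // CLS pooling: the sentence embedding is the first token's vector.
        float[] cls = tokenVectors[0].clone();

        // Accumulate the squared norm in double precision.
        double sumSq = 0.0;
        for (float x : cls) {
            sumSq += (double) x * x;
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            return cls; // all-zero vector: nothing to normalize
        }
        for (int i = 0; i < cls.length; i++) {
            cls[i] = (float) (cls[i] / norm);
        }
        return cls;
    }

    /** Cosine similarity of two already-normalized vectors (a plain dot product). */
    static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
        }
        return (float) dot;
    }
}
```

Because the vectors are normalized up front, similarity between two embeddings reduces to a dot product, which keeps CPU-only comparison cheap.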
- docs/ARCHITECTURE.md: cross-language design
- docs/WIRE_FORMAT.md: JSON shapes shared across languages
- docs/MODEL_REGISTRY.md: canonical model IDs
- java/: Java implementation
Apache 2.0. See LICENSE.