xcerox/miniRAG

miniRAG

Minimal RAG (Retrieval-Augmented Generation) for the console. Uses Ollama for local LLM inference and embeddings, and pgvector for vector storage.

Licensed under GNU GPL v3: free to use, study, and modify, including commercially. If you distribute a derivative work, it must also be released under the GPL.

What problems does this solve?

Most organizations have knowledge trapped in documents — manuals, contracts, reports, policies — that nobody reads because finding the right answer takes too long. miniRAG lets you talk to those documents in plain language and get direct answers, without sending your data to any external service.

Concrete use cases:

  • HR & internal policies — "How many vacation days do I get after my first year?" instead of scrolling through a 40-page handbook
  • Legal & contracts — "What are the termination clauses in this agreement?" without reading the full document
  • Technical manuals — "What does error code E-401 mean and how do I fix it?" across hundreds of pages of documentation
  • Financial reports — "What was the operating margin in Q3 and what explains the variance?" from dense PDF reports
  • Customer support — Answer repetitive questions automatically from your own product documentation
  • Compliance & audits — "Does our policy cover this scenario?" with traceable answers pointing to the source fragment

Why local? Everything runs on your own machine. Documents never leave your infrastructure — no OpenAI, no cloud APIs, no data exposure.

Stack

  • LLM & Embeddings — Ollama (local)
  • Vector DB — PostgreSQL + pgvector (HNSW cosine index)
  • ORM — SQLAlchemy
  • PDF parsing — pdfplumber
  • Package manager — uv

Requirements

Setup

# 1. Pull the models in Ollama
ollama pull qwen2.5:14b
ollama pull nomic-embed-text

# 2. Start pgvector
docker compose up -d

# 3. Install dependencies
uv sync

# 4. Configure environment
cp .env.example .env
# edit .env as needed

Usage

# Index a file (.txt or .pdf)
uv run minirag -f document.pdf

# Ask a question
uv run minirag -q "What is the vacation policy?"

# Index and query in one command
uv run minirag -f document.pdf -q "What is the vacation policy?"

# Run diagnostics on a question
uv run minirag -d "What is the vacation policy?"

# Suppress pipeline output
uv run minirag -q "What is the vacation policy?" --no-verbose

Configuration

All settings are controlled via .env:

Variable                Default                         Description
OLLAMA_BASE_URL         http://localhost:11434          Ollama server URL
OLLAMA_LLM_MODEL        qwen2.5:14b                     Model for generation
OLLAMA_EMBEDDING_MODEL  nomic-embed-text                Model for embeddings
EMBEDDING_DIM           768                             Embedding dimensions
TEMPERATURE             0.1                             LLM temperature
LANGUAGE                es                              Prompt language (en or es)
N_RESULTS               6                               Number of fragments to retrieve
MIN_SIMILARITY          0.5                             Minimum cosine similarity threshold
CHUNK_SIZE              500                             Characters per chunk
CHUNK_OVERLAP           50                              Overlap between chunks
RETRIEVE_MODE           simple                          Retrieval strategy (simple, hyde, multi)
MULTI_N                 3                               Number of query variants for multi mode
RERANKER_ENABLED        false                           Enable reranker after retrieval
RERANKER_MODEL          qllama/bge-reranker-v2-m3:f16   Reranker model via Ollama
RERANKER_TOP_N          6                               Final fragments after reranking
RERANKER_CANDIDATES     20                              Candidates fetched before reranking
POSTGRES_HOST           localhost                       Database host
POSTGRES_PORT           5432                            Database port
POSTGRES_DB             minirag                         Database name
POSTGRES_USER           minirag                         Database user
POSTGRES_PASSWORD       minirag                         Database password
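
As a rough illustration of how these variables might be consumed, the sketch below reads a handful of them with the documented defaults and assembles a Postgres connection URL. The function names load_settings() and postgres_url() are hypothetical, and the plain postgresql:// scheme is an assumption; the project's own Config code in utils.py may differ.

```python
import os


def load_settings(env=None):
    """Read a few settings, falling back to the defaults listed in the table."""
    env = os.environ if env is None else env
    return {
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "embedding_dim": int(env.get("EMBEDDING_DIM", "768")),
        "temperature": float(env.get("TEMPERATURE", "0.1")),
        "n_results": int(env.get("N_RESULTS", "6")),
        "min_similarity": float(env.get("MIN_SIMILARITY", "0.5")),
        # Environment values are strings, so booleans need explicit parsing.
        "reranker_enabled": env.get("RERANKER_ENABLED", "false").strip().lower() == "true",
    }


def postgres_url(env=None):
    """Assemble a connection URL from the POSTGRES_* variables (scheme assumed)."""
    env = os.environ if env is None else env
    user = env.get("POSTGRES_USER", "minirag")
    password = env.get("POSTGRES_PASSWORD", "minirag")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    db = env.get("POSTGRES_DB", "minirag")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```

Passing a plain dict instead of os.environ makes the parsing easy to check in isolation. A password containing characters such as @ or / would need URL quoting before being placed in the URL.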

Project structure

miniRAG/
├── main.py               # CLI entry point
├── utils.py              # All core logic
│   ├── Config            # Env vars and constants
│   ├── ORM               # SQLAlchemy Doc model + pgvector schema
│   ├── Database          # index(), _query_vector(), get_collection()
│   ├── Ollama            # embedded(), call_llm(), rerank()
│   ├── Prompt            # build_prompt(), _load_templates()
│   ├── RAG pipeline      # rag(), retrieve(), diagnose_rag()
│   └── File ingestion    # load_file(), _chunk_text()
├── prompt_templates.json # Prompt templates (en/es)
├── docker-compose.yml
├── pyproject.toml
└── .env
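
To make the File ingestion step more concrete, here is a minimal sketch of fixed-size character chunking with overlap, using the CHUNK_SIZE and CHUNK_OVERLAP defaults from the configuration. It is an illustration only: the name chunk_text() and its exact behaviour are assumptions, and the real _chunk_text() in utils.py may split differently (for example on sentence boundaries).

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in at least one neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With the defaults, a 1000-character document yields three chunks starting at offsets 0, 450 and 900. A larger overlap reduces the chance of splitting an answer across chunks, at the cost of storing more embeddings.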

RAG pipeline modes

SIMPLE — best for: technical docs, FAQs, manuals, direct questions

question ──► embed(question) ──► pgvector search ──► build_prompt ──► LLM ──► answer
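
The search step in SIMPLE mode can be pictured with the pure-Python sketch below, which ranks fragments by cosine similarity, drops those under MIN_SIMILARITY, and keeps the top N_RESULTS. It is a stand-in for the database query, not the project's code: in pgvector the `<=>` operator returns cosine distance (1 minus similarity), and the names cosine_similarity() and search() are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def search(query_vec, docs, n_results=6, min_similarity=0.5):
    """Return up to n_results (score, text) pairs, best first.

    docs is a list of (text, vector) pairs standing in for table rows.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    kept = [pair for pair in scored if pair[0] >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:n_results]
```

Raising MIN_SIMILARITY trades recall for precision: fewer fragments reach the prompt, but those that do are closer to the question.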

HYDE — best for: narrative documents, open-ended questions, literary texts

                  ┌─► LLM (hypothetical answer) ──► embed(answer) ──┐
question ─────────┤                                                   ├──► pgvector search ──► build_prompt ──► LLM ──► answer
                  └───────────────────────────────────────────────────┘
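
One way to read the HYDE diagram is sketched below: the LLM first drafts a hypothetical answer, that draft (rather than the question) is embedded and used for the vector search, and the original question is carried forward to the final prompt. This reading of the lower branch is an assumption, and hyde_answer() with its injected callables is hypothetical rather than the signature used in utils.py.

```python
def hyde_answer(question, llm, embed, search, build_prompt):
    """HyDE-style retrieval with all external services passed in as callables.

    llm(prompt) -> str, embed(text) -> vector, search(vector) -> fragments,
    build_prompt(question, fragments) -> str.
    """
    # 1. Ask the model for a plausible answer; it may be wrong, but it tends
    #    to use the vocabulary of the documents that contain the real one.
    hypothetical = llm(f"Write a short passage that plausibly answers: {question}")

    # 2. Search with the embedding of the hypothetical answer.
    fragments = search(embed(hypothetical))

    # 3. Answer the original question using the retrieved fragments.
    return llm(build_prompt(question, fragments))
```

Passing the model, embedder and search function as arguments keeps the sketch independent of Ollama and pgvector, so it can be exercised with simple fakes.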

MULTI — best for: ambiguous questions, mixed documents, when simple/hyde miss results

                  ┌─► variant 1 ──► embed ──► pgvector search ──┐
question ──► LLM ─┼─► variant 2 ──► embed ──► pgvector search ──┼──► merge & dedupe ──► build_prompt ──► LLM ──► answer
                  └─► variant N ──► embed ──► pgvector search ──┘
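
The merge & dedupe step in MULTI mode could look roughly like the sketch below, which combines the result lists of all query variants, keeps the best score seen for each fragment, and returns the top N_RESULTS. The (fragment_id, score) shape and the keep-the-highest-score policy are assumptions for illustration; the project may merge differently.

```python
def merge_and_dedupe(result_lists, n_results=6):
    """Merge per-variant results into one ranked list without duplicates.

    result_lists is a list of lists of (fragment_id, score) pairs, one inner
    list per query variant. A fragment found by several variants is kept once,
    with its highest score.
    """
    best = {}
    for results in result_lists:
        for fragment_id, score in results:
            if fragment_id not in best or score > best[fragment_id]:
                best[fragment_id] = score

    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n_results]


# Two variants both found fragment "b"; only its better score survives.
merged = merge_and_dedupe([
    [("a", 0.9), ("b", 0.6)],
    [("b", 0.8), ("c", 0.7)],
])
# merged == [("a", 0.9), ("b", 0.8), ("c", 0.7)]
```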

+ RERANKER (optional, any mode) — best for: noisy retrieval results or low similarity scores

retrieve() ──► top RERANKER_CANDIDATES ──► bge-reranker cosine ──► top RERANKER_TOP_N ──► build_prompt ──► LLM ──► answer
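
The two-stage shape of the reranker flow (fetch a wide candidate set, rescore it, keep a few) is sketched below. The scoring function is injected; the word-overlap scorer shown is only a toy stand-in for the bge-reranker cosine score, and rerank_sketch() is a hypothetical name, not the rerank() function in utils.py.

```python
def rerank_sketch(question, candidates, score, top_n=6):
    """Rescore candidate fragments against the question and keep the best top_n.

    candidates would be the RERANKER_CANDIDATES fragments from retrieval;
    score(question, fragment) returns a number where higher means more relevant.
    """
    scored = [(score(question, fragment), fragment) for fragment in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fragment for _, fragment in scored[:top_n]]


def word_overlap(question, fragment):
    """Toy relevance score: number of distinct words shared by both texts."""
    return len(set(question.lower().split()) & set(fragment.lower().split()))
```

For example, rerank_sketch("vacation policy", fragments, word_overlap, top_n=2) keeps the two fragments sharing the most words with the question. Fetching more candidates than are finally used gives the second stage room to promote fragments that the first-stage embedding ranked too low.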

Reranker

The reranker uses qllama/bge-reranker-v2-m3 via Ollama as a cosine similarity scorer. This is not a true reranker: the model's classification head is lost in the GGUF conversion, so only its embeddings are available. It does, however, produce better relevance scores than nomic-embed-text because its vector space is trained specifically for query-document relevance.

To enable it, pull the model and set RERANKER_ENABLED=true in .env:

ollama pull qllama/bge-reranker-v2-m3:f16

Upgrading to a true reranker (HuggingFace)

For production-grade reranking, replace the body of rerank() in utils.py with the HuggingFace version documented in the function's docstring. It uses the classification head directly and produces a true relevance score per (question, fragment) pair.

Requirements:

uv add transformers torch

The function signature is identical (same inputs, same output), so it is a drop-in replacement. No other code changes are needed.
