miniRAG

Minimal RAG (Retrieval-Augmented Generation) for the console. Uses Ollama for local LLM and embeddings, and pgvector for vector storage.

Licensed under GNU GPL v3: free to use, study, and modify, including for learning and open source projects. Commercial use is allowed, but any derivative work you distribute must also be released as open source under GPL v3.

What problems does this solve?

Most organizations have knowledge trapped in documents — manuals, contracts, reports, policies — that nobody reads because finding the right answer takes too long. miniRAG lets you talk to those documents in plain language and get direct answers, without sending your data to any external service.

Concrete use cases:

HR & internal policies — "How many vacation days do I get after my first year?" instead of scrolling through a 40-page handbook
Legal & contracts — "What are the termination clauses in this agreement?" without reading the full document
Technical manuals — "What does error code E-401 mean and how do I fix it?" across hundreds of pages of documentation
Financial reports — "What was the operating margin in Q3 and what explains the variance?" from dense PDF reports
Customer support — Answer repetitive questions automatically from your own product documentation
Compliance & audits — "Does our policy cover this scenario?" with traceable answers pointing to the source fragment

Why local? Everything runs on your own machine. Documents never leave your infrastructure — no OpenAI, no cloud APIs, no data exposure.

Stack

LLM & Embeddings — Ollama (local)
Vector DB — PostgreSQL + pgvector (HNSW cosine index)
ORM — SQLAlchemy
PDF parsing — pdfplumber
Package manager — uv

Requirements

uv
Docker
Ollama running locally

Setup

# 1. Pull the models in Ollama
ollama pull qwen2.5:14b
ollama pull nomic-embed-text

# 2. Start pgvector
docker compose up -d

# 3. Install dependencies
uv sync

# 4. Configure environment
cp .env.example .env
# edit .env as needed
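
After step 1, one quick way to confirm that Ollama is serving the embedding model is to request a single embedding over its REST API. The sketch below is not taken from miniRAG; the helper names `build_embed_request` and `embed` are illustrative, and it assumes the default `OLLAMA_BASE_URL` and the `/api/embed` endpoint of a reasonably recent Ollama release.

```python
import json
import urllib.request


def build_embed_request(text, model="nomic-embed-text",
                        base_url="http://localhost:11434"):
    """Build (but do not send) a POST request for Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, **kwargs):
    """Send the request and return the first embedding vector.

    Requires a running Ollama server with the model already pulled.
    """
    request = build_embed_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    # /api/embed returns one vector per input under the "embeddings" key.
    return body["embeddings"][0]
```

With the defaults from this README, `len(embed("hello"))` should match `EMBEDDING_DIM`.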

Usage

# Index a file (.txt or .pdf)
uv run minirag -f document.pdf

# Ask a question
uv run minirag -q "What is the vacation policy?"

# Index and query in one command
uv run minirag -f document.pdf -q "What is the vacation policy?"

# Run diagnostics on a question
uv run minirag -d "What is the vacation policy?"

# Suppress pipeline output
uv run minirag -q "What is the vacation policy?" --no-verbose

Configuration

All settings are controlled via .env:

| Variable | Default | Description |
| --- | --- | --- |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_LLM_MODEL` | `qwen2.5:14b` | Model for generation |
| `OLLAMA_EMBEDDING_MODEL` | `nomic-embed-text` | Model for embeddings |
| `EMBEDDING_DIM` | `768` | Embedding dimensions |
| `TEMPERATURE` | `0.1` | LLM temperature |
| `LANGUAGE` | `es` | Prompt language (`en` or `es`) |
| `N_RESULTS` | `6` | Number of fragments to retrieve |
| `MIN_SIMILARITY` | `0.5` | Minimum cosine similarity threshold |
| `CHUNK_SIZE` | `500` | Characters per chunk |
| `CHUNK_OVERLAP` | `50` | Overlap between chunks, in characters |
| `RETRIEVE_MODE` | `simple` | Retrieval strategy (`simple`, `hyde`, `multi`) |
| `MULTI_N` | `3` | Number of query variants for `multi` mode |
| `RERANKER_ENABLED` | `false` | Enable reranker after retrieval |
| `RERANKER_MODEL` | `qllama/bge-reranker-v2-m3:f16` | Reranker model via Ollama |
| `RERANKER_TOP_N` | `6` | Final fragments after reranking |
| `RERANKER_CANDIDATES` | `20` | Candidates fetched before reranking |
| `POSTGRES_HOST` | `localhost` | Database host |
| `POSTGRES_PORT` | `5432` | Database port |
| `POSTGRES_DB` | `minirag` | Database name |
| `POSTGRES_USER` | `minirag` | Database user |
| `POSTGRES_PASSWORD` | `minirag` | Database password |
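
To illustrate how these variables might be read, here is a minimal sketch that loads a subset of them with the defaults listed above. It is not the actual `Config` code in `utils.py`: the `Settings` class, `load_settings`, and the plain `postgresql://` URL scheme are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_base_url: str
    embedding_model: str
    embedding_dim: int
    n_results: int
    min_similarity: float
    chunk_size: int
    chunk_overlap: int
    retrieve_mode: str
    reranker_enabled: bool
    database_url: str


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default) with README defaults."""
    env = os.environ if env is None else env
    mode = env.get("RETRIEVE_MODE", "simple").lower()
    if mode not in {"simple", "hyde", "multi"}:
        raise ValueError(f"unknown RETRIEVE_MODE: {mode!r}")
    # Assumed URL shape; the driver suffix used by the project may differ.
    database_url = "postgresql://{user}:{password}@{host}:{port}/{db}".format(
        user=env.get("POSTGRES_USER", "minirag"),
        password=env.get("POSTGRES_PASSWORD", "minirag"),
        host=env.get("POSTGRES_HOST", "localhost"),
        port=env.get("POSTGRES_PORT", "5432"),
        db=env.get("POSTGRES_DB", "minirag"),
    )
    return Settings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        embedding_model=env.get("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
        embedding_dim=int(env.get("EMBEDDING_DIM", "768")),
        n_results=int(env.get("N_RESULTS", "6")),
        min_similarity=float(env.get("MIN_SIMILARITY", "0.5")),
        chunk_size=int(env.get("CHUNK_SIZE", "500")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "50")),
        retrieve_mode=mode,
        reranker_enabled=_as_bool(env.get("RERANKER_ENABLED", "false")),
        database_url=database_url,
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test. Note that `EMBEDDING_DIM` has to match the output size of the embedding model, since the pgvector column is created with a fixed dimension.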

Project structure

miniRAG/
├── main.py               # CLI entry point
├── utils.py              # All core logic
│   ├── Config            # Env vars and constants
│   ├── ORM               # SQLAlchemy Doc model + pgvector schema
│   ├── Database          # index(), _query_vector(), get_collection()
│   ├── Ollama            # embedded(), call_llm(), rerank()
│   ├── Prompt            # build_prompt(), _load_templates()
│   ├── RAG pipeline      # rag(), retrieve(), diagnose_rag()
│   └── File ingestion    # load_file(), _chunk_text()
├── prompt_templates.json # Prompt templates (en/es)
├── docker-compose.yml
├── pyproject.toml
└── .env
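
As an illustration of the file ingestion step, the following sketch splits text into fixed-size character windows that overlap by `CHUNK_OVERLAP` characters. It is one plausible reading of what `_chunk_text()` does, not a copy of it; the function name `chunk_text` and its validation rules are assumptions.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, a 1000-character document yields windows starting at offsets 0, 450, and 900. Splitting on characters is simple but can cut words in half; splitting on sentence or paragraph boundaries is a common refinement.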

RAG pipeline modes

SIMPLE — best for: technical docs, FAQs, manuals, direct questions

question ──► embed(question) ──► pgvector search ──► build_prompt ──► LLM ──► answer
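
The search step in this mode can be pictured with a small in-memory stand-in for pgvector. The sketch below is an assumption-laden illustration, not the project's `_query_vector()`: it scores every stored fragment by cosine similarity, drops those below `MIN_SIMILARITY`, and keeps the best `N_RESULTS`. In pgvector the `<=>` operator returns cosine distance, so the similarity used here corresponds to `1 - distance`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, rows, n_results=6, min_similarity=0.5):
    """Return up to n_results (score, text) pairs, best first.

    rows is an iterable of (text, vector) pairs standing in for the table.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in rows]
    kept = [pair for pair in scored if pair[0] >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:n_results]
```

A real deployment would let the HNSW index do this ranking approximately inside PostgreSQL instead of scanning every row in Python.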

HYDE — best for: narrative documents, open-ended questions, literary texts

                  ┌─► LLM (hypothetical answer) ──► embed(answer) ──┐
question ─────────┤                                                   ├──► pgvector search ──► build_prompt ──► LLM ──► answer
                  └───────────────────────────────────────────────────┘
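
The idea behind HyDE is that a hypothetical answer often sits closer to the relevant fragments in embedding space than the question itself. A minimal sketch follows, with the LLM, embedding, and search steps passed in as callables so it stays independent of any backend. The function name `hyde_retrieve` and the prompt wording are assumptions, not the project's actual code or templates.

```python
def hyde_retrieve(question, generate, embed, search):
    """Retrieve fragments using the embedding of a hypothetical answer.

    generate: callable(prompt) -> str, e.g. a wrapper around the local LLM
    embed:    callable(text) -> list[float]
    search:   callable(vector) -> list of retrieved fragments
    """
    prompt = (
        "Write a short passage that plausibly answers the question.\n"
        f"Question: {question}"
    )
    hypothetical_answer = generate(prompt)
    # Search with the answer's embedding instead of the question's.
    return search(embed(hypothetical_answer))
```

The original question, not the hypothetical answer, is still what goes into the final prompt, as the diagram shows. The hypothetical answer may be factually wrong; it only needs to resemble the vocabulary and style of the real source text.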

MULTI — best for: ambiguous questions, mixed documents, when simple/hyde miss results

                  ┌─► variant 1 ──► embed ──► pgvector search ──┐
question ──► LLM ─┼─► variant 2 ──► embed ──► pgvector search ──┼──► merge & dedupe ──► build_prompt ──► LLM ──► answer
                  └─► variant N ──► embed ──► pgvector search ──┘
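
The merge and dedupe step can be sketched as follows, assuming each per-variant search returns `(score, text)` pairs. Keeping the highest score for a fragment that several variants retrieve is one reasonable policy; the project may combine scores differently, and `merge_results` is an illustrative name.

```python
def merge_results(result_lists, n_results=6):
    """Merge several ranked result lists into one, without duplicates.

    Each inner list holds (score, text) pairs. A fragment found by more than
    one query variant is kept once, with its best score.
    """
    best = {}
    for results in result_lists:
        for score, text in results:
            if text not in best or score > best[text]:
                best[text] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(score, text) for text, score in ranked[:n_results]]
```

Deduplicating on the fragment text works for a sketch; with a database, the row's primary key is a more robust identity.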

+ RERANKER (optional, any mode) — best for: noisy retrieval results or low similarity scores

retrieve() ──► top RERANKER_CANDIDATES ──► bge-reranker cosine ──► top RERANKER_TOP_N ──► build_prompt ──► LLM ──► answer
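
The reranking stage above can be sketched as a second scoring pass: embed the question and each candidate with the reranker model, score by cosine similarity, and keep the top `RERANKER_TOP_N`. This is an illustration under assumptions, not the `rerank()` in `utils.py`; the `embed` callable stands in for a call to the reranker model through Ollama.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(question, fragments, embed, top_n=6):
    """Re-score candidate fragments against the question and keep the best top_n.

    embed: callable(text) -> list[float], backed by the reranker model.
    Returns (score, fragment) pairs, best first.
    """
    question_vec = embed(question)
    scored = [(_cosine(question_vec, embed(fragment)), fragment)
              for fragment in fragments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

Fetching more candidates than are finally used (20 versus 6 by default) gives the second pass room to promote fragments that the first-stage search ranked low.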

Reranker

The reranker uses qllama/bge-reranker-v2-m3 via Ollama as a cosine similarity scorer. This is not a true reranker — the model's classification head is lost in the GGUF conversion. It does however produce better relevance scores than nomic-embed-text because its vector space is trained specifically for query-document relevance.

To enable it, pull the model and set RERANKER_ENABLED=true in .env:

ollama pull qllama/bge-reranker-v2-m3:f16

Upgrading to a true reranker (HuggingFace)

For production-grade reranking, replace the body of rerank() in utils.py with the HuggingFace version documented in the function's docstring. It uses the classification head directly and produces a true relevance score per (question, fragment) pair.

Requirements:

uv add transformers torch

The function signature is identical — same inputs, same output — so it is a drop-in replacement. No other code changes needed.
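
For orientation, a cross-encoder reranker built on `transformers` typically looks like the sketch below. This is not the version from the docstring in `utils.py`, which remains the reference; the name `rerank_hf`, its parameters, its return shape, and the `BAAI/bge-reranker-v2-m3` checkpoint name are assumptions for illustration.

```python
def rerank_hf(question, fragments, top_n=6, model_name="BAAI/bge-reranker-v2-m3"):
    """Score (question, fragment) pairs with a sequence-classification head.

    Returns (score, fragment) pairs, best first. Scores are raw logits, so
    only their ordering is meaningful unless a sigmoid is applied.
    """
    if not fragments:
        return []

    # Imported lazily so the module loads even without the optional extras.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    pairs = [[question, fragment] for fragment in fragments]
    with torch.no_grad():
        inputs = tokenizer(pairs, padding=True, truncation=True,
                           max_length=512, return_tensors="pt")
        logits = model(**inputs).logits.view(-1).float()

    ranked = sorted(zip(logits.tolist(), fragments),
                    key=lambda pair: pair[0], reverse=True)
    return ranked[:top_n]
```

Loading the tokenizer and model on every call keeps the sketch short but is slow; in practice they would be loaded once and reused across queries.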
