Upload PDFs. Ask questions. Get grounded, page-cited answers. Runs 100% locally on a Raspberry Pi, no cloud required.
                   HOME NETWORK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        Browser (phone / laptop / tablet)
 https://192.168.x.x or https://localdocrag.local
━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                    │ HTTPS
┌───────────────────▼──────────────────┐
│  Raspberry Pi 4/5 (Docker)           │
│                                      │
│  ┌────────────────────────────────┐  │
│  │ nginx (ports 80/443)           │  │
│  │ SSL termination                │  │
│  │ Rate limiting (30 req/min)     │  │
│  └─────┬─────────────────┬────────┘  │
│        │                 │           │
│  ┌─────▼──────┐   ┌──────▼──────┐    │
│  │ Frontend   │   │ FastAPI     │    │
│  │ React 18   │   │ Backend     │    │
│  │ + Vite     │   │ Port 8000   │    │
│  │ Port 80    │   └──────┬──────┘    │
│  └────────────┘          │           │
│                  ┌───────▼───────┐   │
│                  │ PostgreSQL 16 │   │
│                  │ + pgvector    │   │
│                  │ Port 5432     │   │
│                  └───────────────┘   │
│                                      │
│  ┌────────────────────────────────┐  │
│  │ Ollama (optional)              │  │
│  │ Port 11434 (internal only)     │  │
│  │ qwen2.5:14b + nomic-embed      │  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘
RAG Pipeline:
PDF Upload → pypdf parse → RecursiveCharacterTextSplitter (chunk=800, overlap=150)
→ OllamaEmbeddings / OpenAIEmbeddings → pgvector INSERT
Question → embed → pgvector cosine similarity search (TOP_K=5)
→ context + LLM prompt → ChatOllama / ChatOpenAI → answer + cited sources
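The pipeline above can be approximated in plain Python to show what each stage does. This is a simplified sketch, not the project's code: chunk_text uses fixed character windows (LangChain's RecursiveCharacterTextSplitter additionally prefers to break on paragraphs, lines and words), and the in-memory top_k stands in for the pgvector query. All function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 150) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """What pgvector's <=> operator returns: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]],
          k: int = 5) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs for the k chunks closest to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    print([len(c) for c in chunk_text("x" * 2000)])  # [800, 800, 700]
    print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2))
```

The cosine_distance helper mirrors pgvector's <=> operator, which is why a similarity score is typically reported as 1 minus that distance.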
| Layer | Technology |
|---|---|
| Backend | Python 3.11, FastAPI 0.111, uvicorn |
| Authentication | JWT (python-jose), bcrypt |
| Rate limiting | slowapi (FastAPI) + nginx |
| RAG pipeline | LangChain 0.2, LangChain-OpenAI, LangChain-Anthropic, LangChain-Community |
| Vector database | PostgreSQL 16 + pgvector |
| ORM | SQLAlchemy 2.0 (async) + psycopg3 |
| PDF parsing | pypdf 4.2 |
| Evaluation | RAGAS 0.1.9 (faithfulness, relevancy, precision, recall) |
| Frontend | React 18, Vite 5, Tailwind CSS 3, react-markdown |
| Proxy / SSL | nginx:alpine (self-signed TLS) |
| Containers | Docker + Docker Compose (compose file format 3.9) |
| CI/CD | GitHub Actions (test + lint + docker build) |
Switch providers with a single env var — zero code changes:
| LLM_PROVIDER | Chat Model | Embeddings | Cost |
|---|---|---|---|
| ollama | qwen2.5:14b (or any Ollama model) | nomic-embed-text (768d) | Free |
| openai | gpt-4o-mini | text-embedding-3-small (1536d) | ~$0.001/query |
| anthropic | claude-haiku-4-5 | OpenAI (fallback) | ~$0.0008/query |
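As a rough illustration of what "a single env var" means, the sketch below resolves LLM_PROVIDER into model names and the embedding dimension the vector column must use. It is not the project's llm_factory.py: the PROVIDERS mapping simply mirrors the table above, resolve_provider is a hypothetical name, and the anthropic entry assumes the OpenAI fallback uses text-embedding-3-small.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ProviderConfig:
    chat_model: str
    embed_model: str
    embed_dim: int


# Mirrors the provider table; the anthropic embedding model is an assumption.
PROVIDERS = {
    "ollama": ProviderConfig("qwen2.5:14b", "nomic-embed-text", 768),
    "openai": ProviderConfig("gpt-4o-mini", "text-embedding-3-small", 1536),
    "anthropic": ProviderConfig("claude-haiku-4-5", "text-embedding-3-small", 1536),
}


def resolve_provider(env: Mapping[str, str] = os.environ) -> Tuple[str, ProviderConfig]:
    """Pick models from LLM_PROVIDER and check EMBEDDING_DIMENSION agrees with them."""
    name = (env.get("LLM_PROVIDER") or "ollama").strip().lower()
    if name not in PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(PROVIDERS)}, got {name!r}")
    cfg = PROVIDERS[name]
    if name == "ollama" and env.get("OLLAMA_LLM_MODEL"):
        # The local chat model is swappable via .env without touching code.
        cfg = replace(cfg, chat_model=env["OLLAMA_LLM_MODEL"])
    declared = int(env.get("EMBEDDING_DIMENSION") or cfg.embed_dim)
    if declared != cfg.embed_dim:
        raise ValueError(
            f"EMBEDDING_DIMENSION={declared} does not match {cfg.embed_model} ({cfg.embed_dim}d)"
        )
    return name, cfg
```

Failing fast on a dimension mismatch at startup is cheaper than discovering it later as a pgvector error when chunks are inserted.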
Raspberry Pi 5 16GB model guide:
| Model | Size | Best for |
|---|---|---|
| qwen2.5:14b | 8.2 GB | Recommended — best document understanding |
| deepseek-r1:14b | 8.5 GB | Analysis and reasoning tasks |
| gemma2:27b:q4_K_M | 15 GB | Maximum quality (uses nearly all RAM) |
| llama3.1:8b | 4.7 GB | Fast, well-tested fallback |
This section walks you through the complete setup from scratch using only Ollama (fully local, no OpenAI or Anthropic key required).
- Docker Desktop (Mac/Windows) or Docker Engine + Compose plugin (Linux/Pi)
- That's it — Python, Node.js, and Ollama are all handled inside Docker containers
git clone https://github.com/alinjfz/localdocrag.git
cd localdocrag
You need two values before you can start: a bcrypt password hash and a JWT secret key.
Option A — Using Docker (no Python needed locally):
# Generate a bcrypt hash for your chosen password (replace 'mypassword'):
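# (python:3.11-slim does not include bcrypt, so it is installed in the throwaway container first)
docker run --rm python:3.11-slim sh -c \
"pip install --quiet bcrypt && python3 -c 'import bcrypt; print(bcrypt.hashpw(b\"mypassword\", bcrypt.gensalt()).decode())'"
# Generate a random JWT secret (secrets is in the standard library, no install needed):
docker run --rm python:3.11-slim \
python3 -c "import secrets; print(secrets.token_hex(32))"
Option B — If you have Python 3 installed locally: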
pip install bcrypt # only needed if not already installed
python3 -c "import bcrypt; print(bcrypt.hashpw(b'mypassword', bcrypt.gensalt()).decode())"
python3 -c "import secrets; print(secrets.token_hex(32))"
Both commands print one line of output each. Copy those values — you'll need them in the next step.
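For context on what the JWT secret protects: the backend signs session tokens with JWT_SECRET_KEY, so anyone who knows the key can forge a valid login. The standard-library sketch below shows HS256 signing and verification with an exp claim. HS256 is an assumption (the README only says python-jose is used), sign_hs256 and verify_hs256 are illustrative names rather than project functions, and the backend itself should keep relying on python-jose instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: str) -> str:
    """Encode header.payload and append an HMAC-SHA256 signature keyed by the secret."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: str) -> dict:
    """Return the claims if the signature matches and the token has not expired."""
    signing_input, _, signature = token.rpartition(".")
    if signing_input.count(".") != 1:
        raise ValueError("malformed token")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if claims.get("exp", float("inf")) < time.time():
        raise ValueError("token expired")
    return claims
```

With the default JWT_EXPIRE_MINUTES of 480, a token would carry exp = now + 480 × 60 = now + 28800 seconds, matching the expires_in value returned by /api/auth/login.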
cp .env.example .env
Open .env in any text editor and set these values:
# Paste the bcrypt hash from Step 2:
APP_PASSWORD_HASH=$2b$12$...your_hash_here...
# Paste the random secret from Step 2:
JWT_SECRET_KEY=your64charhexstring
# Choose your password for the PostgreSQL database (anything you like):
DB_PASSWORD=pick_a_db_password
# Keep these as-is for Ollama local mode:
LLM_PROVIDER=ollama
OLLAMA_LLM_MODEL=llama3.2:3b # fast 2GB model, good for testing
OLLAMA_EMBED_MODEL=nomic-embed-text
EMBEDDING_DIMENSION=768
# Leave API keys blank:
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
Model size guide — change OLLAMA_LLM_MODEL to suit your machine:
- llama3.2:3b — 2 GB RAM, fast, good for testing
- llama3.1:8b — 4.7 GB RAM, better quality
- qwen2.5:14b — 8.2 GB RAM, best quality (Pi 5 16GB / desktop)
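Before starting the stack, a quick preflight check can catch the most common .env mistakes. The script below is a hypothetical helper, not part of the project, and its parse_env is deliberately minimal, so it may not match exactly how the backend or Docker Compose read .env. It checks that APP_PASSWORD_HASH looks like a 60-character bcrypt hash (a truncated or mangled hash leads to the 401 described under troubleshooting; recent Docker Compose versions interpolate $ in unquoted .env values, so wrapping the hash in single quotes may be necessary), that JWT_SECRET_KEY is long enough, and that DATABASE_URL, which you update next, contains DB_PASSWORD percent-encoded.

```python
import re
from typing import Dict, List
from urllib.parse import quote

# $2a$/$2b$/$2y$ + two-digit cost + 53 chars of salt and hash = 60 characters total
BCRYPT_RE = re.compile(r"^\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}$")


def parse_env(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips comments, strips quotes or ' #' inline comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] in "'\"" and value[-1] == value[0]:
            value = value[1:-1]  # quoted value: keep '#' and '$' literally
        else:
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def preflight(env: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the basics look right."""
    problems: List[str] = []
    if not BCRYPT_RE.match(env.get("APP_PASSWORD_HASH", "")):
        problems.append("APP_PASSWORD_HASH is not a 60-character bcrypt hash")
    if len(env.get("JWT_SECRET_KEY", "")) < 32:
        problems.append("JWT_SECRET_KEY is shorter than 32 characters")
    password = env.get("DB_PASSWORD", "")
    # Characters such as @ : / must be percent-encoded inside a URL.
    if password and f":{quote(password, safe='')}@" not in env.get("DATABASE_URL", ""):
        problems.append("DATABASE_URL does not contain DB_PASSWORD (percent-encoded)")
    return problems
```

For example, `print(preflight(parse_env(open(".env").read())))` from the project directory should print an empty list.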
Also update DATABASE_URL to use the password you picked:
DATABASE_URL=postgresql+psycopg://localdocrag_user:pick_a_db_password@db:5432/localdocrag
Then start the stack:
docker-compose --profile ollama up --build
This starts 5 services: db (PostgreSQL + pgvector), backend (FastAPI), frontend (React), nginx (HTTPS proxy), and ollama.
Wait until you see lines like:
backend-1 | INFO: Application startup complete.
frontend-1 | /docker-entrypoint.sh: Configuration complete; ready for start up
nginx-1 | ...start worker processes
Leave this terminal running. Open a new terminal tab for the next steps.
In a new terminal tab, run:
# Pull the embedding model (~274 MB):
docker-compose --profile ollama exec ollama ollama pull nomic-embed-text
# Pull the chat model (replace with the model you chose in .env):
docker-compose --profile ollama exec ollama ollama pull llama3.2:3b
This downloads the models inside the Docker container. It takes a few minutes on first run. You only need to do this once — the models persist in a Docker volume.
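To confirm both pulls finished, running ollama list inside the ollama container shows the local models. A scripted equivalent is sketched below, assuming Ollama's GET /api/tags endpoint, whose JSON response has a models array with a name for each model. Because port 11434 is internal only, this would have to run inside the Compose network (for example in the backend container); missing_models and fetch_tags are hypothetical helpers.

```python
import json
import urllib.request
from typing import Iterable, List

OLLAMA_BASE_URL = "http://ollama:11434"  # OLLAMA_BASE_URL default from the env var table


def _with_tag(name: str) -> str:
    # An untagged pull such as "nomic-embed-text" is listed as "nomic-embed-text:latest".
    return name if ":" in name else f"{name}:latest"


def missing_models(tags: dict, required: Iterable[str]) -> List[str]:
    """Compare a GET /api/tags response with the models .env expects."""
    present = {_with_tag(model["name"]) for model in tags.get("models", [])}
    return [name for name in required if _with_tag(name) not in present]


def fetch_tags(base_url: str = OLLAMA_BASE_URL) -> dict:
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)
```

Calling `missing_models(fetch_tags(), ["nomic-embed-text", "llama3.2:3b"])` should return an empty list once both models are available.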
Go to: https://localhost
Your browser will show a certificate warning ("Your connection is not private" / "Potential Security Risk"). This is expected — the app uses a self-signed certificate for local HTTPS.
- Chrome: Click "Advanced" → "Proceed to localhost (unsafe)"
- Firefox: Click "Advanced…" → "Accept the Risk and Continue"
- Safari: Click "Show Details" → "visit this website"
You'll see the LocalDocRAG login page.
- Username: whatever you set as APP_USERNAME in .env (default: admin)
- Password: the plain-text password you used when generating the hash in Step 2 (e.g. mypassword)
- Upload a PDF — click the upload zone in the sidebar, drag a PDF in, wait for "Processing complete"
- Ask a question — type a question about the document in the chat box and hit Enter
- You'll get an answer with page citations. The first query may be slow (~10–30 sec) while Ollama loads the model into memory. Subsequent queries are faster.
First, find your machine's local IP address:
# Mac:
ipconfig getifaddr en0
# Linux / Raspberry Pi:
hostname -I | awk '{print $1}'
# Windows (in PowerShell):
(Get-NetIPAddress -AddressFamily IPv4 -InterfaceAlias Wi-Fi).IPAddress
Then on your phone or tablet (connected to the same Wi-Fi), open:
https://192.168.1.XXX (replace with your actual IP)
Accept the certificate warning the same way as in Step 6. The full app works from any device on your network.
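If you would rather not remember the per-OS commands above, the same address can usually be found from Python on any of the three systems. This is a common best-effort trick, not part of the project: lan_ip is a hypothetical helper, and the probe address is arbitrary because no traffic is actually sent. On machines with several interfaces (VPNs, Docker bridges) it reports whichever one the OS would route through, so cross-check it against the commands above.

```python
import socket


def lan_ip(probe: str = "192.168.1.1", port: int = 80) -> str:
    """Best-effort IPv4 address of the interface used to reach the LAN.

    connect() on a UDP socket sends no packets; it only makes the OS choose a
    route and bind a local address, which getsockname() then reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.connect((probe, port))
            return sock.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no usable route, fall back to loopback


if __name__ == "__main__":
    ip = lan_ip()
    print(f"Open https://{ip} on your phone; add it to ALLOWED_ORIGINS if needed")
```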
| Feature | Ollama only |
|---|---|
| PDF upload + processing | Yes |
| Q&A with page citations | Yes |
| Document list / delete | Yes |
| Login / JWT auth | Yes |
| RAGAS evaluation (/api/evaluate/) | No — returns HTTP 400 with a clear message explaining that OpenAI is required as the judge LLM |
# Stop all containers (keeps data):
docker-compose --profile ollama down
# Restart later (models already pulled, no --build needed):
docker-compose --profile ollama up
# Stop and wipe all data (database + model volumes):
docker-compose --profile ollama down -v
"connection refused" or blank page:
Check that all containers started: docker-compose --profile ollama ps. All should show Up or healthy.
Login fails with 401:
The password hash in .env doesn't match your password. Re-run the hash generation command from Step 2 and paste the new hash into .env, then restart: docker-compose --profile ollama restart backend.
Ollama query times out:
The model is loading. Wait ~30 seconds and try again. If it persists: docker-compose --profile ollama logs ollama to check for errors. Ensure you pulled the model that matches OLLAMA_LLM_MODEL in your .env.
"embedding dimension mismatch" error:
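You changed OLLAMA_EMBED_MODEL or EMBEDDING_DIMENSION after uploading documents, so the stored vectors no longer match the new size. The simplest fix is to wipe the database with docker-compose --profile ollama down -v, start the stack again, and re-upload your PDFs. Note that -v also deletes the Ollama model volume, so you will need to pull the models again.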
Browser keeps redirecting to HTTP:
Open https://localhost explicitly (with https://). The app redirects port 80 → 443 automatically.
If you have an API key, set these in .env and restart without the ollama profile:
LLM_PROVIDER=openai # or: anthropic
OPENAI_API_KEY=sk-... # your key
EMBEDDING_DIMENSION=1536 # OpenAI embeddings are 1536-dimensional
OLLAMA_LLM_MODEL= # not used in OpenAI mode
docker-compose up --build # no --profile ollama needed
For Anthropic, also set OPENAI_API_KEY: Anthropic does not offer an embeddings API, so in anthropic mode the app falls back to OpenAI embeddings.
Note: OPENAI_API_KEY is also required if you want to use the RAGAS evaluation endpoint (/api/evaluate/), even when LLM_PROVIDER=ollama, because RAGAS uses OpenAI as its judge LLM.
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
# Install docker-compose plugin
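sudo apt-get install docker-compose-plugin
# The plugin provides "docker compose" (with a space). If "docker-compose" is not found, use "docker compose" in the commands below.
# Clone the repo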
git clone https://github.com/YOUR_USERNAME/localdocrag.git
cd localdocrag
# Set up .env (see Quick Start — Local Testing section above)
cp .env.example .env
nano .env # paste your bcrypt hash and JWT secret
# Start (Ollama profile for local models)
docker-compose --profile ollama up -d --build
# Pull Ollama models
docker-compose --profile ollama exec ollama ollama pull nomic-embed-text
docker-compose --profile ollama exec ollama ollama pull qwen2.5:14b # Pi 5 16GB (set OLLAMA_LLM_MODEL=qwen2.5:14b in .env)
# Or for Pi 4 8GB: pull llama3.2:3b instead and set OLLAMA_LLM_MODEL=llama3.2:3b in .env
# Find your Pi's IP address:
hostname -I
# Then on any device on your home Wi-Fi:
# https://192.168.1.XXX (replace with your Pi's IP)
# On the Pi:
sudo apt install avahi-daemon
sudo systemctl enable --now avahi-daemon
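# Now accessible at https://<hostname>.local, i.e. https://localdocrag.local if the Pi's hostname is localdocrag (change it with sudo raspi-config)
# Create a systemd service or add to /etc/rc.local: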
cd /home/pi/localdocrag && docker-compose --profile ollama up -d
| Variable | Default | Description |
|---|---|---|
| APP_USERNAME | admin | Login username |
| APP_PASSWORD_HASH | — | bcrypt hash of your password |
| JWT_SECRET_KEY | — | Random 32+ char string for JWT signing |
| JWT_EXPIRE_MINUTES | 480 | Session duration (8 hours) |
| LLM_PROVIDER | ollama | One of openai, anthropic, ollama |
| OPENAI_API_KEY | — | Required for OpenAI / RAGAS eval |
| ANTHROPIC_API_KEY | — | Required for Anthropic |
| OLLAMA_BASE_URL | http://ollama:11434 | Ollama service URL |
| OLLAMA_LLM_MODEL | qwen2.5:14b | Chat model name |
| OLLAMA_EMBED_MODEL | nomic-embed-text | Embedding model name |
| DATABASE_URL | postgresql+psycopg://… | Async psycopg3 URL |
| DB_PASSWORD | localdocrag_secret_change_me | PostgreSQL password |
| CHUNK_SIZE | 800 | Target chunk size (RecursiveCharacterTextSplitter counts characters unless given a token-based length function) |
| CHUNK_OVERLAP | 150 | Overlap between chunks |
| TOP_K_RETRIEVAL | 5 | Number of chunks to retrieve |
| MAX_FILE_SIZE_MB | 20 | Max PDF upload size |
| EMBEDDING_DIMENSION | 768 | Must match the embedding model (768 for nomic-embed-text, 1536 for text-embedding-3-small) |
| ALLOWED_ORIGINS | https://localhost,… | CORS origins (add your Pi's IP) |
All endpoints require Authorization: Bearer <token> except /api/auth/login.
Interactive docs: https://localhost/api/docs
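A minimal standard-library client for this flow: log in once, then send the token as a Bearer header on every other request. It is a sketch; paths and JSON fields follow the examples in this section, while BASE_URL, the helper names and the decision to skip certificate verification (tolerable only for the app's own self-signed certificate on your LAN) are assumptions.

```python
import json
import ssl
import urllib.request
from typing import Optional

BASE_URL = "https://localhost"  # or https://<your-machine-ip> from another device

# The app uses a self-signed certificate, so certificate checks are switched off.
# Only acceptable for this app on your own network.
INSECURE_CTX = ssl.create_default_context()
INSECURE_CTX.check_hostname = False
INSECURE_CTX.verify_mode = ssl.CERT_NONE


def build_request(path: str, payload: Optional[dict] = None,
                  token: Optional[str] = None) -> urllib.request.Request:
    """POST with a JSON body when payload is given, otherwise GET."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def call(request: urllib.request.Request) -> dict:
    # Generous timeout: the first Ollama answer can take tens of seconds.
    with urllib.request.urlopen(request, context=INSECURE_CTX, timeout=120) as resp:
        return json.load(resp)


def login(username: str, password: str) -> str:
    body = {"username": username, "password": password}
    return call(build_request("/api/auth/login", body))["access_token"]


def ask(token: str, question: str, document_id: Optional[str] = None) -> dict:
    body = {"question": question, "document_id": document_id}
    return call(build_request("/api/query/", body, token=token))
```

Usage: `token = login("admin", "mypassword")`, then `ask(token, "What are the key findings?")["answer"]`. Keep scripted loops slow enough to stay under the nginx limit of 30 requests per minute.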
| Method | Path | Description |
|---|---|---|
| POST | /api/auth/login | Login → returns JWT |
// POST /api/auth/login
{ "username": "admin", "password": "yourpassword" }
// Response
{ "access_token": "eyJ…", "token_type": "bearer", "expires_in": 28800 }
| Method | Path | Description |
|---|---|---|
| POST | /api/documents/upload | Upload PDF (multipart/form-data) |
| GET | /api/documents/ | List all documents |
| DELETE | /api/documents/{id} | Delete document + all chunks |
// POST /api/documents/upload → 201
{
"id": "uuid",
"filename": "report.pdf",
"chunk_count": 42,
"created_at": "2026-03-28T10:00:00Z",
"embedding_provider": "ollama"
}
| Method | Path | Description |
|---|---|---|
| POST | /api/query/ | Ask a question, get cited answer |
// Request
{ "question": "What are the key findings?", "document_id": "uuid-or-null" }
// Response
{
"answer": "The key findings are… (Page 3)",
"sources": [
{ "content": "The study found…", "page_number": 3, "score": 0.92 }
],
"question": "What are the key findings?",
"document_id": "uuid"
}
| Method | Path | Description |
|---|---|---|
| POST | /api/evaluate/ | Run RAGAS evaluation |
// Request
{
"test_cases": [
{ "question": "What is X?", "ground_truth": "X is Y.", "document_id": "uuid" }
]
}
// Response
{
"metrics": {
"faithfulness": 0.91,
"answer_relevancy": 0.88,
"context_precision": 0.85,
"context_recall": 0.79
},
"num_samples": 1
}
# Option 1: With the test compose database
docker-compose -f docker-compose.test.yml up -d
cd backend
DATABASE_URL=postgresql+psycopg://localdocrag_user:test_secret@localhost:5433/localdocrag_test \
pytest tests/ -v
docker-compose -f docker-compose.test.yml down
# Option 2: Tests mock all DB + LLM calls (no DB needed)
cd backend
pip install -r requirements.txt
pytest tests/ -v # uses os.environ defaults in conftest.py
Tests cover:
- test_auth.py (5) — JWT login, wrong credentials, protected endpoints
- test_ingestion.py (5) — PDF parse, size limits, chunk count, batch embedding
- test_retrieval.py (5) — answer shape, no-context response, source scores, page numbers
- test_api.py (6) — all HTTP endpoints end-to-end with mocked services
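The suites run without API keys because LLM calls are mocked. The sketch below shows the general pattern with unittest.mock.AsyncMock standing in for an object that exposes LangChain's async ainvoke method; answer_with_context, its no-context message and the chain's input keys are hypothetical, not the project's retrieval code.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def answer_with_context(chain, question: str, chunks: list) -> dict:
    """Hypothetical service: join retrieved chunks into a context and call the LLM chain."""
    if not chunks:
        # No retrieved context: answer without calling the LLM at all.
        return {"answer": "No relevant content found in the uploaded documents.", "sources": []}
    context = "\n\n".join(f"(Page {c['page_number']}) {c['content']}" for c in chunks)
    message = await chain.ainvoke({"question": question, "context": context})
    return {"answer": message.content, "sources": chunks}


def test_answer_uses_mocked_llm() -> None:
    fake_reply = SimpleNamespace(content="Revenue grew 12% (Page 3)")
    chain = SimpleNamespace(ainvoke=AsyncMock(return_value=fake_reply))
    chunks = [{"content": "Revenue grew 12%", "page_number": 3, "score": 0.92}]

    result = asyncio.run(answer_with_context(chain, "How did revenue change?", chunks))

    assert result["answer"] == "Revenue grew 12% (Page 3)"
    assert result["sources"][0]["page_number"] == 3
    chain.ainvoke.assert_awaited_once_with(
        {"question": "How did revenue change?", "context": "(Page 3) Revenue grew 12%"}
    )


def test_no_context_skips_llm() -> None:
    chain = SimpleNamespace(ainvoke=AsyncMock())
    result = asyncio.run(answer_with_context(chain, "Anything?", []))
    assert result["sources"] == []
    chain.ainvoke.assert_not_awaited()
```

Because the mock records the exact prompt inputs, a test can check that page numbers reach the LLM without any network call.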
RAGAS (Retrieval Augmented Generation Assessment) measures 4 quality metrics:
| Metric | Measures |
|---|---|
| Faithfulness | Does the answer only use information from the retrieved context? |
| Answer Relevancy | Is the answer relevant to the question? |
| Context Precision | Are the retrieved chunks actually useful for answering? |
| Context Recall | Were all relevant chunks retrieved? |
Note: RAGAS uses OpenAI as its judge LLM. Set OPENAI_API_KEY in .env even when using LLM_PROVIDER=ollama.
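Each metric comes back as a number between 0 and 1 (see the /api/evaluate/ response in the API section). A small hypothetical gate like the one below can turn those scores into a pass/fail signal, for example in a CI job; the thresholds are illustrative assumptions, not values recommended by RAGAS or by this project. Missing scores and NaN are treated as failures, since an individual RAGAS judgment can fail and yield NaN instead of a number.

```python
import math
from typing import Dict, List

# Illustrative minimums only; calibrate them against scores from your own documents.
THRESHOLDS: Dict[str, float] = {
    "faithfulness": 0.85,
    "answer_relevancy": 0.80,
    "context_precision": 0.75,
    "context_recall": 0.70,
}


def failing_metrics(response: dict, thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """List metrics below their minimum. Missing or NaN scores count as failures."""
    metrics = response.get("metrics", {})
    failures: List[str] = []
    for name, minimum in thresholds.items():
        score = metrics.get(name)
        if score is None or math.isnan(score) or score < minimum:
            failures.append(f"{name}: {score} < {minimum}")
    return failures
```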
Via the UI: Click "Run eval" in the chat panel, enter a test question and the expected answer.
Via API:
curl -X POST https://localhost/api/evaluate/ \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"test_cases": [{
"question": "What was the revenue in Q3?",
"ground_truth": "Revenue in Q3 was $4.2M, a 12% increase year-over-year.",
"document_id": "your-document-uuid"
}]
}'
localdocrag/
├── backend/
│ ├── app/
│ │ ├── main.py ← FastAPI app, CORS, lifespan, rate limiting
│ │ ├── config.py ← Pydantic Settings (all env vars)
│ │ ├── database.py ← Async SQLAlchemy engine, ORM models, init_db()
│ │ ├── models.py ← Pydantic v2 request/response schemas
│ │ ├── routers/
│ │ │ ├── auth.py ← Login endpoint + get_current_user dependency
│ │ │ ├── documents.py ← Upload, list, delete documents
│ │ │ ├── query.py ← Q&A with source citations
│ │ │ └── evaluation.py ← RAGAS evaluation endpoint
│ │ └── services/
│ │ ├── auth.py ← JWT create/verify, bcrypt password check
│ │ ├── llm_factory.py ← Provider abstraction (OpenAI/Anthropic/Ollama)
│ │ ├── ingestion.py ← PDF parse → chunk → embed → store pipeline
│ │ ├── retrieval.py ← Embed query → pgvector search → LLM answer
│ │ └── evaluation.py ← RAGAS metrics computation
│ ├── tests/
│ │ ├── conftest.py ← Fixtures: mock DB, mock LLM, sample PDF, auth token
│ │ ├── test_auth.py
│ │ ├── test_ingestion.py
│ │ ├── test_retrieval.py
│ │ └── test_api.py
│ ├── Dockerfile
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── App.jsx ← Layout, routing, auth guard
│ │ ├── api/client.js ← Axios client with JWT interceptor
│ │ └── components/
│ │ ├── LoginPage.jsx
│ │ ├── UploadZone.jsx ← Drag-drop + XHR progress
│ │ ├── DocumentList.jsx ← Sidebar document list
│ │ ├── ChatInterface.jsx ← Q&A chat with citations
│ │ ├── SourceCard.jsx ← Retrieved chunk citation card
│ │ ├── EvalBadge.jsx ← RAGAS score display
│ │ └── ErrorAlert.jsx ← Auto-dismiss error banner
│ └── Dockerfile ← Multi-stage: node build → nginx serve
├── nginx/
│ ├── nginx.conf ← SSL, rate limiting, reverse proxy
│ └── Dockerfile ← Generates self-signed TLS certificate
├── init.sql ← Enables pgvector extension
├── docker-compose.yml ← Full stack (db, backend, frontend, nginx, ollama)
├── docker-compose.test.yml ← Test database only
├── .env.example
└── .github/workflows/ci.yml ← Test + lint + docker build
This project demonstrates the following skills relevant to AI Engineer / LLM Engineer roles:
RAG Pipeline Engineering
- Implemented end-to-end RAG: PDF ingestion → recursive chunking → batch embedding → pgvector storage → cosine similarity retrieval → grounded LLM generation
- Chunking strategy documented: RecursiveCharacterTextSplitter with configurable size/overlap, page number metadata preserved per chunk
LLM API Integration
- Multi-provider abstraction (llm_factory.py): swap between OpenAI, Anthropic, and Ollama with a single env var
- Async LangChain chains: prompt | llm with ainvoke() for non-blocking generation
- Local LLM inference on Raspberry Pi 5 16GB with Ollama (qwen2.5:14b)
Vector Database
- pgvector with IVFFlat index (cosine distance operator <=>, where similarity = 1 - distance)
- Raw SQL similarity search with SQLAlchemy async engine: transparent, not a black box
- Configurable embedding dimensions (768 for local, 1536 for OpenAI)
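A sketch of what such a raw SQL search can look like, assuming the document_chunks table named under the limitations with hypothetical content, page_number, document_id and embedding columns; the project's real query may differ. Ordering by the <=> distance itself (ascending) with a LIMIT is the form pgvector can serve from an IVFFlat index built with cosine ops, and the score is derived as 1 minus that distance. CAST(... AS vector) is used instead of the ::vector shorthand, which is easy to trip over with SQLAlchemy's :name bind-parameter syntax.

```python
from typing import Dict, Optional, Sequence, Tuple


def to_vector_literal(vec: Sequence[float]) -> str:
    """pgvector accepts vectors as text such as '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def similarity_query(query_vec: Sequence[float], top_k: int = 5,
                     document_id: Optional[str] = None) -> Tuple[str, Dict[str, object]]:
    """Build SQL plus bind parameters in the :name style used by SQLAlchemy's text()."""
    # Only this fixed clause is interpolated; every user-supplied value is a bind parameter.
    where = "WHERE document_id = :document_id" if document_id else ""
    sql = f"""
        SELECT content, page_number,
               1 - (embedding <=> CAST(:qvec AS vector)) AS score
        FROM document_chunks
        {where}
        ORDER BY embedding <=> CAST(:qvec AS vector)
        LIMIT :top_k
    """
    params: Dict[str, object] = {"qvec": to_vector_literal(query_vec), "top_k": top_k}
    if document_id:
        params["document_id"] = document_id
    return sql, params
```

Building the WHERE clause conditionally avoids a `:document_id IS NULL OR ...` pattern, whose untyped NULL parameter PostgreSQL may be unable to infer.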
Production FastAPI Backend
- Async SQLAlchemy with psycopg3 (not psycopg2)
- Pydantic v2 schemas for all I/O
- JWT authentication with bcrypt password hashing
- slowapi rate limiting (in-app) + nginx rate limiting (infrastructure)
- Global exception handler with consistent error shape
Evaluation & Quality
- RAGAS integration: faithfulness, answer relevancy, context precision, context recall
- All LLM calls mocked in tests — 100% reproducible CI without API keys
Infrastructure / DevOps
- Multi-stage Docker builds (ARM64-compatible for Raspberry Pi)
- nginx SSL termination with auto-generated self-signed certificate
- Docker Compose with health checks, internal networking, optional Ollama profile
- GitHub Actions CI: test → lint (ruff) → docker build
- Scanned PDFs: pypdf only extracts text from text-based PDFs. Scanned image PDFs require OCR (consider pytesseract + pdf2image).
- RAGAS requires OpenAI: Even in Ollama mode, evaluation uses OpenAI as the RAGAS judge LLM. This follows from RAGAS defaulting to OpenAI; RAGAS can be given other LangChain models, so a local judge is possible in principle but not configured here.
- Embedding dimension change: Changing EMBEDDING_DIMENSION requires dropping and recreating the document_chunks table. Documents embedded with a different model must be re-ingested.
- Single user: The current auth model is single-user (credentials in env vars). Multi-user would require a users table and registration flow.
- OCR support for scanned PDFs (pytesseract)
- Streaming responses (Server-Sent Events) for faster perceived latency on Pi
- Persistent chat history (conversations table in PostgreSQL)
- Multi-user support with registration
- Document classification endpoint (auto-tag uploaded documents by type)
- Let's Encrypt integration for a real TLS certificate on custom domains
- Hybrid search: combine BM25 keyword search with vector search (reciprocal rank fusion)
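Reciprocal rank fusion, mentioned in the last roadmap item, merges ranked lists without calibrating BM25 scores against cosine similarities: each chunk scores the sum of 1 / (k + rank) over every list it appears in. A minimal sketch; k = 60 is the constant commonly used in the literature, and the chunk IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings of IDs into one (ranks start at 1)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


bm25_hits = ["c3", "c1", "c7"]
vector_hits = ["c1", "c5", "c3"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # ['c1', 'c3', 'c5', 'c7']
```

Chunks found by both retrievers (c1, c3) rise to the top even though neither list alone ranks them the same way.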