| title | Document AI Analyst |
|---|---|
| emoji | 🧠 |
| colorFrom | indigo |
| colorTo | purple |
| sdk | docker |
| app_port | 7860 |
| pinned | true |
| license | mit |
| short_description | Enterprise Agentic RAG — upload PDFs and chat with AI |
██████╗ ██████╗ ███████╗ █████╗ ███████╗███████╗██╗███████╗████████╗ █████╗ ███╗ ██╗████████╗
██╔══██╗██╔══██╗██╔════╝ ██╔══██╗██╔════╝██╔════╝██║██╔════╝╚══██╔══╝██╔══██╗████╗ ██║╚══██╔══╝
██████╔╝██║ ██║█████╗ ███████║███████╗███████╗██║███████╗ ██║ ███████║██╔██╗ ██║ ██║
██╔═══╝ ██║ ██║██╔══╝ ██╔══██║╚════██║╚════██║██║╚════██║ ██║ ██╔══██║██║╚██╗██║ ██║
██║ ██████╔╝██║ ██║ ██║███████║███████║██║███████║ ██║ ██║ ██║██║ ╚████║ ██║
╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═╝╚══════╝╚══════╝╚═╝╚══════╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═══╝ ╚═╝
██████╗ █████╗ ██████╗
██╔══██╗██╔══██╗██╔════╝
██████╔╝███████║██║ ███╗
██╔══██╗██╔══██║██║ ██║
██║ ██║██║ ██║╚██████╔╝
╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝
Upload · Embed · Retrieve · Chat — A production-grade AI document assistant built end-to-end with an agentic RAG pipeline, streaming responses, and per-user data isolation.
Features · Tech Stack · Getting Started · Architecture · RAG Pipeline · API Reference · Deployment · Contributing
Thanks to all the amazing people who have contributed to PDF-Assistant-RAG! 🎉

- param20h 💻 🚇 📖
- Yuvraj-Sarathe 💻 🧪 🔒
- SatyamPrakash09 💻 🔒 ⚡
- akmhatey-ai 💻 🐛
- drishtisharma14052007-eng 💻 🎨
- Pika-pika06 💻
- algojogacor 💻
- HirenGajjar 💻
- Kaustub26Pvgda 💻
- blinkerbit 💻
- akshy-yy 💻
PDF-Assistant-RAG is a complete, production-ready AI document assistant. Users upload complex documents such as financial reports, legal contracts, and research papers, then chat with an AI that provides accurate, cited answers powered by a multi-stage Retrieval-Augmented Generation pipeline.
The system uses semantic search + cross-encoder reranking to find the most relevant document chunks, streams AI-generated answers token-by-token, and highlights exact source citations with page numbers — all inside a sleek Next.js UI with JWT-secured per-user data isolation.
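The two-stage idea described above (a cheap similarity search over all chunks, followed by a more careful rerank of the shortlist) can be sketched in plain Python. This is a minimal illustration, not the project's code: `two_stage_retrieve`, `cosine`, and `word_overlap` are hypothetical names, the toy two-dimensional vectors stand in for real embeddings, and `word_overlap` is only a placeholder for a cross-encoder score.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_retrieve(
    query_vec: Sequence[float],
    query_text: str,
    chunks: Sequence[tuple[str, Sequence[float]]],  # (chunk text, chunk embedding)
    rerank_score: Callable[[str, str], float],      # hypothetical stand-in for a cross-encoder
    top_k_retrieval: int = 10,
    top_k_rerank: int = 5,
) -> list[str]:
    # Stage 1: rank every chunk by vector similarity and keep a shortlist.
    candidates = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    candidates = candidates[:top_k_retrieval]
    # Stage 2: score each (query, chunk) pair jointly and keep the best few.
    reranked = sorted(candidates, key=lambda c: rerank_score(query_text, c[0]), reverse=True)
    return [text for text, _ in reranked[:top_k_rerank]]


def word_overlap(query: str, passage: str) -> float:
    """Toy scorer for this demo only: fraction of query words present in the passage."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / len(q) if q else 0.0


chunks = [
    ("revenue grew in 2023", [1.0, 0.0]),
    ("the contract ends in june", [0.0, 1.0]),
    ("revenue and profit summary", [0.9, 0.1]),
]
best = two_stage_retrieve([1.0, 0.0], "revenue summary", chunks, word_overlap,
                          top_k_retrieval=2, top_k_rerank=1)
print(best)
```

The first stage here prefers the chunk whose vector is closest to the query, while the second stage promotes the chunk that matches the query text better, which is the reason for running both.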
graph TD
subgraph Frontend["Frontend (Next.js 16)"]
UI["Dashboard UI (React)"]
Chat["Chat Panel (SSE)"]
Viewer["PDF Viewer (iframe)"]
end
subgraph Backend["Backend (FastAPI 0.115+)"]
API["API Router (/api/v1)"]
Auth["Auth (JWT/bcrypt)"]
DB[(SQLite Metadata)]
subgraph RAG["RAG Pipeline"]
Upload["Ingestion Task (Chunking)"]
Embed["Local Embeddings (all-MiniLM-L6-v2)"]
Retriever["Two-Stage Retriever"]
Rerank["Cross-Encoder Reranker"]
Agent["Agent/Generator"]
end
end
subgraph Storage["Vector Storage"]
Chroma[(ChromaDB)]
end
subgraph External["External Services"]
HF["HuggingFace Inference API (Qwen 72B)"]
end
%% Frontend to Backend Connections
UI <-->|REST / Auth| API
Chat <-->|SSE Streaming| API
Viewer -->|Fetch PDF| API
%% Backend Internals
API <--> Auth
API <--> DB
API --> Upload
API <--> Retriever
API <--> Agent
%% RAG Ingestion Flow
Upload --> Embed
Embed -->|Store Vectors| Chroma
%% RAG Query Flow
Retriever -->|1. Semantic Search| Chroma
Retriever -->|2. Score & Sort| Rerank
Retriever -->|Context| Agent
%% External LLM Flow
Agent <-->|LLM Generation| HF
| Technology | Purpose |
|---|---|
| all-MiniLM-L6-v2 | Local sentence embeddings |
| ms-marco-MiniLM-L-6-v2 | Cross-encoder reranker |
| Qwen2.5-72B-Instruct | LLM (HuggingFace Inference API) |
| PyMuPDF + python-docx | Document parsing |

| Technology | Purpose |
|---|---|
| Docker Multi-Stage | Containerized deployment |
| GitHub Actions | CI pipeline (dev branch) |
| Git LFS | Binary asset management |
| HuggingFace Spaces | Production deployment |

PDF-Assistant-RAG/
│
├── backend/ # FastAPI + RAG server
│ ├── app/
│ │ ├── main.py # App entrypoint, middleware, static files
│ │ ├── config.py # Pydantic settings (env vars)
│ │ ├── database.py # SQLAlchemy async engine
│ │ ├── models.py # ORM models (User, Document, Message)
│ │ ├── schemas.py # Pydantic request/response schemas
│ │ ├── auth.py # JWT creation & verification
│ │ │
│ │ ├── routes/
│ │ │ ├── auth.py # POST /register, /login, /me
│ │ │ ├── documents.py # Upload, list, delete, retrieve
│ │ │ └── chat.py # Streaming chat + history
│ │ │
│ │ └── rag/
│ │ ├── agent.py # Main RAG orchestrator
│ │ ├── chunker.py # Recursive text splitter
│ │ ├── embeddings.py # SentenceTransformer wrapper
│ │ ├── vectorstore.py # ChromaDB collection manager
│ │ ├── retriever.py # Semantic search + reranking
│ │ └── prompts.py # System & user prompt templates
│ │
│ ├── requirements.txt
│ └── .env # Local env (never committed)
│
├── frontend/ # Next.js 16 App Router
│ └── src/
│ ├── app/
│ │ ├── layout.tsx # Root layout + fonts
│ │ ├── page.tsx # Landing / redirect
│ │ ├── login/ # Auth pages
│ │ ├── register/
│ │ └── dashboard/ # Main app page
│ │
│ ├── components/
│ │ ├── chat/
│ │ │ ├── ChatPanel.tsx # Chat UI + SSE streaming
│ │ │ ├── MessageBubble.tsx # User / assistant message
│ │ │ └── SourceCard.tsx # Citation cards
│ │ ├── document/ # Upload + sidebar components
│ │ └── layout/ # Navbar, sidebar shell
│ │
│ └── lib/
│ └── api.ts # Typed API client + SSE stream helper
│
├── .github/
│ ├── workflows/
│ │ ├── ci.yml # CI — runs on dev branch only
│ │ ├── deploy.yml # Docker build — main branch only
│ │ └── devsecops.yml # Security scans — main branch only
│ ├── ISSUE_TEMPLATE/ # Bug report & feature request forms
│ ├── pull_request_template.md # PR checklist
│ └── CODEOWNERS # Auto-review assignment
│
├── Dockerfile # Multi-stage: Node build → Python serve
├── docker-compose.yml # Local Docker stack
├── CONTRIBUTING.md # contributor guide
└── .env.example # Template for environment variables
git clone https://github.com/param20h/PDF-Assistant-RAG.git
cd PDF-Assistant-RAG
cp .env.example backend/.env

Edit backend/.env:
SECRET_KEY=your-strong-random-secret
DATABASE_URL=sqlite:///./data/app.db
HF_TOKEN=hf_your_huggingface_token_here
UPLOAD_DIR=./data/uploads
CHROMA_PERSIST_DIR=./data/chroma_db

Get your free HuggingFace token at huggingface.co/settings/tokens.
Open two terminals:
# Terminal A - Backend
cd backend
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
# → API running at http://localhost:8000
# → Swagger docs at http://localhost:8000/docs

# Terminal B - Frontend
cd frontend
npm install
npm run dev
# → App running at http://localhost:3000

docker compose up --build
# → Full stack at http://localhost:7860

┌─────────────────────────────────────────────┐
│ PDF / DOCX Upload │
└───────────────────┬─────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ PyMuPDF / python-docx Parser │
│ (text extraction per page) │
└───────────────────┬─────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Recursive Character Text Splitter │
│ chunk_size=1000 | overlap=200 │
└───────────────────┬─────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ all-MiniLM-L6-v2 (local embeddings) │
│ 384-dim dense vectors │
└───────────────────┬─────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ ChromaDB — per-user persistent collection │
└─────────────────────────────────────────────┘
── At Query Time ──
User Question ──▶ Embed ──▶ Semantic Search (Top-K=10)
│
▼
Cross-Encoder Reranker (Top-K=5)
ms-marco-MiniLM-L-6-v2
│
▼
Prompt Assembly (system + context + question)
│
▼
Qwen2.5-72B-Instruct (HF Inference API)
│
▼
Streamed SSE tokens ──▶ Frontend ChatPanel
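The `chunk_size=1000 | overlap=200` step in the ingestion diagram above can be illustrated with a simplified splitter. Note the substitution: the project names a recursive character splitter, whereas this sketch uses a plain fixed-size sliding window, which is enough to show how the two numbers interact. `chunk_text` is a hypothetical helper, and the rule that the overlap must be smaller than the chunk size is an assumption of this sketch.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this many characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 2500-character sample with a repeating alphabet pattern.
sample = "".join(chr(97 + i % 26) for i in range(2500))
pieces = chunk_text(sample)
print([len(p) for p in pieces])
```

With the default values each window starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next and a sentence that straddles a boundary is still seen whole by at least one chunk.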
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| `POST` | `/api/v1/auth/register` | ❌ | Create a new user account |
| `POST` | `/api/v1/auth/login` | ❌ | Login and receive JWT token |
| `GET` | `/api/v1/auth/me` | ✅ | Get current user profile |
| `POST` | `/api/v1/documents/upload` | ✅ | Upload PDF/DOCX and trigger indexing |
| `GET` | `/api/v1/documents` | ✅ | List all documents for current user |
| `DELETE` | `/api/v1/documents/{id}` | ✅ | Delete a document and its vector data |
| `POST` | `/api/v1/chat/ask/stream` | ✅ | Ask a question (SSE streaming response) |
| `GET` | `/api/v1/chat/history/{doc_id}` | ✅ | Get chat history for a document |
| `DELETE` | `/api/v1/chat/history/{doc_id}` | ✅ | Clear chat history for a document |
| `GET` | `/health` | ❌ | Health check (db + chroma status) |
Full interactive docs available at `/docs` (Swagger UI) when running locally.
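Because the chat endpoint streams its answer as Server-Sent Events, a client has to split the response into events before it can show tokens. The sketch below parses generic SSE lines according to the standard framing rules (blank line ends an event, `data:` lines are joined with newlines, lines starting with `:` are comments). `iter_sse_data` is a hypothetical helper, and the payload inside each event (plain text or JSON) is not specified in this README, so the sample lines are made up.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event from an iterable of SSE lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # Data that was never closed by a blank line is dropped, as the SSE rules require.


stream = ["data: Hel\n", "\n", ": ping\n", "data: lo\n", "data: world\n", "\n", "data: dropped\n"]
events = list(iter_sse_data(stream))
print(events)
```

In a real client the `lines` argument would come from an HTTP response read line by line; the function itself has no network dependency, which keeps it easy to test.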
| Variable | Required | Default | Description | Where to Get It |
|---|---|---|---|---|
| `SECRET_KEY` | ✅ | - | JWT signing & session secret. Use a strong random string. | Generate: `python -c "import secrets; print(secrets.token_urlsafe(32))"` |
| `HF_TOKEN` | ✅ | - | HuggingFace API token for LLM inference via Inference API. | huggingface.co/settings/tokens (free) |
| `ENVIRONMENT` | ❌ | `development` | Runtime mode. Set to `production` for deployment to lock CORS. | - |
| `DEBUG` | ❌ | `False` | Enable debug mode with detailed error pages. Never enable in production. | - |
| `ALLOWED_ORIGINS` | ❌ | `http://localhost:3000,http://localhost:7860` | Comma-separated CORS origins (only enforced in production). | Your deployed domain(s) |
| `DATABASE_URL` | ❌ | `sqlite:///./data/app.db` | SQLAlchemy database connection string. | SQLite (default), or your Postgres/MySQL connection string |
| `JWT_ALGORITHM` | ❌ | `HS256` | JWT signing algorithm. | - |
| `JWT_EXPIRY_HOURS` | ❌ | `72` | JWT token lifetime in hours before re-login is required. | - |
| `UPLOAD_DIR` | ❌ | `./data/uploads` | Local directory for storing uploaded documents. | - |
| `MAX_FILE_SIZE_MB` | ❌ | `50` | Maximum allowed upload file size in MB. | - |
| `ALLOWED_EXTENSIONS` | ❌ | `pdf,docx,txt,md` | Comma-separated list of permitted file extensions. | - |
| `CHROMA_PERSIST_DIR` | ❌ | `./data/chroma_db` | Directory where ChromaDB persists its vector index. | - |
| `LLM_MODEL` | ❌ | `Qwen/Qwen2.5-72B-Instruct` | HuggingFace model ID for answer generation. | huggingface.co/models |
| `LLM_TEMPERATURE` | ❌ | `0.3` | LLM sampling temperature (0 = deterministic, 1 = creative). | - |
| `LLM_MAX_NEW_TOKENS` | ❌ | `1024` | Maximum tokens per LLM response. | - |
| `EMBEDDING_MODEL` | ❌ | `sentence-transformers/all-MiniLM-L6-v2` | SentenceTransformer model for local embeddings (no external API). | huggingface.co/sentence-transformers |
| `EMBEDDING_DIMENSION` | ❌ | `384` | Embedding vector dimension (must match the model). | - |
| `RERANKER_MODEL` | ❌ | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-encoder model for reranking retrieved chunks by relevance. | huggingface.co/cross-encoder |
| `CHUNK_SIZE` | ❌ | `1000` | Characters per document chunk. Larger = more context, smaller = better precision. | - |
| `CHUNK_OVERLAP` | ❌ | `200` | Overlap between consecutive chunks to maintain boundary context. | - |
| `TOP_K_RETRIEVAL` | ❌ | `10` | Candidate chunks retrieved from vector store during semantic search. | - |
| `TOP_K_RERANK` | ❌ | `5` | Final chunks passed to the LLM after reranking (must be ≤ `TOP_K_RETRIEVAL`). | - |
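Some of the variables above constrain each other, most explicitly `TOP_K_RERANK` ≤ `TOP_K_RETRIEVAL`. The sketch below shows one way such checks could be enforced when settings are loaded. It is not the project's `config.py` (which the project tree describes as Pydantic settings): it uses a standard-library dataclass instead, `RagSettings` and `load_rag_settings` are hypothetical names, and the overlap check and the lower bound of 1 are assumptions added for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RagSettings:
    # Defaults mirror the table above.
    chunk_size: int = 1000
    chunk_overlap: int = 200
    top_k_retrieval: int = 10
    top_k_rerank: int = 5

    def __post_init__(self) -> None:
        # Assumption of this sketch: an overlap as large as the chunk makes no progress.
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
        # Documented rule: the reranker cannot return more chunks than it was given.
        if not 1 <= self.top_k_rerank <= self.top_k_retrieval:
            raise ValueError("TOP_K_RERANK must be between 1 and TOP_K_RETRIEVAL")


def load_rag_settings(env: Optional[Mapping[str, str]] = None) -> RagSettings:
    """Build settings from a mapping of environment variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    defaults = RagSettings()
    return RagSettings(
        chunk_size=int(env.get("CHUNK_SIZE", defaults.chunk_size)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", defaults.chunk_overlap)),
        top_k_retrieval=int(env.get("TOP_K_RETRIEVAL", defaults.top_k_retrieval)),
        top_k_rerank=int(env.get("TOP_K_RERANK", defaults.top_k_rerank)),
    )


settings = load_rag_settings({"TOP_K_RERANK": "3"})
print(settings)
```

Failing at startup with a clear message is usually easier to debug than a retriever that silently returns fewer chunks than expected.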
| Command | Description |
|---|---|
| `uvicorn app.main:app --reload` | Start FastAPI with hot reload |
| `uvicorn app.main:app --port 8000` | Start FastAPI on port 8000 |

| Command | Description |
|---|---|
| `npm run dev` | Start Next.js dev server |
| `npm run build` | Production build → out/ (static export) |
| `npm run lint` | Run ESLint |

| Command | Description |
|---|---|
| `docker compose up --build` | Build and start the full stack |
| `docker compose down` | Stop all containers |

This project is deployed on HuggingFace Spaces using Docker.
- Fork this repo and create a new Space at huggingface.co/new-space (SDK: Docker)
- Set the following Space secrets:
  - `HF_TOKEN`: your HuggingFace API token
  - `SECRET_KEY`: a strong random string
- Push to the `hf` remote; the Space will auto-build
git remote add hf https://<username>:<HF_TOKEN>@huggingface.co/spaces/<username>/<space-name>
git push hf main

docker compose up -d --build
# App available at http://your-server:7860

This project is participating in GirlScript Summer of Code! We welcome contributors of all skill levels.
Branch Strategy:
| Branch | Purpose |
|---|---|
| `main` | Production: HuggingFace deployed (admin only) |
| `dev` | All contributor PRs target here |
| `feature/*` / `fix/*` / `docs/*` | Your working branches |

# Always branch from dev
git checkout -b feature/my-feature upstream/dev

Quick links:
Distributed under the MIT License. See LICENSE for more information.
Built with 💙 as a flagship AI engineering project
If you found this project helpful, please give it a ⭐ — it helps contributors discover it!