🧠 RAG Engine

A modular, multimodal Retrieval-Augmented Generation (RAG) system that retrieves context from uploaded documents, images, audio, and the live web.

Supports:

local knowledge retrieval
OCR ingestion
audio transcription
live web retrieval
hybrid multimodal querying

🚀 Overview

RAG Engine has evolved from a text-only prototype into a hybrid multimodal retrieval system capable of combining:

📄 uploaded documents
🖼️ OCR image extraction
🎤 audio transcription
🌐 live internet retrieval

through a unified semantic retrieval pipeline.

✨ Features

🔍 Hybrid Retrieval System

Semantic similarity search
FAISS vector database
Dynamic Top-K retrieval
Local + web retrieval routing
Structured retrieval context assembly
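The semantic similarity step can be illustrated with the dependency-free sketch below. It performs the same exhaustive inner-product search over L2-normalized vectors that a flat FAISS index (such as `faiss.IndexFlatIP`) performs, so the inner product equals cosine similarity. The function name `top_k_search` and the choice to clamp `k` to the number of stored chunks are illustrative assumptions, not the project's `retriever.py` code.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_search(
    query: Sequence[float], index: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (chunk_id, score) pairs for the k most similar vectors, best first."""
    q = normalize(query)
    scores = [
        (i, sum(a * b for a, b in zip(q, normalize(vec))))
        for i, vec in enumerate(index)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    # Hypothetical "dynamic" Top-K: never return more results than exist.
    k = max(0, min(k, len(scores)))
    return scores[:k]


chunks = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k_search([1.0, 0.1], chunks, k=2))  # chunk ids 0 then 2
```

In the real pipeline the vectors would come from the embedding model, and FAISS would replace the Python loop with an optimized search.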

📄 Dynamic Upload System

Runtime document uploads
Automatic chunking
Embedding generation
Dynamic vector index rebuilding
Upload observability
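Automatic chunking is often implemented as a fixed-size sliding window. The sketch below assumes word-based windows with overlap; `chunk_text` and its defaults (`chunk_size=200`, `overlap=40`) are hypothetical, and `app/core/chunking.py` may split text differently (for example by characters or sentences).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Overlap keeps sentences that straddle a boundary retrievable from either neighbouring chunk; each new chunk is then embedded and the vector index rebuilt.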

🖼️ OCR Ingestion

Supports:

.png
.jpg
.jpeg

Powered by:

pytesseract
Pillow

Capabilities:

scanned text extraction
screenshot ingestion
printed English OCR
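A minimal OCR extractor along these lines is sketched below, assuming the Tesseract binary is installed alongside pytesseract and Pillow. `pytesseract.image_to_string` is the library's real entry point; `extract_text_from_image` and the cleanup rules in `normalize_ocr_text` are hypothetical and not taken from `ocr_extractor.py`.

```python
import re


def extract_text_from_image(path: str) -> str:
    """Run Tesseract OCR on an image file (requires the tesseract binary)."""
    # Imported lazily so this module loads even without the OCR dependencies.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang="eng")


def normalize_ocr_text(raw: str) -> str:
    """Clean typical OCR output before chunking (hypothetical post-processing)."""
    text = re.sub(r"-\n(?=\w)", "", raw)          # re-join words hyphenated across lines
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)  # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)        # collapse runs of blank lines
    text = re.sub(r"[ \t]{2,}", " ", text)        # collapse repeated spaces
    return text.strip()
```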

🎤 Audio Transcription

Supports:

.mp3
.wav
.m4a

Powered by:

Whisper (tiny)
FFmpeg

Capabilities:

speech-to-text ingestion
transcript indexing
audio-based retrieval
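With the openai-whisper package, transcription could look roughly like the sketch below. `whisper.load_model("tiny")` and `model.transcribe(...)` are the package's real calls; the cached loader and the timestamped line format in `segments_to_text` are assumptions for illustration, not the behaviour of `audio_extractor.py`.

```python
from functools import lru_cache
from typing import Dict, List


@lru_cache(maxsize=1)
def _load_tiny_model():
    """Load the Whisper "tiny" model once and reuse it across uploads."""
    import whisper  # imported lazily; requires openai-whisper and FFmpeg on PATH

    return whisper.load_model("tiny")


def transcribe_audio(path: str) -> Dict:
    """Transcribe an audio file; the result includes "text" and "segments"."""
    # language="en" mirrors the English-only limitation listed in this README.
    return _load_tiny_model().transcribe(path, language="en")


def segments_to_text(result: Dict) -> str:
    """Turn Whisper segments into timestamped lines for indexing (hypothetical format)."""
    lines: List[str] = []
    for seg in result.get("segments", []):
        lines.append(f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text'].strip()}")
    return "\n".join(lines)
```

Keeping timestamps in the indexed text would let a retrieved chunk point back to the moment in the recording it came from.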

🌐 Live Web Retrieval

A real-time pipeline augments answers with content retrieved from the live internet.

Web Retrieval Flow

User Query
 ↓
Web Search
 ↓
URL Extraction
 ↓
Webpage Fetching
 ↓
Content Extraction
 ↓
Chunking
 ↓
Structured Web Context
 ↓
LLM Response

Stack Used

DDGS
Requests
Trafilatura
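The flow above can be sketched as a single function whose search, fetch, and extract steps are passed in, which keeps it testable without network access. `retrieve_web_chunks` and its parameters are hypothetical; the project's `web_search.py` and `web_scraper.py` may be organized differently.

```python
from typing import Callable, Dict, List


def retrieve_web_chunks(
    query: str,
    search: Callable[[str, int], List[str]],
    fetch: Callable[[str], str],
    extract: Callable[[str], str],
    max_results: int = 3,
    chunk_words: int = 150,
) -> List[Dict[str, str]]:
    """Search -> fetch -> extract -> chunk, keeping the source URL on every chunk."""
    chunks: List[Dict[str, str]] = []
    for url in search(query, max_results):
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download instead of failing the query
        text = extract(html) or ""  # extractors may return None for unusable pages
        words = text.split()
        for start in range(0, len(words), chunk_words):
            chunks.append({"source": url, "text": " ".join(words[start:start + chunk_words])})
    return chunks
```

With the listed stack, `search` could be `lambda q, n: [r["href"] for r in DDGS().text(q, max_results=n)]`, `fetch` could be `lambda url: requests.get(url, timeout=10).text`, and `extract` could be `trafilatura.extract`. The `DDGS` import path differs between versions of the search package (`duckduckgo_search` versus `ddgs`), so check the installed version.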

Features:

live internet retrieval
semantic webpage extraction
structured web context
web source tracking
retrieval observability
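Structured web context with source tracking usually means tagging each chunk with a citation marker and appending a numbered source list the LLM can cite. The sketch below assumes chunks shaped as `{"source": url, "text": ...}`; `build_web_context` and its output layout are hypothetical, and `web_context_builder.py` may format context differently.

```python
from typing import Dict, List, Tuple


def build_web_context(chunks: List[Dict[str, str]]) -> Tuple[str, List[str]]:
    """Number each distinct source and tag every chunk with its citation marker."""
    if not chunks:
        return "", []
    sources: List[str] = []
    blocks: List[str] = []
    for chunk in chunks:
        url = chunk["source"]
        if url not in sources:
            sources.append(url)
        marker = sources.index(url) + 1  # 1-based citation number
        blocks.append(f"[{marker}] {chunk['text']}")
    source_list = "\n".join(f"[{i}] {url}" for i, url in enumerate(sources, start=1))
    context = "\n\n".join(blocks) + "\n\nSources:\n" + source_list
    return context, sources
```

Returning the source list separately lets the UI's source inspection panel show the same URLs the model was given.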

💬 Consumer-Grade AI UI

The frontend has been redesigned from a developer utility UI into a modern AI assistant interface.

Features:

chat-style interaction
modality-aware UI
upload progress tracking
web search configuration dialog
retrieval observability panels
source inspection system
interactive modality badges
slash command support

🧠 Architecture

FILE / WEB QUERY
        ↓
Extractor Router
 ├── TXT Extractor
 ├── OCR Extractor
 ├── Audio Extractor
 └── Web Retriever
        ↓
Normalized Text
        ↓
Chunking
        ↓
Embedding Generation
        ↓
FAISS Indexing
        ↓
Retriever Engine
        ↓
Structured Context
        ↓
LLM / Local Response
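The Extractor Router stage can be pictured as a lookup from file extension to extractor function. The extension table below follows the "Supported Inputs" section; `route`, `EXTRACTORS`, and the placeholder OCR and audio handlers are hypothetical and are not the contents of `extractor_router.py`.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_txt(path: str) -> str:
    """Read plain-text and Markdown files as UTF-8."""
    return Path(path).read_text(encoding="utf-8")


def extract_ocr(path: str) -> str:
    # Placeholder standing in for the pytesseract-based OCR extractor.
    raise NotImplementedError("plug in the OCR extractor here")


def extract_audio(path: str) -> str:
    # Placeholder standing in for the Whisper-based audio extractor.
    raise NotImplementedError("plug in the audio extractor here")


EXTRACTORS: Dict[str, Callable[[str], str]] = {
    ".txt": extract_txt, ".md": extract_txt,
    ".png": extract_ocr, ".jpg": extract_ocr, ".jpeg": extract_ocr,
    ".mp3": extract_audio, ".wav": extract_audio, ".m4a": extract_audio,
}


def route(path: str) -> Callable[[str], str]:
    """Pick an extractor by file extension (case-insensitive)."""
    ext = Path(path).suffix.lower()
    try:
        return EXTRACTORS[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext or '(none)'}") from None
```

Because every extractor returns normalized text, everything downstream of the router (chunking, embedding, indexing) stays modality-agnostic.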

📁 Project Structure

RAG/
│
├── app/
│   │
│   ├── core/
│   │   ├── chunking.py
│   │   ├── embeddings.py
│   │   ├── retriever.py
│   │   └── retriever_engine.py
│   │
│   ├── ingestion/
│   │   ├── extractors/
│   │   │   ├── txt_extractor.py
│   │   │   ├── ocr_extractor.py
│   │   │   ├── audio_extractor.py
│   │   │   └── extractor_router.py
│   │   │
│   │   └── loader.py
│   │
│   ├── retrieval/
│   │   ├── web_search.py
│   │   ├── web_scraper.py
│   │   └── web_context_builder.py
│   │
│   ├── llm/
│   │   ├── base.py
│   │   └── gemini.py
│   │
│   ├── services/
│   │   ├── answer_engine.py
│   │   └── rag_pipeline.py
│   │
│   └── storage/
│       └── faiss_store.py
│
├── uploads/
├── data/
├── model/
│
├── index.html
├── main.py
├── requirements.txt
├── README.md
└── .gitignore

⚙️ Setup

1. Clone Repository

git clone https://github.com/your-username/rag-engine.git
cd rag-engine

2. Install Dependencies

pip install -r requirements.txt

3. Download Embedding Model

Download the model from:

https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

Place the model files inside:

model/all-MiniLM-L6-v2/
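A quick way to confirm the model is in place before starting the server is sketched below. `SentenceTransformer` accepts a local directory path; the helper names `check_model_dir` and `load_embedder` are hypothetical and not part of `embeddings.py`.

```python
from pathlib import Path

MODEL_DIR = Path("model") / "all-MiniLM-L6-v2"


def check_model_dir(model_dir: Path = MODEL_DIR) -> Path:
    """Fail early with a helpful message if the model has not been downloaded."""
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        raise FileNotFoundError(
            f"Embedding model not found in {model_dir}/; download all-MiniLM-L6-v2 first."
        )
    return model_dir


def load_embedder(model_dir: Path = MODEL_DIR):
    """Load the local model from disk instead of downloading it at runtime."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    return SentenceTransformer(str(check_model_dir(model_dir)))
```

Once loaded, `load_embedder().encode(["some text"], normalize_embeddings=True)` returns 384-dimensional vectors for this model.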

4. Configure Gemini API Key

Set the GEMINI_API_KEY environment variable to your Gemini API key:

GEMINI_API_KEY=your_api_key
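How `main.py` loads the key is not described here; the sketch below assumes it is read from the process environment (for example after `export GEMINI_API_KEY=your_api_key` on Linux or macOS). The function name `get_gemini_api_key` is hypothetical.

```python
import os


def get_gemini_api_key() -> str:
    """Read the Gemini API key from the environment, failing fast if it is missing."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it before starting the server.")
    return key
```

Failing at startup gives a clearer error than a rejected request on the first query.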

5. Install FFmpeg

FFmpeg is required for audio transcription. Install it and add it to your system PATH.
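Whisper decodes audio by invoking the `ffmpeg` command-line tool, so it must be discoverable on PATH. A small check like the one below (the helper name `ffmpeg_available` is hypothetical) can confirm this before uploading audio.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found:", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found on PATH; audio transcription will fail.")
```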

6. Run Server

python main.py

🌐 Access UI

http://127.0.0.1:8000

📄 Supported Inputs

Documents

.txt
.md

OCR Images

.png
.jpg
.jpeg

Audio

.mp3
.wav
.m4a

Web

live web retrieval
semantic webpage extraction

🧠 Retrieval Capabilities

Current retrieval sources:

uploaded documents
OCR-extracted text
audio transcripts
live internet knowledge

🔌 Planned Upgrades

v1.0.5-beta

Planned:

metadata-aware retrieval
multilingual audio
reranking
confidence scoring
retrieval thresholding
caching
async retrieval
observability expansion
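Retrieval thresholding is still planned, so there is no implementation to describe yet. One common approach, sketched here with a hypothetical function name and an arbitrary cutoff of 0.35, is to drop weakly matching chunks and signal when nothing relevant was found locally.

```python
from typing import List, Tuple


def filter_by_threshold(
    results: List[Tuple[int, float]], min_score: float = 0.35
) -> Tuple[List[Tuple[int, float]], bool]:
    """Keep chunks scoring at least `min_score`; report whether nothing survived."""
    kept = [(chunk_id, score) for chunk_id, score in results if score >= min_score]
    # Hypothetical routing hook: an empty result could trigger web retrieval
    # or an "I don't know" answer instead of passing weak context to the LLM.
    return kept, not kept
```

A suitable cutoff depends on the embedding model and corpus, so it would need tuning against real queries.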

🚧 Current Limitations

OCR

no handwriting support
multilingual OCR still experimental
no layout preservation

Audio

English-only (no multilingual transcription)
no speaker diarization

Web Retrieval

no reranking
no caching
web results are used only as temporary per-query context and are not persisted

📌 Tech Stack

Python
FastAPI
SentenceTransformers
FAISS
Gemini API
Whisper
Pytesseract
Trafilatura
HTML/CSS/JavaScript

📄 License

MIT License
