Hbasu5/rag-engine

🧠 RAG Engine

A modular multimodal Retrieval-Augmented Generation (RAG) system built for scalable real-world AI applications.

Supports:

  • local knowledge retrieval
  • OCR ingestion
  • audio transcription
  • live web retrieval
  • hybrid multimodal querying

🚀 Overview

RAG Engine has evolved from a text-only prototype into a hybrid multimodal retrieval system capable of combining:

  • 📄 uploaded documents
  • 🖼️ OCR image extraction
  • 🎤 audio transcription
  • 🌐 live internet retrieval

through a unified semantic retrieval pipeline.


✨ Features

🔍 Hybrid Retrieval System

  • Semantic similarity search
  • FAISS vector database
  • Dynamic Top-K retrieval
  • Local + web retrieval routing
  • Structured retrieval context assembly
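
The Top-K semantic search listed above can be illustrated without any dependencies. This is a minimal sketch, not the project's retriever: it ranks toy vectors by cosine similarity in pure Python, which is the same ranking a FAISS index would compute far more efficiently over real embeddings. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return (chunk_index, score) pairs for the k most similar chunks."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones would come from the embedding model.
chunks = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
results = top_k([1.0, 0.1, 0.0], chunks, k=2)
print(results)
```

Making `k` a parameter is what "dynamic Top-K" would amount to in this sketch: the caller can widen or narrow the context per query.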

📄 Dynamic Upload System

  • Runtime document uploads
  • Automatic chunking
  • Embedding generation
  • Dynamic vector index rebuilding
  • Upload observability
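
The README does not say how automatic chunking is done, so the following is only one plausible approach: fixed-size character windows with overlap, so that a sentence cut at a boundary still appears whole in a neighbouring chunk. The function name `chunk_text` and the default sizes are assumptions for illustration.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

Each chunk would then be embedded and added to the vector index; rebuilding the index after an upload means repeating that step for the new chunks.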

🖼️ OCR Ingestion

Supports:

  • .png
  • .jpg
  • .jpeg

Powered by:

  • pytesseract
  • Pillow

Capabilities:

  • scanned text extraction
  • screenshot ingestion
  • printed English OCR
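
A minimal sketch of what OCR extraction with pytesseract and Pillow might look like. The function names are hypothetical and this is not the project's `ocr_extractor.py`; it assumes the Tesseract binary is installed and on the system PATH. The libraries are imported inside the function so the extension check works even without them.

```python
from pathlib import Path

# Extensions taken from the supported list above.
SUPPORTED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_supported_image(path):
    """True if the file extension is one the OCR path accepts."""
    return Path(path).suffix.lower() in SUPPORTED_IMAGE_EXTENSIONS


def extract_text_from_image(path, lang="eng"):
    """Run Tesseract OCR on an image and return the recognised text."""
    if not is_supported_image(path):
        raise ValueError(f"Unsupported image type: {Path(path).suffix!r}")
    # Lazy imports: only needed when OCR is actually performed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang).strip()


print(is_supported_image("scan.PNG"))
```

The returned text would then follow the same chunking and embedding path as any uploaded document.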

🎤 Audio Transcription

Supports:

  • .mp3
  • .wav
  • .m4a

Powered by:

  • Whisper (tiny)
  • FFmpeg

Capabilities:

  • speech-to-text ingestion
  • transcript indexing
  • audio-based retrieval
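
Speech-to-text ingestion with the Whisper `tiny` model could be sketched as follows. This assumes the `openai-whisper` package and FFmpeg are installed; the function names are hypothetical and do not come from the project's `audio_extractor.py`.

```python
from pathlib import Path

# Extensions taken from the supported list above.
SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def is_supported_audio(path):
    """True if the file extension is one the audio path accepts."""
    return Path(path).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS


def transcribe_audio(path, model_name="tiny"):
    """Transcribe an audio file with Whisper and return the transcript text."""
    if not is_supported_audio(path):
        raise ValueError(f"Unsupported audio type: {Path(path).suffix!r}")
    # Lazy import: the model is only loaded when a transcription is requested.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(str(path))
    return result["text"].strip()


print(is_supported_audio("meeting.m4a"))
```

Loading the model on every call is slow; a real service would likely load it once at startup and reuse it.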

🌐 Live Web Retrieval

A real-time web augmentation pipeline retrieves and extracts live web pages at query time.

Web Retrieval Flow

User Query
 ↓
Web Search
 ↓
URL Extraction
 ↓
Webpage Fetching
 ↓
Content Extraction
 ↓
Chunking
 ↓
Structured Web Context
 ↓
LLM Response

Stack Used

  • DDGS
  • Requests
  • Trafilatura

Features:

  • live internet retrieval
  • semantic webpage extraction
  • structured web context
  • web source tracking
  • retrieval observability
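
The flow above can be sketched as a small function that wires search, fetch, and extraction together and keeps track of sources. This is an illustration under assumptions, not the project's `web_context_builder.py`: the three steps are passed in as callables so the example runs offline, and truncation to `max_chars` stands in for the chunking step. In the real stack, `search` might wrap DDGS, `fetch` might wrap `requests.get`, and `extract` might wrap `trafilatura.extract`.

```python
def build_web_context(query, search, fetch, extract, max_results=3, max_chars=500):
    """Run search -> fetch -> extract and return (context, sources).

    search(query) -> list of URLs
    fetch(url)    -> HTML string, or None on failure
    extract(html) -> main page text, or None if nothing usable was found
    """
    sources = []
    blocks = []
    for url in search(query)[:max_results]:
        html = fetch(url)
        text = extract(html) if html else None
        if not text:
            continue  # skip pages that fail to download or yield no content
        sources.append(url)
        # Number each block so the answer can cite its source.
        blocks.append(f"[{len(sources)}] {url}\n{text[:max_chars]}")
    return "\n\n".join(blocks), sources


# Offline stand-ins for the network-backed steps (hypothetical data).
fake_pages = {
    "https://example.com/a": "<p>FAISS is a similarity search library.</p>",
    "https://example.com/b": None,  # simulates a failed download
}
context, sources = build_web_context(
    "what is faiss",
    search=lambda q: list(fake_pages),
    fetch=fake_pages.get,
    extract=lambda html: html.replace("<p>", "").replace("</p>", ""),
)
print(context)
```

Returning the source list separately from the context is one way to support the source tracking and observability features listed above.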

💬 Consumer-Grade AI UI

The frontend was redesigned from a developer utility UI into a modern AI assistant interface.

Features:

  • chat-style interaction
  • modality-aware UI
  • upload progress tracking
  • web search configuration dialog
  • retrieval observability panels
  • source inspection system
  • interactive modality badges
  • slash command support

🧠 Architecture

FILE / WEB QUERY
        ↓
Extractor Router
 ├── TXT Extractor
 ├── OCR Extractor
 ├── Audio Extractor
 └── Web Retriever
        ↓
Normalized Text
        ↓
Chunking
        ↓
Embedding Generation
        ↓
FAISS Indexing
        ↓
Retriever Engine
        ↓
Structured Context
        ↓
LLM / Local Response
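
The Extractor Router stage in the diagram can be pictured as a lookup from file extension to extractor. The sketch below is hypothetical and only returns the extractor's name; the extensions come from the Supported Inputs list, and routing `.md` to the TXT extractor is an assumption.

```python
from pathlib import Path

# Extension -> extractor name. Mapping ".md" to "txt" is an assumption.
ROUTES = {
    ".txt": "txt", ".md": "txt",
    ".png": "ocr", ".jpg": "ocr", ".jpeg": "ocr",
    ".mp3": "audio", ".wav": "audio", ".m4a": "audio",
}


def route_extractor(filename):
    """Return the name of the extractor that should handle this file."""
    suffix = Path(filename).suffix.lower()
    try:
        return ROUTES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or filename!r}") from None


for name in ["notes.md", "scan.JPG", "call.wav"]:
    print(name, "->", route_extractor(name))
```

Because every extractor emits normalized text, the chunking, embedding, and indexing stages below the router do not need to know which modality a chunk came from.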

📁 Project Structure

RAG/
│
├── app/
│   │
│   ├── core/
│   │   ├── chunking.py
│   │   ├── embeddings.py
│   │   ├── retriever.py
│   │   └── retriever_engine.py
│   │
│   ├── ingestion/
│   │   ├── extractors/
│   │   │   ├── txt_extractor.py
│   │   │   ├── ocr_extractor.py
│   │   │   ├── audio_extractor.py
│   │   │   └── extractor_router.py
│   │   │
│   │   └── loader.py
│   │
│   ├── retrieval/
│   │   ├── web_search.py
│   │   ├── web_scraper.py
│   │   └── web_context_builder.py
│   │
│   ├── llm/
│   │   ├── base.py
│   │   └── gemini.py
│   │
│   ├── services/
│   │   ├── answer_engine.py
│   │   └── rag_pipeline.py
│   │
│   └── storage/
│       └── faiss_store.py
│
├── uploads/
├── data/
├── model/
│
├── index.html
├── main.py
├── requirements.txt
├── README.md
└── .gitignore

⚙️ Setup

1. Clone Repository

git clone https://github.com/your-username/rag-engine.git
cd rag-engine

2. Install Dependencies

pip install -r requirements.txt

3. Download Embedding Model

Download:

https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

Place inside:

model/all-MiniLM-L6-v2/

4. Configure Gemini API Key

GEMINI_API_KEY=your_api_key
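
The README does not state whether the key lives in a `.env` file or in the shell environment. Assuming it ends up as an environment variable, a fail-fast check at startup might look like this sketch (the function name `load_gemini_api_key` is hypothetical):

```python
import os


def load_gemini_api_key(env=os.environ):
    """Return the Gemini API key, or raise if it is missing or a placeholder."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key or key == "your_api_key":
        raise RuntimeError("GEMINI_API_KEY is not set; configure it before starting the server")
    return key


# Example with an explicit mapping instead of the real environment.
print(load_gemini_api_key({"GEMINI_API_KEY": " demo-key "}))
```

Failing at startup gives a clearer error than a rejected request on the first query.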

5. Install FFmpeg

Required for audio transcription.

Add FFmpeg to system PATH.


6. Run Server

python main.py

🌐 Access UI

http://127.0.0.1:8000

📄 Supported Inputs

Documents

  • .txt
  • .md

OCR Images

  • .png
  • .jpg
  • .jpeg

Audio

  • .mp3
  • .wav
  • .m4a

Web

  • live web retrieval
  • semantic webpage extraction

🧠 Retrieval Capabilities

Current retrieval sources:

  • uploaded documents
  • OCR-extracted text
  • audio transcripts
  • live internet knowledge

🔌 Planned Upgrades

v1.0.5-beta

Planned:

  • metadata-aware retrieval
  • multilingual audio
  • reranking
  • confidence scoring
  • retrieval thresholding
  • caching
  • async retrieval
  • observability expansion

🚧 Current Limitations

OCR

  • no handwriting support
  • multilingual OCR still experimental
  • no layout preservation

Audio

  • English-only transcription
  • no speaker diarization

Web Retrieval

  • no reranking
  • no caching
  • temporary retrieval context only

📌 Tech Stack

  • Python
  • FastAPI
  • SentenceTransformers
  • FAISS
  • Gemini API
  • Whisper
  • Pytesseract
  • Trafilatura
  • HTML/CSS/JavaScript

📄 License

MIT License


© 2026 HARDIK BASU
