Skip to content

variang/chat-with-sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Chat with SQL

A chatbot project that utilizes GPT-4o mini, Langchain, and SQLite to allow users to perform Q&A with SQL databases using natural language.

Architecture

Core Data Flow

┌─────────────────────────────────────────────────────────────────┐
│                        USER INPUT                               │
│              (Natural Language Question)                        │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                           LLM                                   │
│           (OpenAI GPT-4o-mini via LangChain)                    │
│                                                                 │
│  • Load environment variables (API keys, config)                │
│  • Initialize ChatOpenAI with temperature=0.0                   │
│  • Parse user question                                          │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                   SQL GENERATION                                │
│       (create_sql_query_chain or SQL Agent)                     │
│                                                                 │
│  Option 1: Direct Chain                                         │
│  • Generate optimized SELECT statement                          │
│  • Apply constraints (LIMIT, columns)                           │
│                                                                 │
│  Option 2: Intelligent Agent                                    │
│  • List available tables                                        │
│  • Fetch table schemas                                          │
│  • Generate multi-step SQL if needed                            │
│  • Validate syntax before execution                             │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                   DATABASE QUERY                                │
│          (SQLite via SQLDatabase utility)                       │
│                                                                 │
│  • SQLite Database (sqldb.db)
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                      RESULTS                                    │
│              (Raw Query Output from DB)                         │
│                                                                 │
│  • Retrieved data rows                                          │
│  • Result metadata (row count, column info)                     │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                   LLM FORMATTING                                │
│       (Answer formatting with PromptTemplate)                   │
│                                                                 │
│  Input: Question + SQL Query + Raw Results                      │
│  Process: LLM transforms raw data into natural language         │
│  Output: Clear, formatted answer for user                       │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                    USER OUTPUT                                  │
│             (Formatted Response via Gradio UI)                  │
│                                                                 │
│  Display result to user in chat interface                       │
└─────────────────────────────────────────────────────────────────┘

Technology Stack

Layer Technology Purpose
LLM OpenAI GPT-4o-mini Language understanding & generation
Orchestration LangChain Chain management & tool integration
SQL Execution SQLAlchemy Database connection & query execution
Database SQLite Data storage (read-only)
UI Gradio Chat interface

Data Sources

  1. Pre-loaded SQL Database (data/sqldb.db)
    • Chinook sample database
    • Contains Artist, Album, Customer, Invoice, etc. tables

Execution Modes

Mode 1: SQL Query Chain (Simple queries)

  • Fast direct path: Question → SQL → Execute → Format
  • Best for straightforward Q&A
  • Lower token usage

Mode 2: SQL Agent (Complex queries)

  • Intelligent exploration: List tables → Get schema → Generate SQL
  • Error recovery and retry logic
  • Multi-step query support
  • Best for complex analytical questions

Security Note

⚠️ Always use READ-ONLY database connections. The system:

  • Restricts to SELECT statements only
  • Validates SQL syntax before execution
  • Limits table scope access
  • Prevents DELETE, UPDATE, INSERT, or CREATE operations

Prerequisites

System Dependencies (macOS)

brew install sqlite

Python Requirements

  • Python 3.11

Setup

  1. Clone the repository
git clone git@github.com:variang/chat-with-sql.git
cd chat-with-sql
  1. Create and activate virtual environment
python3.11 -m venv venv
source venv/bin/activate
  1. Install dependencies
pip install -r requirements.txt
  1. Set up environment variables

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL_NAME=gpt-4o-mini

Usage

Option 1: Gradio Web Interface (Recommended)

The easiest way to interact with the chatbot is through the Gradio web interface.

Quick Start

  1. Ensure you have the Chinook database set up:
sqlite3 data/sqldb.db
.read data/sql/Chinook_Sqlite.sql
.quit
  1. Run the Gradio application:
python app.py
  1. Access the web interface:
    • Open your browser and go to: http://localhost:7860
    • You should see the Chat with SQL interface

Features

  • Auto-Database Discovery: All .db files in the data/ folder are automatically discovered and listed
  • Database Selection: Choose from available databases using the dropdown selector
  • Execution Modes: Switch between "SQL Agent" (default, more intelligent) and "SQL Chain" (faster, simpler)
  • Chat Interface: Ask questions in natural language and get immediate responses
  • SQL Query Visibility: Toggle to show/hide the generated SQL queries
  • Chat History: View full conversation history with all questions and answers

Option 2: Jupyter Notebook

For development and testing, you can use the interactive Jupyter notebook.

  1. Start Jupyter:
jupyter notebook
  1. Open the notebook:

    • Navigate to notebooks/chat_with_sql.ipynb
    • Run cells sequentially to test different components
  2. Features:

    • Test SQL query chains
    • Test SQL agents
    • Explore database schemas
    • Experiment with custom prompts

Chatbot User Interface

Chat interface

Dataset

  • Diabetes dataset: Link
  • Cancer dataset: Link
  • Chinook database: Link

References

About

A chatbot project that utilizes GPT, Langchain, and SQLite to allow users to perform Q&A with SQL databases using natural language

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors