A Streamlit app for ingesting PDF bank statements, classifying spending, correcting categories, and exploring trends with interactive charts.
This project is a personal finance workflow for turning PDF bank statements into structured transactions, category mappings, grouped spending views, and chart-based analysis with a lightweight local SQLite store.
Known limitations:
- Statement ingestion depends on extractable PDF text; scanned PDFs still need OCR before parsing can work well.
- LLM-backed parsing and automated categorisation require credentials for an external LLM provider.
- The app is designed for local, single-user use with a SQLite database, not for concurrent multi-user access.
Features:
- Drag-and-drop PDF statement uploads for multiple bank accounts
- Automatic transaction extraction and storage in SQLite
- LLM-based statement parsing through LangChain
- LLM-first transaction categorisation for new merchants, with persistent mappings for known merchants
- Merchant/category mappings learned from both LLM classification and user corrections
- Filters by date, category, account, and transaction type
- Card-based grouped spending views by month, year, category, account, and more
- Interactive Plotly visualisations for spending over time
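The mapping-first categorisation described above (known merchants reuse a stored mapping, new merchants go to the LLM, and user corrections take precedence) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in spending_tracker/categorizer.py: MerchantCategorizer, the merchant-key normalisation, and the injected classify_with_llm callable are assumptions, and mappings live in a dict rather than SQLite.

```python
from typing import Callable, Dict, Optional

# Hypothetical signature: takes a raw merchant string, returns a category or None.
Classifier = Callable[[str], Optional[str]]


class MerchantCategorizer:
    """Hypothetical mapping-first categoriser with an optional LLM fallback."""

    def __init__(self, classify_with_llm: Optional[Classifier] = None) -> None:
        # Assumption: an in-memory dict stands in for the SQLite mapping table.
        self._mappings: Dict[str, str] = {}
        self._classify = classify_with_llm

    @staticmethod
    def _key(merchant: str) -> str:
        # Normalise case and whitespace so "TESCO  Stores" and "tesco stores" match.
        return " ".join(merchant.lower().split())

    def categorize(self, merchant: str) -> Optional[str]:
        key = self._key(merchant)
        # Known merchant: reuse the stored mapping without calling the LLM.
        if key in self._mappings:
            return self._mappings[key]
        # No LLM configured: leave the transaction uncategorised for user review.
        if self._classify is None:
            return None
        category = self._classify(merchant)
        # The first successful LLM answer becomes the merchant's reusable mapping.
        if category:
            self._mappings[key] = category
        return category

    def correct(self, merchant: str, category: str) -> None:
        # A user correction overwrites any existing mapping for the merchant.
        self._mappings[self._key(merchant)] = category
```

Because the LLM call is injected, the lookup order can be exercised without provider credentials; a LangChain-backed function could be passed in its place.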
Project layout:
- app.py: Streamlit home page with overview and grouped spending cards
- pages/: Streamlit pages for Upload, Mappings, and Charts
- spending_tracker/db.py: SQLite persistence
- spending_tracker/parser.py: PDF statement parsing
- spending_tracker/categorizer.py: Mapping + LLM categorisation logic
- spending_tracker/analytics.py: Aggregation helpers
- spending_tracker/services.py: Statement ingestion workflow
- spending_tracker/config.py: Environment-based app and LLM configuration
- ui/: Shared Streamlit helpers and page renderers
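To give a rough idea of what an aggregation helper behind the grouped spending cards might do, the sketch below totals spending per group label (month, year, category, or account). Transaction, GROUPERS, and group_spending are hypothetical names, not the actual contents of spending_tracker/analytics.py, and the sketch assumes amounts are stored as positive Decimal spending values.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Transaction:
    # Hypothetical shape of a stored transaction row.
    txn_date: date
    amount: Decimal  # assumption: positive values represent spending
    category: str
    account: str


# Each grouper maps a transaction to the label of the card it belongs to.
GROUPERS: Dict[str, Callable[[Transaction], str]] = {
    "month": lambda t: f"{t.txn_date:%Y-%m}",
    "year": lambda t: str(t.txn_date.year),
    "category": lambda t: t.category,
    "account": lambda t: t.account,
}


def group_spending(transactions: Iterable[Transaction], group_by: str) -> Dict[str, Decimal]:
    """Total spending per group label, sorted by label."""
    key = GROUPERS[group_by]  # raises KeyError for an unknown grouping
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for txn in transactions:
        totals[key(txn)] += txn.amount
    return dict(sorted(totals.items()))
```

In this design, adding another card grouping only means adding one entry to GROUPERS.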
- Create a virtual environment.
- Install dependencies:
pip install -e ".[dev]"
- Optionally configure LLM provider credentials:
export LLM_PROVIDER=openai
export OPENAI_API_KEY=your_key
export LLM_MODEL=gpt-4.1-mini
Optional runtime configuration:
export SPENDING_TRACKER_DB_PATH=/absolute/path/to/spending_tracker.db
Supported LLM_PROVIDER values:
openai, anthropic, google
If no provider is configured, uncategorised transactions remain for user review until you configure an LLM or map them manually.
- Start the app:
streamlit run app.py
Lint the repo:
ruff check .
Format the repo:
ruff format .
Notes:
- Statement ingestion uses an LLM-based extractor by default. The app expects the model to return transaction dates in ISO 8601 format (YYYY-MM-DD), and a lightweight validator checks each record before it is stored.
- If the PDF is scanned and contains no extractable text, OCR is still required before the LLM has anything useful to parse.
- The first successful LLM category for a new merchant is stored as that merchant's reusable mapping, and user corrections can override it later.
- The local SQLite database is ignored by git and generated on demand. By default it lives at spending_tracker.db in the project root unless SPENDING_TRACKER_DB_PATH is set.
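The environment variables from the setup steps and the default database location could be resolved by a loader roughly like the one below, in the spirit of spending_tracker/config.py. load_config, AppConfig, and the project_root parameter are hypothetical; the sketch relies only on the variable names and provider values listed in this README, and it assumes the current working directory is the project root when none is given.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Provider names listed in this README.
SUPPORTED_PROVIDERS = frozenset({"openai", "anthropic", "google"})


@dataclass(frozen=True)
class AppConfig:
    db_path: Path
    llm_provider: Optional[str]
    llm_model: Optional[str]

    @property
    def llm_enabled(self) -> bool:
        # Without a provider, transactions stay uncategorised for user review.
        return self.llm_provider is not None


def load_config(
    env: Optional[Mapping[str, str]] = None,
    project_root: Optional[Path] = None,
) -> AppConfig:
    env = os.environ if env is None else env
    # Assumption: the current working directory is the project root.
    root = Path.cwd() if project_root is None else project_root
    # SPENDING_TRACKER_DB_PATH wins; otherwise fall back to <root>/spending_tracker.db.
    db_path = Path(env.get("SPENDING_TRACKER_DB_PATH") or root / "spending_tracker.db")
    provider = env.get("LLM_PROVIDER", "").strip().lower() or None
    if provider is not None and provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    model = env.get("LLM_MODEL") or None
    return AppConfig(db_path=db_path, llm_provider=provider, llm_model=model)
```

Validating LLM_PROVIDER at start-up surfaces a typo immediately instead of at the first LLM call.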
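The ISO-date expectation and the lightweight validation before storage, mentioned in the notes, might be implemented along these lines. validate_record, store_records, the record keys (date, description, amount), and the transactions table schema are all hypothetical, so the real parser output and SQLite schema may differ. In this sketch, invalid records are reported back rather than stored.

```python
import sqlite3
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Iterable, List, Mapping, Tuple

Row = Tuple[str, str, str]


def validate_record(record: Mapping[str, object]) -> Row:
    """Normalise one LLM-extracted record or raise ValueError."""
    # date.fromisoformat rejects non-ISO strings such as "05/01/2024".
    txn_date = date.fromisoformat(str(record.get("date", "")).strip())
    description = str(record.get("description", "")).strip()
    if not description:
        raise ValueError("missing description")
    try:
        amount = Decimal(str(record.get("amount", "")).strip())
    except InvalidOperation as exc:
        raise ValueError(f"invalid amount: {record.get('amount')!r}") from exc
    if not amount.is_finite():
        raise ValueError(f"non-finite amount: {amount}")
    return txn_date.isoformat(), description, str(amount)


def store_records(
    conn: sqlite3.Connection, records: Iterable[Mapping[str, object]]
) -> Tuple[int, List[str]]:
    """Insert valid records; return (stored_count, error_messages)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "txn_date TEXT NOT NULL, description TEXT NOT NULL, amount TEXT NOT NULL)"
    )
    valid: List[Row] = []
    errors: List[str] = []
    for record in records:
        try:
            valid.append(validate_record(record))
        except ValueError as exc:
            errors.append(str(exc))
    with conn:  # commits on success, rolls back if the insert raises
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", valid)
    return len(valid), errors
```

Storing amounts as Decimal strings avoids binary floating-point rounding; whether the project stores amounts this way is an assumption here.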