Smart Document Processing System

Overview

This project is a full-stack document processing system that ingests business documents (invoices and purchase orders), extracts structured data from them, validates that data, and provides an interactive interface for reviewing and correcting the results.

The system is designed to handle imperfect real-world data, including images whose text must first be recovered with OCR.

Input Data

PDF documents (clean and semi-structured)
Images (including messy scans that require OCR)
CSV files (structured)
TXT files (semi-structured)
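CSV files are the most structured of these inputs. As an illustration, here is a minimal sketch of turning a CSV export into one record per row. It assumes a header row and no quoted fields containing commas or newlines. The function name parseSimpleCsv and the column names in the sample are hypothetical and not taken from the project:

```javascript
// Hypothetical sketch: convert a simple CSV export (header row + data rows)
// into plain objects keyed by column name.
// Assumption: no quoted fields, embedded commas, or embedded newlines.
function parseSimpleCsv(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0); // ignore blank lines

  if (lines.length === 0) return [];

  const headers = lines[0].split(",").map((h) => h.trim());

  return lines.slice(1).map((line) => {
    const values = line.split(",").map((v) => v.trim());
    const record = {};
    headers.forEach((header, i) => {
      // Missing values become "" so a later validation step can flag them.
      record[header] = values[i] ?? "";
    });
    return record;
  });
}

// Example input with hypothetical column names.
const sample = "invoice_number,date,total\nINV-001,2024-01-15,120.50\nINV-002,2024-01-20,";
console.log(parseSimpleCsv(sample));
```

Real exports often quote fields, so production code would more likely rely on a dedicated CSV parser.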

Tech Stack

Frontend

React
Axios

Backend

Node.js
Express

Database

MongoDB (Mongoose)

OCR

Tesseract.js
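Tesseract.js returns recognized text, but numbers in that text are often misread (for example O for 0 or l for 1). Below is a minimal sketch of a cleanup step for amount strings. It makes several assumptions: normalizeOcrAmount is a hypothetical helper that is not part of Tesseract.js or this project, the character substitutions are a heuristic, and a dot is assumed to be the decimal separator:

```javascript
// Hypothetical post-OCR heuristic (not part of Tesseract.js): map letters that
// OCR commonly confuses with digits back to digits inside an amount string.
// Assumes "." is the decimal separator and "," a thousands separator.
const OCR_DIGIT_FIXES = { O: "0", o: "0", l: "1", I: "1", S: "5" };

function normalizeOcrAmount(raw) {
  const cleaned = raw
    .trim()
    .replace(/[OolIS]/g, (ch) => OCR_DIGIT_FIXES[ch]) // letter -> likely digit
    .replace(/[\s$€]/g, "") // drop currency symbols and stray spaces
    .replace(/,(?=\d{3}(?:\D|$))/g, ""); // drop thousands separators such as 1,200

  // Accept only plain amounts like "1200" or "1200.50"; anything else is
  // returned as null so the field can be flagged for manual review.
  if (!/^\d+(?:\.\d{1,2})?$/.test(cleaned)) return null;
  return Number(cleaned);
}

console.log(normalizeOcrAmount("1,2O0.5O"));
console.log(normalizeOcrAmount("$ l5O"));
```

Returning null instead of guessing keeps uncertain values visible to the reviewer rather than silently storing a wrong number.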

Setup Instructions

1. Clone the repository

git clone https://github.com/laststonedjs/smart-document-system.git
cd smart-document-system

2. Backend Setup
cd server
npm install

Create a .env file in the server directory:

MONGO_URI=your_mongodb_connection_string
PORT=5000

Run the server:
npm run dev
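The backend needs both values at startup. Here is a minimal sketch of reading and checking them, assuming the server loads .env into process.env (for example with the dotenv package). The name loadConfig is hypothetical, and the fallback port of 5000 simply mirrors the port documented for the backend:

```javascript
// Hypothetical sketch: read and validate the two .env settings, assuming
// they have already been loaded into process.env (e.g. by dotenv).
function loadConfig(env = process.env) {
  const mongoUri = env.MONGO_URI;
  if (!mongoUri) {
    // Fail fast: without a connection string Mongoose cannot connect.
    throw new Error("MONGO_URI is not set; add it to the .env file");
  }

  // Fall back to 5000, the port documented for the backend.
  const port = Number.parseInt(env.PORT ?? "5000", 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`PORT must be a valid TCP port, got "${env.PORT}"`);
  }

  return { mongoUri, port };
}

// Usage at server startup (commented out so this sketch runs without a .env):
// const { mongoUri, port } = loadConfig();
```

Failing fast on a missing MONGO_URI gives a clearer error than a failed database connection later in startup.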

3. Frontend Setup
From the repository root:
cd client/smart_document
npm install
npm run dev

The frontend runs at:
http://localhost:5173

The backend runs at:
http://localhost:5000 (the PORT value set in .env)

API Endpoints

Upload

POST /api/upload/pdf
POST /api/upload/image
POST /api/upload/txt
POST /api/upload/csv
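Each input type has its own upload route. Below is a hypothetical client-side helper that picks the route from a file's extension. The extension-to-endpoint table is an assumption; in particular, which image formats /api/upload/image accepts is not documented here:

```javascript
// Hypothetical helper: choose the upload endpoint from a file's extension.
// The extension table is an assumption about what each route accepts.
const UPLOAD_ENDPOINTS = new Map([
  ["pdf", "/api/upload/pdf"],
  ["png", "/api/upload/image"],
  ["jpg", "/api/upload/image"],
  ["jpeg", "/api/upload/image"],
  ["txt", "/api/upload/txt"],
  ["csv", "/api/upload/csv"],
]);

function uploadEndpointFor(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot === -1) return null; // no extension, so the file cannot be routed
  const ext = filename.slice(dot + 1).toLowerCase();
  // A Map avoids accidental hits on inherited object keys like "constructor".
  return UPLOAD_ENDPOINTS.get(ext) ?? null;
}

console.log(uploadEndpointFor("invoice.PDF"));
```

In the React client, the returned path could then be passed to axios.post together with a FormData body containing the file; the form field name the server expects is not specified in this README.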

Documents

GET /api/documents
POST /api/documents
PUT /api/documents/:id
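Following common REST conventions, these three routes presumably correspond to listing, creating, and updating documents. Here is a minimal sketch of Axios-style request configs ({ method, url, data }) for them. The helper documentRequests is hypothetical, and the payload fields in the example are assumptions rather than the project's actual schema:

```javascript
// Hypothetical helper: build Axios-style request configs ({ method, url, data })
// for the document endpoints. Payload shapes are assumptions.
const DOCUMENTS_URL = "/api/documents";

const documentRequests = {
  // GET /api/documents: fetch all documents
  list: () => ({ method: "get", url: DOCUMENTS_URL }),
  // POST /api/documents: save a new document
  create: (doc) => ({ method: "post", url: DOCUMENTS_URL, data: doc }),
  // PUT /api/documents/:id: update an existing document with corrected fields
  update: (id, changes) => {
    if (!id) throw new Error("update requires a document id");
    return {
      method: "put",
      url: `${DOCUMENTS_URL}/${encodeURIComponent(id)}`,
      data: changes,
    };
  },
};

console.log(documentRequests.update("abc123", { total: 120.5 }));
```

Axios accepts such a config object directly, for example axios(documentRequests.update(doc._id, { total: 120.5 })), where _id is the identifier MongoDB assigns to each document by default.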

Example Workflow

1. Upload a document
2. The system extracts the raw text (using OCR for images)
3. The text is parsed into structured fields and validated
4. Detected issues are highlighted
5. The user corrects any incorrect fields
6. The document is saved and marked as validated
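The structuring step of this workflow is where raw text becomes fields. Below is a minimal sketch of that step for semi-structured text, assuming one "Label: value" pair per line. The labels, the field names, and the extractFields function are hypothetical, and real invoices and purchase orders would need many more patterns:

```javascript
// Hypothetical sketch of the "structure" step: pull labeled fields out of
// raw text with one "Label: value" pair per line. Labels and field names
// are assumptions; unmatched fields are left as null for the review UI.
const FIELD_PATTERNS = {
  invoiceNumber: /^[ \t]*invoice[ \t]*(?:no\b\.?|number\b|#)[ \t]*:?[ \t]*([A-Z0-9-]+)/im,
  date: /^[ \t]*date[ \t]*:?[ \t]*(\d{4}-\d{2}-\d{2})/im,
  // Anchoring at line start keeps "Subtotal:" from matching "Total:".
  total: /^[ \t]*total[ \t]*:?[ \t]*\$?[ \t]*([\d,]+(?:\.\d{1,2})?)/im,
};

function extractFields(rawText) {
  const fields = {};
  for (const [name, pattern] of Object.entries(FIELD_PATTERNS)) {
    const match = rawText.match(pattern);
    fields[name] = match ? match[1] : null;
  }
  // Convert the total to a number, dropping thousands separators.
  if (fields.total !== null) {
    fields.total = Number(fields.total.replace(/,/g, ""));
  }
  return fields;
}

const raw = [
  "ACME Corp",
  "Invoice Number: INV-2024-001",
  "Date: 2024-03-01",
  "Subtotal: 1,000.00",
  "Total: 1,120.50",
].join("\n");
console.log(extractFields(raw));
```

Leaving unmatched fields as null, rather than dropping them, gives the review interface an explicit list of what still needs the user's attention.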

Notes

Some test documents intentionally contain errors
The system is designed to detect and report these inconsistencies
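Here is a minimal sketch of how such inconsistencies could be detected and reported rather than rejected, so that the interface can highlight each one. The field names (invoiceNumber, date, total, lineItems, quantity, unitPrice, amount) and the validateInvoice function are assumptions, not the project's schema. Amounts are compared in integer cents to avoid floating-point rounding errors:

```javascript
// Hypothetical validation sketch: collect issues instead of throwing, so the
// review UI can highlight each problem field. Field names are assumptions.
const REQUIRED_FIELDS = ["invoiceNumber", "date", "total"];

// Compare money in integer cents to avoid floating-point drift (0.1 + 0.2 !== 0.3).
function toCents(value) {
  return Math.round(value * 100);
}

function validateInvoice(doc) {
  const issues = [];

  for (const field of REQUIRED_FIELDS) {
    if (doc[field] === null || doc[field] === undefined || doc[field] === "") {
      issues.push({ field, message: `${field} is missing` });
    }
  }

  const items = doc.lineItems ?? [];
  let sumCents = 0;
  items.forEach((item, i) => {
    // Each line amount should equal quantity x unit price.
    const expectedCents = toCents(item.quantity * item.unitPrice);
    if (expectedCents !== toCents(item.amount)) {
      issues.push({
        field: `lineItems[${i}].amount`,
        message: `expected ${expectedCents / 100}, found ${item.amount}`,
      });
    }
    sumCents += toCents(item.amount);
  });

  // The document total should equal the sum of its line amounts.
  if (items.length > 0 && typeof doc.total === "number" && sumCents !== toCents(doc.total)) {
    issues.push({
      field: "total",
      message: `line items sum to ${sumCents / 100}, but total is ${doc.total}`,
    });
  }

  return issues;
}
```

Each issue carries the path of the offending field, which is enough for a form to mark that input as needing correction.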

Future Improvements

Due date parsing
Authentication system
Role-based review workflow
OCR accuracy tuning
Export to PDF, CSV, and Excel

AI Usage

AI tools (ChatGPT, Gemini AI) were used for:

Debugging
Research and quick lookups (in place of web searches)
Code optimization

All implementation details are fully understood by the author.

Live App: https://smart-document-system-zeta.vercel.app/
