egbertdev/un_vr_scraper


UN Data Synchronization Engine

Automated VLR Tracking & Delta-Check System

Live Demo | Portfolio

The Problem

The UN Sustainable Development Goal (SDG) portal hosts critical Voluntary Local Reviews (VLRs), but data is fragmented across 50+ dynamic web pages. Manual tracking of new uploads is inefficient, inconsistent, and prone to human error.

The Solution

This engine automates the extraction, synchronization, and monitoring of these records, providing a centralized, high-integrity dataset for policy analysts and researchers. It transforms a messy web interface into structured, actionable data.

Technical Architecture

  • Scraping Engine: Headless Selenium for navigating JavaScript-heavy dynamic tables.
  • Parsing Logic: BeautifulSoup4 for granular DOM traversal and data extraction.
  • Persistence Layer: SQLite implementation to maintain state and perform delta-checks.
  • Web Interface: Flask (Python) dashboard for real-time visualization.
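The delta-check performed by the persistence layer can be sketched as follows. This is a minimal illustration, not the code in database.py: the table name `vlr_records`, the `title`/`url` columns, and the helper names `record_key`, `init_db`, and `sync` are all assumptions made for the example. It relies on a primary-key fingerprint plus `INSERT OR IGNORE`, so a record already stored is silently skipped.

```python
import hashlib
import sqlite3


def record_key(record: dict) -> str:
    """Build a stable fingerprint for a record.

    Assumption: a record is identified by its title and URL.
    """
    raw = f"{record['title'].strip().lower()}|{record['url'].strip()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) master table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vlr_records ("
        " key TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " url TEXT NOT NULL)"
    )


def sync(conn: sqlite3.Connection, scraped: list) -> list:
    """Insert only unseen records and return the ones that were new."""
    new_records = []
    for rec in scraped:
        cur = conn.execute(
            "INSERT OR IGNORE INTO vlr_records (key, title, url)"
            " VALUES (?, ?, ?)",
            (record_key(rec), rec["title"], rec["url"]),
        )
        # rowcount is 1 when the row was inserted, 0 when it was ignored.
        if cur.rowcount == 1:
            new_records.append(rec)
    conn.commit()
    return new_records
```

Calling `sync` twice with overlapping input returns only the records that were absent on the second call, which is the "delta" a dashboard or notifier would act on.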

Key Features

  • Automated Delta-Checking: The system compares scraped data against the local SQLite master, so only new, unique records are captured.
  • Dynamic Pagination: Custom logic to handle "View More" triggers across 50+ pages of dynamic content.
  • Clean Data Export: Normalizes fragmented web text into structured JSON/CSV outputs for AI training or analysis.
  • Systemic Precision: Built with a "Mechanical Engineering" mindset that prioritizes fault tolerance and data integrity.
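The "View More" pagination described above can be sketched as a loop that keeps requesting batches until nothing new arrives. This is an illustrative outline under stated assumptions: `collect_all_rows` and the `load_next_batch` callable are hypothetical names, and in the real engine that callable would presumably click the "View More" control through Selenium and parse the newly rendered rows. Injecting it as a parameter keeps the loop itself testable without a browser.

```python
from typing import Callable, Iterable, List


def collect_all_rows(
    load_next_batch: Callable[[int], Iterable[dict]],
    max_pages: int = 200,
) -> List[dict]:
    """Call load_next_batch(page_index) until a batch adds nothing new.

    Assumption: each row is a dict with a unique "url" field.
    max_pages is a safety cap so a misbehaving page cannot loop forever.
    """
    rows: List[dict] = []
    seen = set()
    for page in range(max_pages):
        added = 0
        for row in load_next_batch(page):
            if row["url"] in seen:
                continue  # already collected on an earlier page
            seen.add(row["url"])
            rows.append(row)
            added += 1
        if added == 0:
            break  # empty or fully duplicated batch: end of the listing
    return rows
```

Stopping on "no new rows" rather than on a fixed page count means the loop still terminates if the portal grows beyond 50 pages or repeats its last page.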

Installation & Setup

1. Clone the Repository
```bash
git clone https://github.com/egbertdev/UN-VLR-Scraper.git
cd UN-VLR-Scraper
```
2. Install Dependencies
```bash
pip install -r requirements.txt
```
3. Run the Engine
```bash
python main.py
```

Usage

Once running, access the web dashboard at http://localhost:5000 to view synchronized VLR data in real time. Scraped data is automatically stored in the SQLite database and can be exported as JSON or CSV from the interface.
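The export step mentioned above could look roughly like the sketch below. It is not the project's actual export code: the `vlr_records` table, its `title` and `url` columns, and the function names are assumptions chosen for illustration. It reads rows from SQLite, collapses stray whitespace left over from scraped text, and serializes the result with the standard `json` and `csv` modules.

```python
import csv
import io
import json
import sqlite3


def normalize_text(value: str) -> str:
    """Collapse runs of whitespace and newlines into single spaces."""
    return " ".join(value.split())


def fetch_records(conn: sqlite3.Connection) -> list:
    """Read all rows from the (hypothetical) vlr_records table as dicts."""
    cur = conn.execute("SELECT title, url FROM vlr_records ORDER BY title")
    columns = [col[0] for col in cur.description]
    return [
        {name: normalize_text(value) for name, value in zip(columns, row)}
        for row in cur
    ]


def to_json(records: list) -> str:
    """Serialize records as pretty-printed JSON, keeping non-ASCII text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records: list, fieldnames=("title", "url")) -> str:
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(fieldnames), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Returning strings rather than writing files directly leaves it to the caller to decide whether the output goes to the data/ folder or into an HTTP response from the dashboard.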

Project Structure
```text
UN-VLR-Scraper/
├── main.py              # Flask application & orchestration
├── scraper.py           # Selenium + BeautifulSoup scraping logic
├── database.py          # SQLite operations & delta-checking
├── requirements.txt     # Python dependencies
├── templates/           # HTML templates for dashboard
└── data/                # Exported JSON/CSV files
```

Author

Egbert Joel Abok
Focus: AI-Augmented Data Extraction & Automation Architecture
