The UN Sustainable Development Goal (SDG) portal hosts critical Voluntary Local Reviews (VLRs), but data is fragmented across 50+ dynamic web pages. Manual tracking of new uploads is inefficient, inconsistent, and prone to human error.
This engine automates the extraction, synchronization, and monitoring of these records, providing a centralized, high-integrity dataset for policy analysts and researchers. It transforms a messy web interface into structured, actionable data.
Tech Stack
- Scraping Engine: Headless Selenium for navigating JavaScript-heavy dynamic tables.
- Parsing Logic: BeautifulSoup4 for granular DOM traversal and data extraction.
- Persistence Layer: SQLite implementation to maintain state and perform delta-checks.
- Web Interface: Flask (Python) dashboard for real-time visualization.
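
The delta-check handled by the persistence layer could look roughly like the sketch below. The table name `vlr_records`, its columns, and the choice of `url` as the unique key are assumptions made for illustration; the actual schema in `database.py` may differ.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the project's own.
SCHEMA = """
CREATE TABLE IF NOT EXISTS vlr_records (
    url   TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    year  INTEGER
)
"""


def insert_new_records(conn, scraped):
    """Store only records whose URL is not in the database yet; return those records."""
    conn.execute(SCHEMA)
    # Load the keys already stored so each scraped record can be compared against them.
    known = {row[0] for row in conn.execute("SELECT url FROM vlr_records")}
    new_records = []
    for record in scraped:
        if record["url"] in known:
            continue  # already stored, or repeated within this batch
        known.add(record["url"])
        new_records.append(record)
    conn.executemany(
        "INSERT INTO vlr_records (url, title, year) VALUES (:url, :title, :year)",
        new_records,
    )
    conn.commit()
    return new_records


# Example with an in-memory database and made-up records.
conn = sqlite3.connect(":memory:")
first_run = insert_new_records(conn, [
    {"url": "https://example.org/vlr/a", "title": "City A VLR", "year": 2022},
])
second_run = insert_new_records(conn, [
    {"url": "https://example.org/vlr/a", "title": "City A VLR", "year": 2022},
    {"url": "https://example.org/vlr/b", "title": "City B VLR", "year": 2023},
])
print(len(first_run), len(second_run))
```

On the second run only the City B record is returned, because City A is already in the table.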
Key Features
- Automated Delta-Checking: The system compares scraped data against the local SQLite master, ensuring only unique, new records are captured.
- Dynamic Pagination: Custom logic to handle "View More" triggers across 50+ pages of dynamic content.
- Clean Data Export: Normalizes fragmented web text into structured JSON/CSV outputs for AI training or analysis.
- Systemic Precision: Built with a "Mechanical Engineering" mindset that prioritizes fault tolerance and data integrity.
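
As a rough illustration of the "View More" handling, the loop below keeps triggering the control until it disappears, stops adding rows, or a safety cap is reached. It is a browser-agnostic sketch: `load_all_rows`, `FakePage`, and the `max_clicks` cap are hypothetical names, and the Selenium wiring shown in the comments uses a guessed link text and selector rather than the portal's real markup.

```python
from typing import Callable


def load_all_rows(click_view_more: Callable[[], bool],
                  count_rows: Callable[[], int],
                  max_clicks: int = 200) -> int:
    """Trigger "View More" repeatedly; return how many clicks were made."""
    clicks = 0
    while clicks < max_clicks:
        rows_before = count_rows()
        if not click_view_more():        # control no longer present: all rows loaded
            break
        clicks += 1
        if count_rows() == rows_before:  # the click added nothing: stop to avoid looping
            break
    return clicks


# With Selenium, the two callables might wrap calls such as (selectors are assumptions):
#   buttons = driver.find_elements(By.LINK_TEXT, "View More")
#   rows = driver.find_elements(By.CSS_SELECTOR, "table tbody tr")
# plus an explicit wait for the new rows to render after each click.


class FakePage:
    """Stand-in for a dynamic table so the loop can run without a browser."""

    def __init__(self, total_rows: int, page_size: int):
        self.total_rows = total_rows
        self.page_size = page_size
        self.visible = min(page_size, total_rows)

    def click_view_more(self) -> bool:
        if self.visible >= self.total_rows:
            return False
        self.visible = min(self.visible + self.page_size, self.total_rows)
        return True

    def count_rows(self) -> int:
        return self.visible


page = FakePage(total_rows=95, page_size=20)
clicks = load_all_rows(page.click_view_more, page.count_rows)
print(clicks, page.count_rows())
```

The cap on clicks is one way to keep a changed page layout from turning the loop into an endless one.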
1. Clone the Repository
```bash
git clone https://github.com/egbertdev/UN-VLR-Scraper.git
cd UN-VLR-Scraper
```
2. Install Dependencies
```bash
pip install -r requirements.txt
```
3. Run the Engine
```bash
python main.py
```
Usage
Once the engine is running, open the web dashboard at http://localhost:5000 to view synchronized VLR data in real time. Scraped data is stored automatically in the SQLite database and can be exported as JSON or CSV from the interface.
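
The export step can be pictured with a small sketch like the one below, which collapses stray whitespace in scraped text and serializes the result to JSON and CSV. The field names in `FIELDS` and the helper names are assumptions for illustration; the dashboard's actual export code may be organized differently.

```python
import csv
import io
import json

# Hypothetical column set; the real export may include other fields.
FIELDS = ["title", "year", "url"]


def normalize(record):
    """Keep only the known fields and collapse whitespace runs in string values."""
    clean = {}
    for field in FIELDS:
        value = record.get(field, "")
        if isinstance(value, str):
            value = " ".join(value.split())
        clean[field] = value
    return clean


def to_json(records):
    """Serialize normalized records as a JSON array."""
    return json.dumps([normalize(r) for r in records], ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialize normalized records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(normalize(r) for r in records)
    return buffer.getvalue()


# Made-up record with the kind of ragged whitespace scraped text often carries.
records = [{"title": "  City A\n VLR ", "year": 2022, "url": "https://example.org/vlr/a"}]
print(to_json(records))
print(to_csv(records))
```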
Project Structure
```text
UN-VLR-Scraper/
├── main.py            # Flask application & orchestration
├── scraper.py         # Selenium + BeautifulSoup scraping logic
├── database.py        # SQLite operations & delta-checking
├── requirements.txt   # Python dependencies
├── templates/         # HTML templates for dashboard
└── data/              # Exported JSON/CSV files
```
Author
Egbert Joel Abok
Focus: AI-Augmented Data Extraction & Automation Architecture