Disambiguator for the various identifier types used in University of Edinburgh Digital Libraries systems.
Mike's original ERIC has been expanded to take into account:
- LUNA URLs
- Archipelago Digital Objects
- Archipelago Cantaloupe URLs
- IIIF Manifests
- ARKs
A massive thank you is due to him for getting us going.
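As a rough illustration of what disambiguating the identifier types listed above can look like, here is a minimal sketch. Every pattern in it is an assumption made for the example (the `ark:/` prefix, a `/luna/servlet/` path, a `/do/<uuid>` path, a `/cantaloupe/` path, a trailing `/manifest`), and `classify` is a hypothetical helper; none of this is taken from `app.py`, which may use different rules.

```python
import re

# Illustrative patterns only: these are assumptions for the sketch,
# not the rules implemented in app.py. Order matters, first match wins.
PATTERNS = [
    ("ark", re.compile(r"ark:/?\d{5,}/\S+")),
    ("iiif_manifest", re.compile(r"/manifest(\.json)?/?$")),
    ("cantaloupe_url", re.compile(r"/cantaloupe/")),
    ("luna_url", re.compile(r"/luna/servlet/")),
    ("archipelago_object", re.compile(r"/do/[0-9a-f-]{36}")),
]


def classify(identifier):
    """Return the name of the first pattern that matches, or 'unknown'."""
    value = identifier.strip()
    for name, pattern in PATTERNS:
        if pattern.search(value):
            return name
    return "unknown"


if __name__ == "__main__":
    for sample in (
        "ark:/12345/abc",
        "https://example.org/luna/servlet/detail/X",
        "not-an-identifier",
    ):
        print(sample, "->", classify(sample))
```

Keeping the patterns in an ordered list makes the precedence explicit when an input could match more than one type.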
Project layout:
- `app.py` is the Flask web app.
- `scripts/` contains operational/admin scripts.
- `data/input/` contains source input files such as `tinyurls.csv`.
- `data/output/` contains generated CSV exports.
- `data/logs/` and `data/db/` hold local runtime artifacts.
- `docs/` and `archive/` keep notes and old snapshots out of the app root.
Useful commands:
- `python3 scripts/init_db.py` creates the current schema and seed rows.
- `python3 scripts/get_data.py --resume` crawls `dams-live2` oldest-first and resumes from the last completed source page using `data/output/get_data_checkpoint.json`.
- `python3 scripts/get_data.py --clear-csvs --resume` clears the generated export CSVs but keeps the crawl checkpoint, so the next batch starts where the previous crawl left off.
- `python3 scripts/get_data.py --fresh` clears the generated export CSVs and checkpoint before starting a new crawl from the beginning.
- `python3 scripts/ingest_csv.py` is safe to rerun; it reuses existing objects and only fills in missing identifiers, dimensions, and ARKs.
- `python3 scripts/ingest_luna_routes.py` loads the generated TinyURL route CSV into the database.
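
The resume behaviour described above can be pictured with a small sketch of a page-based crawl that records the last completed page in a JSON checkpoint. This is only an illustration under stated assumptions: the `last_completed_page` key, the `fetch_page` callable, and the helper names are hypothetical, and the real `scripts/get_data.py` may store and use its checkpoint differently.

```python
import json
from pathlib import Path


def load_checkpoint(path):
    """Return the last completed page, or 0 if there is no usable checkpoint."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return 0
    if not isinstance(data, dict):
        return 0
    # "last_completed_page" is an assumed key name for this sketch.
    return int(data.get("last_completed_page", 0))


def save_checkpoint(path, page):
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"last_completed_page": page}), encoding="utf-8")
    tmp.replace(path)


def crawl(fetch_page, checkpoint_path, resume=True):
    """Fetch pages in order until an empty page; return the last completed page."""
    page = load_checkpoint(checkpoint_path) + 1 if resume else 1
    while True:
        rows = fetch_page(page)  # hypothetical callable returning a list of rows
        if not rows:
            break
        # A real crawler would append `rows` to the export CSVs here.
        save_checkpoint(checkpoint_path, page)
        page += 1
    return page - 1
```

Because the checkpoint is saved only after a page has been fully handled, an interrupted run repeats at most the page it was working on when it stopped.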