Per-tree image crops from NEON airborne data across RGB, hyperspectral, and LiDAR modalities. Crops come out as species-labeled .npy arrays ready for model training.
The pipeline starts from a crown GeoPackage (or shapefile) of tree crown polygons. From there it figures out which NEON tiles to pull, downloads them, and crops one array per crown per modality per year.
Current scope: crown cropping from a curated crown file. Crown detection from VST stems via DeepForest will be added soon.
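
To make "one array per crown per modality per year" concrete, the sketch below enumerates the crop keys a run could produce. The crown IDs and the `expected_crops` helper are hypothetical and not part of neoncrops; real counts will be lower wherever NEON has no tile covering a crown for a given year or modality.

```python
from itertools import product


def expected_crops(crown_ids, modalities, years):
    """Return every (crown, modality, year) combination as a sorted list."""
    return sorted(product(crown_ids, modalities, years))


# Hypothetical crown IDs, for illustration only.
crown_ids = ["crown_001", "crown_002", "crown_003"]
keys = expected_crops(crown_ids, ["rgb", "lidar"], [2019])

# 3 crowns x 2 modalities x 1 year
print(len(keys))
```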
- docs/00_overview.md: what NEON publishes and why this pipeline exists.
- docs/pipeline.md: the end-to-end walkthrough with copy-paste commands.
- docs/01_input_format.md: bringing your own crown file.
pip install -e .

The repo ships a tiny example crown file (30 OSBS crowns across 4 tiles) so you can walk through every step without any external download.
# 1. Build the tile manifest from the example crowns.
python -m neoncrops.build_tile_manifest \
  --crowns tests/data/example_crowns.gpkg \
  --years 2019 \
  --modalities rgb lidar \
  --out manifest.csv
# 2. Download the tiles (pulls from NEON; needs internet).
python -m neoncrops.download_tiles --manifest manifest.csv --out downloads/
# 3. Index downloaded files, plan the crops, run them.
python -m neoncrops.build_tile_index --root downloads/ --out tile_index.csv
python -m neoncrops.build_crop_plan --crowns tests/data/example_crowns.gpkg --tile-index tile_index.csv --out crop_plan.csv
python -m neoncrops.run_crops --plan crop_plan.csv --out crops/
python -m neoncrops.build_dataset --crop-log crops/crop_log.csv --crowns tests/data/example_crowns.gpkg --out dataset.csv

To run the pipeline on your own crowns, swap the example file for your own crown GeoPackage or shapefile. See docs/01_input_format.md for the column requirements. To load a crown file programmatically (with column normalization, CRS handling, and shapefile aliasing), use neoncrops.read_crowns.
A curated, species-labeled crown set covering 38 NEON sites (41,738 crowns, 234 species) is published on Hugging Face as weecology/neon-tree-crowns-dta. It combines algorithmic DeepTreeAttention crowns with hand-annotated bounding boxes and polygons. Pull it with:
python -m neoncrops.fetch_crowns --out-dir data/

This writes data/neon_crowns_dta.gpkg, which you can hand to build_tile_manifest and build_crop_plan exactly like the example file. See docs/data_sources.md for the schema.
Each crown carries a year (detection_year for algorithmic crowns, anno_year for
hand-annotated ones) that anchors it to a specific NEON flight. The pipeline can
emit one crop per year per modality for each crown, so the table below is also the
upper bound on the per-year sample size you can build:
| Year | Algorithmic | Hand bbox | Hand polygon | Total |
|---|---|---|---|---|
| 2017 | 23 | 0 | 0 | 23 |
| 2018 | 943 | 798 | 742 | 2,483 |
| 2019 | 18,597 | 3,600 | 768 | 22,965 |
| 2020 | 2,125 | 678 | 0 | 2,803 |
| 2021 | 3,622 | 0 | 137 | 3,759 |
| 2022 | 6,945 | 0 | 739 | 7,684 |
| 2023 | 0 | 0 | 145 | 145 |
| 2024 | 0 | 0 | 339 | 339 |
| year unknown (IFAS) | 1,537 | 0 | 0 | 1,537 |
| total | 33,792 | 5,076 | 2,870 | 41,738 |
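
As a quick consistency check on the table above, the sketch below copies the per-year counts into a dictionary and recomputes the row, column, and grand totals. The variable names are illustrative only; nothing here reads the published crown file.

```python
# Crown counts per year, copied from the table above:
# (algorithmic, hand bbox, hand polygon)
counts = {
    "2017": (23, 0, 0),
    "2018": (943, 798, 742),
    "2019": (18597, 3600, 768),
    "2020": (2125, 678, 0),
    "2021": (3622, 0, 137),
    "2022": (6945, 0, 739),
    "2023": (0, 0, 145),
    "2024": (0, 0, 339),
    "unknown (IFAS)": (1537, 0, 0),
}

# Row totals: crowns available per year across all three sources.
per_year = {year: sum(row) for year, row in counts.items()}

# Column totals: crowns per source across all years.
per_source = tuple(sum(col) for col in zip(*counts.values()))

grand_total = sum(per_year.values())
print(per_year["2019"], per_source, grand_total)
# prints: 22965 (33792, 5076, 2870) 41738
```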
The 1,537 IFAS crowns have no recorded detection year; pair them with the NEON flight closest to their original survey date.
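
One way to pair a crown with the closest flight is sketched below. It assumes you know the survey year for each IFAS crown and the list of years NEON flew the site; both inputs, the `closest_flight_year` helper, and the tie-breaking rule (prefer the earlier flight) are assumptions for illustration, not part of neoncrops.

```python
def closest_flight_year(survey_year, flight_years):
    """Pick the flight year nearest to survey_year.

    Ties are broken in favor of the earlier flight (an arbitrary choice).
    """
    if not flight_years:
        raise ValueError("no flight years available")
    return min(flight_years, key=lambda y: (abs(y - survey_year), y))


# Hypothetical flight years for one site.
flights = [2017, 2018, 2019, 2021]
print(closest_flight_year(2020, flights))  # 2019 and 2021 tie; prints 2019
```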
| Step | Module | Doc |
|---|---|---|
| 1 | build_tile_manifest | docs/02_tile_manifest.md |
| 2 | download_tiles | docs/03_download.md |
| 3 | convert_hsi (only if you asked for HSI) | docs/04_convert_hsi.md |
| 4 | build_tile_index | docs/05_tile_index.md |
| 5 | build_crop_plan | docs/06_crop_plan.md |
| 6 | run_crops | docs/07_run_crops.md |
| 7 | build_dataset | docs/08_build_dataset.md |
Step 1 has an optional substep, fetch_availability, that caches NEON tile availability so the manifest knows which tiles actually exist on NEON servers. See docs/02_tile_manifest.md for when to use it.
For the NEON data products this pipeline pulls from, see docs/data_sources.md.
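
If you would rather drive the steps from a script, a minimal sketch follows. It only assembles the command lines, using the module names and flags shown in this README; the `pipeline_commands` helper and the output file names are hypothetical, the optional convert_hsi and fetch_availability steps are left out because their flags are not shown here, and passing several years as space-separated values (mirroring `--modalities`) is an assumption.

```python
import shlex
import sys
from pathlib import PurePosixPath


def pipeline_commands(crowns, years, modalities, workdir="."):
    """Build the argv list for each non-HSI pipeline step, in run order."""
    w = PurePosixPath(workdir)
    py = [sys.executable, "-m"]
    manifest = str(w / "manifest.csv")
    downloads = str(w / "downloads")
    tile_index = str(w / "tile_index.csv")
    plan = str(w / "crop_plan.csv")
    crops = str(w / "crops")
    return [
        py + ["neoncrops.build_tile_manifest", "--crowns", crowns,
              "--years", *map(str, years),
              "--modalities", *modalities, "--out", manifest],
        py + ["neoncrops.download_tiles", "--manifest", manifest,
              "--out", downloads],
        py + ["neoncrops.build_tile_index", "--root", downloads,
              "--out", tile_index],
        py + ["neoncrops.build_crop_plan", "--crowns", crowns,
              "--tile-index", tile_index, "--out", plan],
        py + ["neoncrops.run_crops", "--plan", plan, "--out", crops],
        py + ["neoncrops.build_dataset",
              "--crop-log", str(w / "crops" / "crop_log.csv"),
              "--crowns", crowns, "--out", str(w / "dataset.csv")],
    ]


cmds = pipeline_commands("tests/data/example_crowns.gpkg", [2019], ["rgb", "lidar"])
for cmd in cmds:
    print(shlex.join(cmd))  # inspect before running anything
```

To execute the steps, pass each list to `subprocess.run(cmd, check=True)` so the script stops at the first failing step.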
- docs/harmonize_species.md: audit and collapse known NEON taxonID inconsistencies (subspecies vs species, numbering variants, genus-level codes) on your crown file before you run the pipeline.
MIT. See LICENSE.
If you use the bundled DTA crowns or this pipeline in published work, please cite:
Weinstein, B. G., Marconi, S., Zare, A., Bohlman, S. A., Singh, A., Graves, S. J., ... & White, E. P. (2024). Individual canopy tree species maps for the National Ecological Observatory Network. PLoS Biology, 22(7), e3002700.
See docs/acknowledgments.md for the full list of upstream data sources.
