GatorSense/NeonCrops

NeonCrops

Per-tree image crops from NEON airborne data across RGB, hyperspectral, and LiDAR modalities. Crops come out as species-labeled .npy arrays ready for model training.

Three modalities at the same 50 x 50 m extent around one OSBS oak.

The pipeline starts from a crown GeoPackage (or shapefile) of tree crown polygons. From there it figures out which NEON tiles to pull, downloads them, and crops one array per crown per modality per year.
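As a rough illustration of the "which tiles to pull" step: NEON AOP mosaics are distributed as 1 km x 1 km tiles named by the easting and northing of their lower-left corner in the site's UTM zone. The sketch below assumes crown bounds are already in that UTM CRS. `tiles_for_bounds` is a hypothetical helper, not the function `build_tile_manifest` uses, and the coordinates in the usage line are illustrative.

```python
import math

TILE_SIZE_M = 1000  # NEON AOP mosaic tiles are 1 km on a side


def tiles_for_bounds(minx, miny, maxx, maxy, tile_size=TILE_SIZE_M):
    """Return the lower-left (easting, northing) of every tile a bbox touches."""
    # Snap each corner down to the tile grid.
    x0 = math.floor(minx / tile_size) * tile_size
    y0 = math.floor(miny / tile_size) * tile_size
    x1 = math.floor(maxx / tile_size) * tile_size
    y1 = math.floor(maxy / tile_size) * tile_size
    # A crown that straddles a tile edge yields more than one tile.
    return [
        (x, y)
        for x in range(x0, x1 + tile_size, tile_size)
        for y in range(y0, y1 + tile_size, tile_size)
    ]


# A crown bbox that crosses an easting boundary needs two tiles.
print(tiles_for_bounds(404990.0, 3285010.0, 405005.0, 3285020.0))
```

Crowns that straddle a tile edge are the reason a small crown file can require more tiles than it has clusters of trees.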

Current scope: crown cropping from a curated crown file. Crown detection from VST stems via DeepForest is planned but not yet implemented.

Where to start

Install

pip install -e .

Quickstart

The repo ships a tiny example crown file (30 OSBS crowns across 4 tiles), so you can walk through every step without downloading any crown data separately. Step 2 still pulls the imagery tiles from NEON.

# 1. Build the tile manifest from the example crowns.
python -m neoncrops.build_tile_manifest \
    --crowns tests/data/example_crowns.gpkg \
    --years 2019 \
    --modalities rgb lidar \
    --out manifest.csv

# 2. Download the tiles (pulls from NEON; needs internet).
python -m neoncrops.download_tiles --manifest manifest.csv --out downloads/

# 3. Index downloaded files, plan the crops, run them.
python -m neoncrops.build_tile_index --root downloads/ --out tile_index.csv
python -m neoncrops.build_crop_plan \
    --crowns tests/data/example_crowns.gpkg \
    --tile-index tile_index.csv \
    --out crop_plan.csv
python -m neoncrops.run_crops --plan crop_plan.csv --out crops/
python -m neoncrops.build_dataset \
    --crop-log crops/crop_log.csv \
    --crowns tests/data/example_crowns.gpkg \
    --out dataset.csv

To run the pipeline on your own crowns, swap the example file for your own crown GeoPackage or shapefile. See docs/01_input_format.md for the column requirements. To load a crown file programmatically (with column normalization, CRS handling, and shapefile aliasing), use neoncrops.read_crowns.
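Shapefile aliasing matters because the DBF table behind a shapefile truncates field names to 10 characters, so a column such as `detection_year` comes back as `detection_`. The sketch below shows one way to map truncated names back to their full form. `EXPECTED_COLUMNS` and `resolve_aliases` are hypothetical and are not taken from `neoncrops.read_crowns`; the actual column requirements are in docs/01_input_format.md.

```python
DBF_FIELD_LIMIT = 10  # maximum field-name length in a shapefile's DBF table

# Hypothetical expected columns; the real list lives in docs/01_input_format.md.
EXPECTED_COLUMNS = ["taxonID", "detection_year", "anno_year", "siteID"]


def resolve_aliases(columns, expected=EXPECTED_COLUMNS):
    """Map column names as read from disk to their full expected names."""
    # Index expected names by their truncated, case-folded form.
    by_alias = {name[:DBF_FIELD_LIMIT].lower(): name for name in expected}
    renames = {}
    for col in columns:
        full = by_alias.get(col[:DBF_FIELD_LIMIT].lower())
        if full is not None and full != col:
            renames[col] = full
    return renames


# Columns as they might come back from a shapefile.
print(resolve_aliases(["taxonID", "detection_", "anno_year", "geometry"]))
```

The returned dict has the shape expected by a rename call such as pandas' `DataFrame.rename(columns=...)`; unknown columns are left alone.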

Full crown dataset

A curated, species-labeled crown set covering 38 NEON sites (41,738 crowns, 234 species) is published on Hugging Face as weecology/neon-tree-crowns-dta. It combines algorithmic DeepTreeAttention crowns with hand-annotated bounding boxes and polygons. Pull it with:

python -m neoncrops.fetch_crowns --out-dir data/

This writes data/neon_crowns_dta.gpkg, which you can hand to build_tile_manifest and build_crop_plan exactly like the example file. See docs/data_sources.md for the schema.

Crowns per year

Each crown carries a year (detection_year for algorithmic crowns, anno_year for hand-annotated ones) that anchors it to a specific NEON flight. The pipeline can emit one crop per year per modality for each crown, so the counts in the table below are also the upper bound on the per-year sample size you can build:

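| Year | Algorithmic | Hand bbox | Hand polygon | Total |
|---|---|---|---|---|
| 2017 | 23 | 0 | 0 | 23 |
| 2018 | 943 | 798 | 742 | 2,483 |
| 2019 | 18,597 | 3,600 | 768 | 22,965 |
| 2020 | 2,125 | 678 | 0 | 2,803 |
| 2021 | 3,622 | 0 | 137 | 3,759 |
| 2022 | 6,945 | 0 | 739 | 7,684 |
| 2023 | 0 | 0 | 145 | 145 |
| 2024 | 0 | 0 | 339 | 339 |
| Year unknown (IFAS) | 1,537 | 0 | 0 | 1,537 |
| Total | 33,792 | 5,076 | 2,870 | 41,738 |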
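The year rule described above (detection_year for algorithmic crowns, anno_year for hand-annotated ones) can be sketched as follows. This is a minimal illustration that assumes each crown is a plain dict with missing values stored as `None`; in a GeoDataFrame, missing values would typically be NaN and need a different check. `crown_year` and `crowns_per_year` are hypothetical helpers, not part of neoncrops.

```python
from collections import Counter


def crown_year(crown):
    """Return the year that anchors a crown to a NEON flight, or None."""
    # Algorithmic crowns carry detection_year; hand-annotated ones carry anno_year.
    for key in ("detection_year", "anno_year"):
        value = crown.get(key)
        if value is not None:
            return int(value)
    return None


def crowns_per_year(crowns):
    """Count crowns by anchoring year; crowns with no year are counted under None."""
    return Counter(crown_year(c) for c in crowns)


# Toy records, not real data.
example = [
    {"detection_year": 2019},
    {"detection_year": None, "anno_year": 2018},
    {"detection_year": None, "anno_year": None},
    {"detection_year": 2019},
]
print(crowns_per_year(example))
```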

The 1,537 IFAS crowns have no recorded detection year; pair them with the NEON flight closest to their original survey date.
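One way to do that pairing, sketched at year resolution: pick the flight year with the smallest gap to the survey year. `closest_flight_year` is a hypothetical helper, the tie-break toward the earlier flight is an arbitrary choice made here, and the years in the usage line are made up.

```python
def closest_flight_year(survey_year, flight_years):
    """Pick the flight year nearest to survey_year; ties go to the earlier flight."""
    if not flight_years:
        raise ValueError("no flight years available for this site")
    # Sort key: distance first, then the year itself to break ties deterministically.
    return min(flight_years, key=lambda y: (abs(y - survey_year), y))


# Illustrative years only.
print(closest_flight_year(2016, [2017, 2019, 2021]))
```

If full survey dates are available, the same comparison works on `datetime.date` differences instead of integer years.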

Pipeline

| Step | Module | Doc |
|---|---|---|
| 1 | build_tile_manifest | docs/02_tile_manifest.md |
| 2 | download_tiles | docs/03_download.md |
| 3 | convert_hsi (only if you asked for HSI) | docs/04_convert_hsi.md |
| 4 | build_tile_index | docs/05_tile_index.md |
| 5 | build_crop_plan | docs/06_crop_plan.md |
| 6 | run_crops | docs/07_run_crops.md |
| 7 | build_dataset | docs/08_build_dataset.md |
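For scripted runs, the steps above can be assembled into command lines from Python. The sketch below reuses only the flags shown in the Quickstart; `pipeline_commands` is a hypothetical helper, and passing several years as space-separated values to `--years` is an assumption (the Quickstart shows a single year).

```python
import sys


def pipeline_commands(crowns, years, modalities, workdir="."):
    """Return one argv list per Quickstart step, in the order they must run."""
    def module(name, *args):
        return [sys.executable, "-m", f"neoncrops.{name}", *args]

    manifest = f"{workdir}/manifest.csv"
    downloads = f"{workdir}/downloads/"
    tile_index = f"{workdir}/tile_index.csv"
    crop_plan = f"{workdir}/crop_plan.csv"
    crops = f"{workdir}/crops/"
    return [
        module("build_tile_manifest", "--crowns", crowns,
               "--years", *map(str, years),
               "--modalities", *modalities, "--out", manifest),
        module("download_tiles", "--manifest", manifest, "--out", downloads),
        # convert_hsi would run here for HSI; its flags are not shown in this
        # README, so it is left out rather than guessed.
        module("build_tile_index", "--root", downloads, "--out", tile_index),
        module("build_crop_plan", "--crowns", crowns,
               "--tile-index", tile_index, "--out", crop_plan),
        module("run_crops", "--plan", crop_plan, "--out", crops),
        module("build_dataset", "--crop-log", f"{crops}crop_log.csv",
               "--crowns", crowns, "--out", f"{workdir}/dataset.csv"),
    ]


cmds = pipeline_commands("tests/data/example_crowns.gpkg", [2019], ["rgb", "lidar"])
for cmd in cmds:
    print(" ".join(cmd[1:]))
```

Each list can be handed to `subprocess.run(cmd, check=True)` in order, so a failing step stops the run before later steps read incomplete outputs.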

Step 1 has an optional substep, fetch_availability, that caches NEON tile availability so the manifest knows which tiles actually exist on NEON servers. See docs/02_tile_manifest.md for when to use it.

For the NEON data products the pipeline pulls from, see docs/data_sources.md.

Optional utilities

  • docs/harmonize_species.md: audit and collapse known NEON taxonID inconsistencies (subspecies vs species, numbering variants, genus-level codes) on your crown file before you run the pipeline.
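To give a feel for what such an audit involves, here is a minimal sketch. The alias table, the genus-level set, and `audit_taxon_ids` are all hypothetical and the codes are illustrative; the rules the utility actually applies are described in docs/harmonize_species.md.

```python
from collections import Counter

# Hypothetical rules for illustration only.
ALIASES = {"ACRUR": "ACRU"}       # variety/subspecies code -> species code
GENUS_LEVEL = {"QUERC", "PINUS"}  # genus-only codes, too coarse for species labels


def audit_taxon_ids(taxon_ids, aliases=ALIASES, genus_level=GENUS_LEVEL):
    """Return (harmonized ids, Counter of what was collapsed or flagged)."""
    harmonized, report = [], Counter()
    for tid in taxon_ids:
        code = tid.strip().upper()
        if code in aliases:
            # Collapse a known variant onto its canonical code.
            report[f"{code}->{aliases[code]}"] += 1
            code = aliases[code]
        elif code in genus_level:
            # Flag but keep: the caller decides whether to drop these crowns.
            report[f"{code} (genus-level)"] += 1
        harmonized.append(code)
    return harmonized, report


ids, report = audit_taxon_ids(["ACRUR", "ACRU", "QUERC", "pipa2"])
print(ids)
print(report)
```

Returning a report alongside the harmonized codes keeps the audit and the collapse as separate decisions: you can inspect what would change before committing to it.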

License

MIT. See LICENSE.

Citation

If you use the bundled DTA crowns or this pipeline in published work, please cite:

Weinstein, B. G., Marconi, S., Zare, A., Bohlman, S. A., Singh, A., Graves, S. J., ... & White, E. P. (2024). Individual canopy tree species maps for the National Ecological Observatory Network. PLoS Biology, 22(7), e3002700.

See docs/acknowledgments.md for the full list of upstream data sources.