A research-grade AIS analytics framework for maritime anomaly detection and early-warning studies.
MCIS is a reproducible Python framework for analyzing AIS (Automatic Identification System) data to detect maritime behavioral anomalies around conflict-relevant events.
The current primary case study is Black Sea maritime dynamics around February 24, 2022 (T0).
- Event-aware Maritime Analytics for pre/post event behavior shifts.
- End-to-end Reproducibility from raw CSV to model-ready panels.
- Methodological Guardrails to reduce leakage and unsupported claims.
- Baseline-first Modeling before moving to heavier deep-learning approaches.
- Project Layout
- Installation
- Quick Start
- CLI Guide
- Output Artifacts
- Testing
- Research Guardrails
- Recommended Workflow
- Troubleshooting
- License
mcis/
├── cli/                          # Command-line entrypoints
│   ├── run_pipeline.py           # Load → clean → feature-engineer → aggregate
│   ├── run_analysis.py           # Event-study / ITS / DiD / Granger workflows
│   └── run_model.py              # Anomaly & forecasting model workflows
├── config/
│   └── settings.yaml             # Central experiment/pipeline configuration
├── data/
│   ├── raw/                      # Source AIS CSV files
│   ├── interim/                  # Cleaned intermediate artifacts
│   ├── processed/                # Feature-engineered artifacts
│   └── aggregated/               # Panel-level Parquet outputs
├── mcis/
│   ├── compat.py                 # Dependency compatibility shims (NumPy, SciPy, statsmodels)
│   ├── loader.py
│   ├── cleaner.py
│   ├── features.py
│   ├── aggregator.py
│   ├── validation.py
│   ├── analysis/
│   ├── models/
│   ├── viz/
│   └── utils/
├── notebook/
│   └── mcis_pipeline.ipynb       # End-to-end Jupyter notebook
├── outputs/                      # Tables, metadata, model artifacts, reports
├── tests/                        # Pytest suite (376 tests)
├── ROADMAP.md                    # Detailed technical roadmap and guardrails
├── pyproject.toml
└── requirements.txt
- Python 3.11+
- pip

pip install -e .

| Extra | Includes | Command |
|---|---|---|
| dev | pytest, coverage, notebooks | pip install -e ".[dev]" |
| ml | xgboost, torch, shap | pip install -e ".[ml]" |
| geo | geopandas, shapely, folium, plotly | pip install -e ".[geo]" |
| all | all optional groups | pip install -e ".[all]" |
Note: Some optional packages (especially torch and geopandas) may take longer to install and may require additional system libraries.
pytest tests/test_validation.py -v

python cli/run_pipeline.py \
--config config/settings.yaml \
--file data/raw/ais_blacksea_6m.csv \
--steps all

python cli/run_analysis.py \
--config config/settings.yaml \
--analyses event_study its \
--metrics vessel_count mean_sog

python cli/run_model.py \
--config config/settings.yaml \
--panel data/aggregated/panel_blacksea.parquet \
--models rolling_zscore ewma robust_mahalanobis

jupyter notebook notebook/mcis_pipeline.ipynb

The notebook covers the full pipeline: data loading, cleaning, feature engineering, aggregation, event studies, ITS/Granger/DiD analysis, anomaly detection, forecasting, model cards, and interactive maps.
Purpose
- Load raw AIS CSV data
- Apply cleaning and quality-flagging
- Engineer vessel-level features
- Aggregate to grid/day and Black Sea/day panels
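To make the aggregation step more concrete, here is a minimal standard-library sketch of collapsing AIS position reports into one row per calendar day. It is not the code in mcis/aggregator.py: the function name `aggregate_daily` and the input field names (`mmsi`, `timestamp`, `sog`) are assumptions for illustration, and only the metric names `unique_mmsi` and `mean_sog` come from this README.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_daily(records):
    """Collapse AIS position reports into one row per calendar day."""
    by_day = defaultdict(list)
    for rec in records:
        # Group each report under the date part of its timestamp.
        day = datetime.fromisoformat(rec["timestamp"]).date()
        by_day[day].append(rec)

    panel = []
    for day in sorted(by_day):
        rows = by_day[day]
        panel.append({
            "date": day.isoformat(),
            "unique_mmsi": len({r["mmsi"] for r in rows}),  # distinct vessels
            "mean_sog": mean(r["sog"] for r in rows),       # mean speed over ground
        })
    return panel


# Hypothetical sample records; real AIS exports may name columns differently.
sample = [
    {"mmsi": 111, "timestamp": "2022-02-23T10:00:00", "sog": 10.0},
    {"mmsi": 111, "timestamp": "2022-02-23T11:00:00", "sog": 12.0},
    {"mmsi": 222, "timestamp": "2022-02-23T12:00:00", "sog": 8.0},
    {"mmsi": 222, "timestamp": "2022-02-24T09:00:00", "sog": 2.0},
]
panel = aggregate_daily(sample)
```

A grid/day panel would presumably add a spatial cell key to the grouping; that detail is left out here because the grid definition is not described in this README.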
Example
python cli/run_pipeline.py \
--config config/settings.yaml \
--file data/raw/ais_blacksea_12m.csv \
--steps load,clean,features,aggregate \
--date-start 2021-08-24 \
--date-end 2022-08-24

Purpose
- Run selected analyses (event study, ITS, DiD, Granger) by metric
- Save outputs as structured JSON/table artifacts
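As an informal illustration of what an event-style comparison looks at, the sketch below computes the mean of a daily metric in equal-length windows before and after T0. It is descriptive only and is not the estimator used by cli/run_analysis.py; the function name `pre_post_shift` and the sample values are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean

T0 = date(2022, 2, 24)  # event date named in this README


def pre_post_shift(series, t0, window_days):
    """Return (pre_mean, post_mean, difference) for equal windows around t0."""
    lo = t0 - timedelta(days=window_days)
    hi = t0 + timedelta(days=window_days)
    pre = [v for d, v in series.items() if lo <= d < t0]
    post = [v for d, v in series.items() if t0 <= d < hi]
    if not pre or not post:
        raise ValueError("window has no observations on one side of t0")
    pre_mean, post_mean = mean(pre), mean(post)
    return pre_mean, post_mean, post_mean - pre_mean


# Hypothetical daily vessel counts: 100 per day before T0, 60 per day from T0 on.
series = {T0 + timedelta(days=k): (100 if k < 0 else 60) for k in range(-7, 7)}
pre_mean, post_mean, shift = pre_post_shift(series, T0, window_days=7)
```

A raw difference in means like this says nothing about trend or seasonality, which is the gap that ITS and DiD designs are meant to address.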
Example
python cli/run_analysis.py \
--config config/settings.yaml \
--panel data/aggregated/panel_blacksea.parquet \
--analyses event_study its granger \
--metrics vessel_count unique_mmsi mean_sog

Purpose
- Train/evaluate temporal anomaly and forecasting-error models
- Generate model artifacts, model cards, and registry records
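For readers unfamiliar with the simplest of the baseline detectors, the following is a minimal sketch of a rolling z-score, assuming each point is scored only against the observations strictly before it. It is not the implementation behind the `rolling_zscore` model name; the window length, the threshold of 3.0, and the sample counts are illustrative assumptions.

```python
from statistics import mean, stdev


def rolling_zscore(values, window):
    """Score each point against the `window` observations strictly before it."""
    scores = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            scores.append(None)  # not enough history yet
            continue
        sd = stdev(history)
        # A flat history has zero spread, so the score is undefined.
        scores.append(None if sd == 0 else (x - mean(history)) / sd)
    return scores


# Hypothetical daily vessel counts with a sharp drop on the last day.
counts = [100, 102, 98, 101, 99, 60]
scores = rolling_zscore(counts, window=5)
flags = [s is not None and abs(s) > 3.0 for s in scores]
```

Because the window never includes the current or later observations, the score for a given day depends only on the past, in keeping with a temporal-only evaluation.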
Example
python cli/run_model.py \
--config config/settings.yaml \
--panel data/aggregated/panel_blacksea.parquet \
--models rolling_zscore ewma robust_mahalanobis var_residual

Typical outputs after successful execution:
- data/interim/ais_blacksea_cleaned.parquet
- data/processed/ais_blacksea_features.parquet
- data/aggregated/panel_daily.parquet
- data/aggregated/panel_blacksea.parquet
- outputs/tables/*.json
- outputs/metadata/*.json
- outputs/models/*.json
- outputs/models/*_model_card_*.md
- outputs/models/registry/registry_entries.json
Run all tests:

pytest tests/ -v

Run with coverage:

pytest tests/ --cov=mcis --cov-report=term-missing

Run core pipeline-module tests only:

pytest tests/test_loader.py tests/test_cleaner.py tests/test_features.py tests/test_aggregator.py -v

MCIS includes a compatibility shim (mcis/compat.py) that patches breaking changes in commonly paired versions of NumPy, SciPy, and statsmodels:
| Issue | Symptom | Fix |
|---|---|---|
| np.MachAr removed in NumPy ≥2.0 | AttributeError in statsmodels internals | compat.py restores np.MachAr |
| scipy.signal.signaltools._centered moved in SciPy ≥1.17 | ImportError in statsmodels internals | compat.py restores the import path |
The compat module is imported automatically at the top of all CLI entrypoints and package __init__.py files
so patches apply before any statsmodels-dependent code runs.
If you encounter missing-attribute errors from statsmodels, ensure import mcis.compat runs before
any statsmodels imports (lazy imports are used in mcis/analysis/its.py, mcis/analysis/did.py,
mcis/analysis/granger.py and optional geo deps in mcis/viz/maps.py).
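The general pattern behind such a shim can be sketched as follows. This is not the contents of mcis/compat.py: the helper `restore_attr`, the stand-in module `fake_lib`, and the class `LegacyHelper` are all hypothetical, and the example patches a throwaway module rather than NumPy or SciPy.

```python
import types


def restore_attr(module, name, fallback):
    """Attach `fallback` as `module.<name>` only if the attribute is missing."""
    if hasattr(module, name):
        return False  # already present, leave the library untouched
    setattr(module, name, fallback)
    return True


# Stand-in for a library that dropped an attribute in a newer release.
fake_lib = types.ModuleType("fake_lib")
fake_lib.kept = "original"


class LegacyHelper:
    """Hypothetical replacement for the removed attribute."""


patched_missing = restore_attr(fake_lib, "LegacyHelper", LegacyHelper)
patched_present = restore_attr(fake_lib, "kept", "replacement")
```

Checking with `hasattr` first keeps the patch a no-op on versions that still ship the attribute, which is why import order matters: the patch has to run before the dependent library looks the name up.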
MCIS intentionally enforces methodological constraints for research validity:
- No random train/test split: temporal split only.
- No leakage features in model inputs, e.g., days_to_t0, post_conflict.
- Validity/claim consistency: inferential claims are restricted under non-empirical modes.
- Baseline-first strategy: interpretable methods before complex deep models.
For full policy details, see:
- ROADMAP.md
- config/settings.yaml
- mcis/validation.py
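The first two guardrails can be illustrated with a short sketch, assuming rows are plain dictionaries with a `date` field. It is not the logic in mcis/validation.py; the helper names `temporal_split` and `drop_leakage`, the cutoff date, and the sample rows are hypothetical, while `days_to_t0` and `post_conflict` are the leakage examples named in the list of guardrails.

```python
from datetime import date

LEAKAGE_FEATURES = {"days_to_t0", "post_conflict"}  # examples named in this README


def temporal_split(rows, cutoff):
    """Train on rows dated before `cutoff`, test on rows dated on or after it."""
    ordered = sorted(rows, key=lambda r: r["date"])
    train = [r for r in ordered if r["date"] < cutoff]
    test = [r for r in ordered if r["date"] >= cutoff]
    return train, test


def drop_leakage(row):
    """Return a copy of `row` without features that encode event timing."""
    return {k: v for k, v in row.items() if k not in LEAKAGE_FEATURES}


# Hypothetical panel rows, deliberately out of order.
rows = [
    {"date": date(2022, 2, 25), "vessel_count": 60, "days_to_t0": 1, "post_conflict": 1},
    {"date": date(2022, 2, 22), "vessel_count": 101, "days_to_t0": -2, "post_conflict": 0},
    {"date": date(2022, 2, 23), "vessel_count": 99, "days_to_t0": -1, "post_conflict": 0},
]
train, test = temporal_split(rows, cutoff=date(2022, 2, 24))
train_inputs = [drop_leakage(r) for r in train]
```

Splitting on a date rather than at random ensures that every training row precedes every test row, and removing the two fields keeps the model from reading the event date directly from its inputs.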
- Update config/settings.yaml.
- Run pytest tests/test_validation.py -v.
- Run cli/run_pipeline.py to generate data artifacts.
- Run cli/run_analysis.py for statistical outputs.
- Run cli/run_model.py for model outputs and model cards.
- Run regression checks with pytest tests/ --cov=mcis.
- ModuleNotFoundError: No module named 'mcis'
  Run an editable install from the repo root: pip install -e .
- SHAP not installed in model CLI
  Install the optional dependency: pip install "shap>=0.44.0" (the quotes stop the shell from treating >= as an output redirect)
- Panel file not found
  Run the pipeline first with --steps all.
- Missing model feature configuration
  Check model.features_to_use in config/settings.yaml.
- Memory pressure on large CSV files
  Use --date-start, --date-end, and/or --limit for development runs.
- AttributeError: module 'numpy' has no attribute 'MachAr'
  Version mismatch between NumPy ≥2.0 and statsmodels 0.14.x. Fixed automatically by mcis/compat.py; ensure mcis.compat is imported before any statsmodels calls.
- ImportError: cannot import name '_centered' from 'scipy.signal.signaltools'
  SciPy ≥1.17 moved this private function. Fixed automatically by mcis/compat.py.
- ModuleNotFoundError: No module named 'folium' or 'plotly'
  Install geo extras: pip install -e ".[geo]". The notebook and code degrade gracefully with clear error messages when these are missing.
- Notebook cells fail with stale imports after code changes
  Restart the kernel (Kernel → Restart & Run All) to pick up updated .py files.
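On the memory-pressure item above: the idea of restricting a run by date range and row limit can be sketched with a streaming reader that never holds the whole file in memory. This is an illustration only, not how cli/run_pipeline.py implements --date-start, --date-end, or --limit; the function name, the inclusive bounds, and the column names are assumptions.

```python
import csv
import io
from datetime import date, datetime


def iter_rows_in_range(lines, start, end, limit=None):
    """Yield CSV rows whose timestamp date falls in [start, end], one at a time."""
    emitted = 0
    for row in csv.DictReader(lines):
        if limit is not None and emitted >= limit:
            return  # stop reading once enough rows have been produced
        day = datetime.fromisoformat(row["timestamp"]).date()
        if start <= day <= end:
            emitted += 1
            yield row


# Hypothetical in-memory CSV standing in for a large raw AIS file.
raw = io.StringIO(
    "mmsi,timestamp,sog\n"
    "111,2021-08-01T00:00:00,9.5\n"
    "111,2021-09-01T00:00:00,10.1\n"
    "222,2021-09-02T00:00:00,7.3\n"
    "222,2022-09-01T00:00:00,6.0\n"
)
kept = list(iter_rows_in_range(raw, date(2021, 8, 24), date(2022, 8, 24)))
```

Passing an open file handle instead of the `io.StringIO` object would apply the same filter to a file on disk while reading it line by line.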
MIT License.