Benchmark framework for evaluating hyperparameter optimization methods and overtuning mitigation strategies.
Tested and working with Python 3.11.3 on WSL (Windows Subsystem for Linux).
git clone URL (omitted for anonymity)
cd OvertuningBenchmark

sudo apt-get update && sudo apt-get install -y \
build-essential \
cmake \
git \
swig \
git-lfs \
libeigen3-dev \
libopenblas-dev \
libffi-dev

python3.11 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install "autogluon.tabular[fastai]" --extra-index-url https://download.pytorch.org/whl/cpu
pip install --no-cache-dir -r requirements.txt
pip install "gpytorch>=1.4.0"
pip install "pymoo>=0.6.0"
pip install "HEBO==0.3.5" --no-deps
pip install IPython

OvertuningBenchmark/
├── main.py
├── README.md
├── requirements.txt
├── docs/
│ ├── datasets.md
│ ├── normalization.md
│ ├── parallelization.md
│ └── preprocessing.md
├── src/
│ ├── datasets/
│ ├── evaluators/
│ ├── experiments/
│ ├── history/
│ ├── metrics/
│ ├── mitigations/
│ ├── models/
│ ├── optimizers/
│ ├── resamplers/
│ ├── search_space/
│ ├── target/
│ ├── utils/
│ └── visualizations/
└── tests/
- Minimal local profile: adult.csv, ilpd.csv, and 363700.csv in src/datasets/datasets/
- Full profile: all benchmark dataset CSVs pulled via Git LFS
The current test suite supports the minimal local profile and skips or gates the checks that need the full profile.
source venv/bin/activate

Run all tests:
python -m pytest -q

Example: run LGBM on dataset 363700 (binary).
python -m src.experiments.benchmark_experiments.start_experiment \
--dataset_id 363700 \
--model_string LGBM \
--problem_type binary \
--output_dir . \
--n_jobs 1 \
--threads_per_job 1

This creates a timestamped experiment folder under results/, for example:
results/LGBM_363700_YYYYMMDD_HHMMSS/
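Because the folder name carries a timestamp, scripts that chain later steps first need to locate it. The following is a minimal sketch, assuming the `<model>_<dataset>_YYYYMMDD_HHMMSS` naming shown above; `latest_experiment_dir` is a hypothetical helper and not part of the repository.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def latest_experiment_dir(results_root: str, model: str, dataset_id: str) -> Optional[Path]:
    """Return the newest <model>_<dataset_id>_YYYYMMDD_HHMMSS folder, or None.

    Hypothetical helper: the naming pattern is assumed from the example path.
    """
    root = Path(results_root)
    if not root.is_dir():
        return None

    prefix = f"{model}_{dataset_id}_"
    candidates = []
    for path in root.iterdir():
        if not path.is_dir() or not path.name.startswith(prefix):
            continue
        stamp = path.name[len(prefix):]
        try:
            when = datetime.strptime(stamp, "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # suffix is not a timestamp, so skip this folder
        candidates.append((when, path))

    if not candidates:
        return None
    # Pick the folder with the most recent parsed timestamp.
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(latest_experiment_dir("results", "LGBM", "363700"))
```

Parsing the suffix rather than sorting by modification time keeps the choice stable if folders are copied or moved.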
Parallelism guidance: see docs/parallelization.md.
Because the experiments take a long time to run, the repository includes sample results for one dataset/model combination. With those results you can run the preprocessing and visualization pipeline.
Both post-hoc methods can be run on each experiment directory you want to include. Their outputs for the sample experiment are already in the repository, so you do not need to rerun them to produce visualizations. If you run a full experiment with the command above, you can apply post-hoc ensembling or post-hoc surrogate incumbent selection with the commands below. Note that new results are written to results/ by default, while the visualization pipeline reads from results_prd/. This separation preserves the integrity of results_prd/, so if you run experiments yourself, move the full experiment folder into results_prd/.
Single experiment:
python -m src.experiments.benchmark_experiments.post_hoc_ensemble_experiment \
--results_dir results/LGBM_363700 \
--ensemble_size 40 \
--n_jobs 1
python -m src.experiments.benchmark_experiments.post_hoc_surrogate_experiment \
--results_dir results/LGBM_363700 \
--n_jobs 1

These create per-experiment files such as:
- trajectories_post_hoc_ensemble.csv
- results_post_hoc_ensemble.csv
- trajectories_post_hoc_surrogate.csv
- results_post_hoc_surrogate.csv
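Before handing an experiment to the visualization pipeline, it can help to check which post-hoc files exist and then place the folder under results_prd/. The sketch below assumes the four files sit at the top level of the experiment directory, which this README does not state. Both function names are hypothetical, and the second one copies rather than moves, so the original under results/ stays intact.

```python
import shutil
from pathlib import Path

# File names taken from the list above; their location inside the
# experiment folder (top level) is an assumption.
POST_HOC_FILES = (
    "trajectories_post_hoc_ensemble.csv",
    "results_post_hoc_ensemble.csv",
    "trajectories_post_hoc_surrogate.csv",
    "results_post_hoc_surrogate.csv",
)


def missing_post_hoc_files(experiment_dir: str) -> list[str]:
    """Return the post-hoc file names not found in experiment_dir."""
    exp = Path(experiment_dir)
    return [name for name in POST_HOC_FILES if not (exp / name).is_file()]


def copy_to_results_prd(experiment_dir: str, prd_root: str = "results_prd") -> Path:
    """Copy a full experiment folder into prd_root and return the new path."""
    src = Path(experiment_dir)
    dest = Path(prd_root) / src.name
    # copytree raises FileExistsError if dest already exists, which
    # protects anything already stored under results_prd/.
    shutil.copytree(src, dest)
    return dest


if __name__ == "__main__":
    exp_dir = "results/LGBM_363700"
    print("missing post-hoc files:", missing_post_hoc_files(exp_dir))
    print("copied to:", copy_to_results_prd(exp_dir))
```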
The preprocessing pipeline scans all folders directly inside results_prd/ and writes combined CSVs to src/visualizations/data.
python -m src.visualizations.preprocessing.pipeline

Notes:
- Please do not use --recalculate with the sample results: it reads the raw results, which are not included in the repository because of their size. Run the command above as-is to create the combined files.
- --recalculate recomputes per-experiment preprocessing from raw artifacts.
- Without --recalculate, existing preprocessed per-experiment files are reused.
- Post-hoc files are aggregated automatically when present.
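Since the pipeline only considers folders directly inside results_prd/, a quick listing shows what it will pick up. This is a minimal sketch of that "immediate subfolders only" rule, not the pipeline's actual code; `experiment_folders` is a hypothetical name.

```python
from pathlib import Path


def experiment_folders(prd_root: str = "results_prd") -> list[str]:
    """List the names of folders directly inside prd_root (no recursion)."""
    root = Path(prd_root)
    if not root.is_dir():
        return []
    # Plain files and nested sub-subfolders are ignored on purpose.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    for name in experiment_folders():
        print(name)
```

An experiment nested one level too deep (for example results_prd/batch1/LGBM_363700) would not appear in this listing, which is a common reason for an experiment missing from the combined CSVs.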
Generate core plots:
python -m src.visualizations.plotting.run \
--data-dir src/visualizations/data \
--output-root src/visualizations/results \
--partial

Please use the --partial flag: it omits graphs that aggregate over multiple datasets or models, and the sample results contain only one combination.
Outputs are written to a timestamped folder:
src/visualizations/results/plotting_results_YYYYMMDD_HHMMSS/
This script only showcases the functionality to generate plots from raw results. Our complete raw results exceed 500 GB and the preprocessed results exceed 10 GB. We are working on a cost-efficient way to compress and share them, but we cannot yet do so fully and anonymously, which is why we provide only a sample of results.