ForecastBench is a dynamic, contamination-free benchmark of LLM forecasting accuracy with human comparison groups, serving as a valuable proxy for general intelligence.
Updated May 20, 2026 - Python
A local Qwen 3.5 4B yes/no forecaster with a Brier score of 0.186 on 1,662 held-out ForecastBench questions. No API key or cloud service required.
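For context on the 0.186 figure: the standard Brier score for yes/no questions is the mean squared difference between the forecast probability of "yes" and the resolved outcome (1 or 0). Lower is better, and always answering 0.5 scores 0.25. The minimal sketch below computes this standard form; whether the reported number uses exactly this variant is an assumption, and the forecasts shown are hypothetical, not ForecastBench data.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Standard binary Brier score: mean of (p - o)^2, where p is P(yes) and o is 0 or 1."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical forecasts and resolutions (illustrative only).
probs = [0.9, 0.2, 0.6, 0.5]
outcomes = [1, 0, 0, 1]
print(round(brier_score(probs, outcomes), 3))  # prints 0.165
```

A forecaster that always answers 0.5 gets 0.25 regardless of outcomes, so a score of 0.186 sits below that uninformed baseline.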
Co-evolving Synthetic Intelligence Systems (SIS) with dignity, rhythm, and sacred potential.