Luca Bartolomei1,2,3 · Fabio Tosi2 · Matteo Poggi1,2 · Stefano Mattoccia1,2 · Guillermo Gallego3
1 Advanced Research Center on Electronic Systems (ARCES), University of Bologna, Italy
2 Department of Computer Science and Engineering (DISI), University of Bologna, Italy
3 TU Berlin, Robotics Institute, Einstein Center Digital Future, SCIoI Excellence Cluster, Germany
EventHub Overview. Our framework combines novel view synthesis for synthetic data generation with cross-modal distillation from RGB stereo foundation models to create high-quality training data for event-based stereo networks without ground-truth LiDAR annotations.
- 📋 Table of Contents
- 🎬 Introduction
- 📊 Method Overview
- ✨ Qualitative Results
- 📚 Citation
- 📧 Contact
- 🙏 Acknowledgements
We propose EventHub, a novel framework for training deep event stereo networks without ground-truth annotations from costly active sensors, relying instead on standard color images. From these images, we derive either both proxy annotations and proxy events through state-of-the-art novel view synthesis techniques, or only proxy annotations when the images are already paired with event data. Using the training set generated by our data factory, we repurpose state-of-the-art stereo models from the RGB literature to process event data, obtaining new event stereo models with unprecedented generalization capabilities. Experiments on widely used event stereo datasets support the effectiveness of EventHub and show how the same data distillation mechanism can improve the accuracy of RGB stereo foundation models in challenging conditions such as nighttime scenes.
For datasets with only RGB images, we employ a novel pipeline leveraging SVRaster:
- Image Capture & Calibration: Multi-view RGB images with camera calibration via COLMAP
- Regularized Dense 3D Optimization: Fast training with normal consistency and Depth Anything V2 priors
- Virtual Trajectory Construction: Smooth camera trajectories exploring the reconstructed 3D scene
- Motion-Adaptive Stereo Rendering: Dynamic rendering framerate based on optical flow to generate high-quality event streams
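The motion-adaptive rendering step above can be illustrated with a minimal sketch. This is not the EventHub implementation: it assumes a dense optical flow field between two keyframes of the virtual trajectory is already available as a NumPy array, and the function names and the `max_step_px` threshold are hypothetical. The idea is to render more intermediate frames when apparent motion is large, so that no pixel moves more than a fixed number of pixels between consecutive renders.

```python
import math

import numpy as np


def subframes_for_motion(flow, max_step_px=1.0):
    """Return how many rendering intervals are needed between two keyframes.

    flow: (H, W, 2) optical flow in pixels between the keyframes (assumed given).
    max_step_px: hypothetical cap on per-interval pixel displacement.
    """
    magnitude = np.linalg.norm(flow, axis=-1)  # per-pixel flow magnitude, (H, W)
    peak = float(magnitude.max())
    # At least one interval, even for a static view.
    return max(1, math.ceil(peak / max_step_px))


def render_timestamps(t0, t1, flow, max_step_px=1.0):
    """Evenly spaced timestamps in [t0, t1] at which to render the scene."""
    n = subframes_for_motion(flow, max_step_px)
    return np.linspace(t0, t1, n + 1)
```

Consecutive renders at these timestamps could then be passed to an event simulator; that step is outside the scope of this sketch.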
When calibrated RGB-Event stereo pairs are available:
- Leverage pre-trained RGB stereo foundation models for depth estimation
- Reproject and align predictions to create proxy annotations for event data
- Eliminate the need for expensive LiDAR annotations
- Employ event representations, such as Tencode, that are compatible with RGB stereo architectures
- Fine-tune pre-trained RGB stereo networks on event domain data
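The reprojection step in the list above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: it assumes rectified pinhole cameras, known intrinsics and stereo baselines for both the RGB and event rigs, and a known rigid transform (`R`, `t`) from the left RGB camera to the left event camera. All function and parameter names are hypothetical. Disparity predicted on the RGB pair is converted to depth, moved into the event camera frame, splatted to the nearest pixel with a z-buffer, and re-expressed as disparity for the event stereo pair.

```python
import numpy as np


def reproject_disparity(disp_rgb, K_rgb, baseline_rgb,
                        K_evt, baseline_evt, R, t, out_shape):
    """Warp an RGB-frame disparity map into the event camera frame.

    Returns a disparity map of shape out_shape; 0 marks pixels with no label.
    """
    h, w = disp_rgb.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = disp_rgb > 0  # treat non-positive disparity as missing

    # Disparity -> metric depth in the RGB camera: Z = f * B / d.
    depth = K_rgb[0, 0] * baseline_rgb / disp_rgb[valid]

    # Back-project valid pixels to 3D points in the RGB camera frame (3 x N).
    pix = np.stack([u[valid], v[valid], np.ones(int(valid.sum()))])
    pts_rgb = (np.linalg.inv(K_rgb) @ pix) * depth

    # Rigid transform into the event camera frame, keep points in front of it.
    pts_evt = R @ pts_rgb + np.asarray(t, dtype=float).reshape(3, 1)
    pts_evt = pts_evt[:, pts_evt[2] > 0]

    # Project into the event image and round to the nearest pixel.
    proj = K_evt @ pts_evt
    ue = np.round(proj[0] / proj[2]).astype(int)
    ve = np.round(proj[1] / proj[2]).astype(int)
    z = pts_evt[2]

    H, W = out_shape
    inside = (ue >= 0) & (ue < W) & (ve >= 0) & (ve < H)
    ue, ve, z = ue[inside], ve[inside], z[inside]

    # Z-buffer: keep the nearest depth when several points hit one pixel.
    zbuf = np.full(out_shape, np.inf)
    np.minimum.at(zbuf, (ve, ue), z)

    # Depth -> disparity for the event stereo pair.
    out = np.zeros(out_shape)
    hit = np.isfinite(zbuf)
    out[hit] = K_evt[0, 0] * baseline_evt / zbuf[hit]
    return out
```

Nearest-pixel splatting leaves holes when the event camera has a higher resolution than the RGB camera; the resulting zeros can simply be masked out of the training loss, as is common with sparse supervision.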
Generalization to MVSEC Dataset. Zero-shot results showing that models trained on EventHub data transfer effectively to unseen datasets with diverse motion patterns and camera setups.
Generalization to M3ED Dataset. Results on challenging scenarios, including nighttime operation, dynamic objects, and rapid motion, where the models continue to generalize well.
🚧 Repository Status: This repository is under active development. We will be releasing:
- 📦 Pretrained models (Coming Soon)
- 💻 Training and evaluation code (Coming Soon)
- 📊 EventHub dataset and generated annotations (Coming Soon)
We appreciate your patience as we work towards release. Stay tuned for updates!
@InProceedings{Bartolomei_2026_CVPR,
author = {Bartolomei, Luca and Tosi, Fabio and Poggi, Matteo and Mattoccia, Stefano and Gallego, Guillermo},
title = {{EventHub}: Data Factory for Generalizable Event-Based Stereo Networks without Active Sensors},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}
For questions or inquiries about EventHub, please contact:
- Luca Bartolomei: luca.bartolomei5@unibo.it
We would like to thank the authors of the following projects for making their code and models available:
- SVRaster for efficient novel view synthesis
- FoundationStereo for state-of-the-art RGB stereo matching
- StereoAnywhere for robust zero-shot depth estimation
- Depth Anything V2 for monocular depth priors
- COLMAP for structure-from-motion
- DSEC, MVSEC, and M3ED datasets for evaluation


