EvtSlowTV - A Large and Diverse Dataset for Event-Based Depth Estimation
- URL: http://arxiv.org/abs/2511.02953v1
- Date: Tue, 04 Nov 2025 19:56:26 GMT
- Title: EvtSlowTV - A Large and Diverse Dataset for Event-Based Depth Estimation
- Authors: Sadiq Layi Macaulay, Nimet Kaygusuz, Simon Hadfield
- Abstract summary: EvtSlowTV is a large-scale event camera dataset curated from publicly available YouTube footage. This work shows the suitability of EvtSlowTV for a self-supervised learning framework to capitalise on the HDR potential of raw event streams.
- Score: 9.039910084106921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras, with their high dynamic range (HDR) and low latency, offer a promising alternative for robust depth estimation in challenging environments. However, many event-based depth estimation approaches are constrained by small-scale annotated datasets, limiting their generalizability to real-world scenarios. To bridge this gap, we introduce EvtSlowTV, a large-scale event camera dataset curated from publicly available YouTube footage, which contains more than 13B events across various environmental conditions and motions, including seasonal hiking, flying, scenic driving, and underwater exploration. EvtSlowTV is an order of magnitude larger than existing event datasets, providing an unconstrained, naturalistic setting for event-based depth learning. This work shows the suitability of EvtSlowTV for a self-supervised learning framework to capitalise on the HDR potential of raw event streams. We further demonstrate that training with EvtSlowTV enhances the model's ability to generalise to complex scenes and motions. Our approach removes the need for frame-based annotations and preserves the asynchronous nature of event data.
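The abstract does not spell out how raw events are fed to a depth network. As a minimal sketch for readers unfamiliar with event data, the code below assumes the common (x, y, t, polarity) tuple format and builds the widely used voxel-grid input representation; EvtSlowTV's actual pipeline may avoid dense binning, since the paper emphasises preserving the asynchronous nature of the stream, and all names here are illustrative.

```python
# Hypothetical sketch: assumes events are stored as (x, y, t, polarity)
# tuples, sorted by timestamp, and binned into a voxel grid -- a common
# input representation for learning-based event depth methods.
import numpy as np

def events_to_voxel_grid(events: np.ndarray, num_bins: int,
                         height: int, width: int) -> np.ndarray:
    """Accumulate an asynchronous event stream into a (num_bins, H, W) grid.

    events: float array of shape (N, 4) with columns (x, y, t, polarity),
            polarity in {-1, +1}, timestamps sorted ascending.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2]
    p = events[:, 3]
    # Normalise timestamps to [0, num_bins - 1] so each event lands in a bin.
    t_norm = (t - t[0]) / max(t[-1] - t[0], 1e-9) * (num_bins - 1)
    b = t_norm.astype(np.int64)
    # Signed accumulation preserves per-bin polarity information.
    np.add.at(grid, (b, y, x), p)
    return grid
```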
Related papers
- EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models [56.16721798968254]
We propose an event-guided, training-free framework for efficient understanding, named EventSTU.
In the temporal domain, we design a coarse-to-fine sampling algorithm that exploits the change-triggered property of event cameras to eliminate redundant frames.
In the spatial domain, we design an adaptive token pruning algorithm that leverages the saliency of events as a zero-cost prior to guide spatial reduction.
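As a rough illustration of the change-triggered idea (not EventSTU's actual algorithm, whose coarse-to-fine details are not given above), one can rank inter-frame intervals by event count and keep only the most active ones; every name and the keep ratio below are assumptions.

```python
# Hypothetical sketch of change-triggered frame sampling: intervals with
# many events indicate scene change, intervals with few are redundant.
import numpy as np

def sample_frames_by_event_count(event_timestamps: np.ndarray,
                                 frame_times: np.ndarray,
                                 keep_ratio: float = 0.25) -> np.ndarray:
    """Return sorted indices of frames that open high-activity intervals.

    event_timestamps: sorted 1-D array of event times.
    frame_times: sorted 1-D array of frame timestamps (length F);
                 interval i spans [frame_times[i], frame_times[i + 1]).
    """
    # Count events falling into each of the F - 1 inter-frame intervals.
    counts = np.diff(np.searchsorted(event_timestamps, frame_times))
    num_keep = max(1, int(len(counts) * keep_ratio))
    # Keep the frames whose following interval has the most events.
    keep = np.argsort(counts)[-num_keep:]
    return np.sort(keep)
```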
arXiv Detail & Related papers (2025-11-24T09:30:02Z)
- Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation [47.90167568304715]
Event cameras capture sparse, high-temporal-resolution visual information.
The lack of large datasets with dense ground-truth depth annotations hinders learning-based monocular depth estimation from event data.
We propose a cross-modal distillation paradigm that generates dense proxy labels by leveraging a Vision Foundation Model (VFM).
Our strategy requires only an event stream spatially aligned with RGB frames, a simple setup available even off-the-shelf, and exploits the robustness of large-scale VFMs.
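A minimal sketch of the distillation idea follows, assuming a frozen RGB foundation model that outputs dense depth and a plain L1 objective; the paper's actual VFM, loss, and alignment details are not specified in the summary above, so all of these are assumptions.

```python
# Hypothetical sketch of cross-modal distillation: a frozen RGB vision
# foundation model produces dense proxy depth for frames spatially
# aligned with the event stream, and an event network learns to match it.
import torch
import torch.nn.functional as F

def distillation_step(event_net: torch.nn.Module,
                      frozen_vfm: torch.nn.Module,
                      event_voxels: torch.Tensor,   # (B, C, H, W)
                      rgb_frames: torch.Tensor      # (B, 3, H, W), aligned
                      ) -> torch.Tensor:
    with torch.no_grad():
        proxy_depth = frozen_vfm(rgb_frames)        # dense pseudo-labels
    pred_depth = event_net(event_voxels)
    # Scale-invariant losses are common for monocular depth distillation;
    # plain L1 is used here purely to keep the sketch short.
    return F.l1_loss(pred_depth, proxy_depth)
```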
arXiv Detail & Related papers (2025-09-18T17:59:51Z)
- ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving [62.9051914830949]
We present ROVR, a large-scale, diverse, and cost-efficient depth dataset designed to capture the complexity of real-world driving.
A lightweight acquisition pipeline ensures scalable collection, while sparse but statistically sufficient ground truth supports robust training.
Benchmarking with state-of-the-art monocular depth models reveals severe cross-dataset generalization failures.
arXiv Detail & Related papers (2025-08-19T16:13:49Z)
- EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More [7.974102031202597]
EvLight++ is a novel event-guided low-light video enhancement approach designed for robust performance in real-world scenarios.
EvLight++ significantly outperforms both single image- and video-based methods by 1.37 dB and 3.71 dB, respectively.
arXiv Detail & Related papers (2024-08-29T04:30:31Z)
- Self-supervised Event-based Monocular Depth Estimation using Cross-modal Consistency [18.288912105820167]
We propose a self-supervised event-based monocular depth estimation framework named EMoDepth.
EMoDepth constrains the training process using cross-modal consistency from intensity frames that are aligned with events in pixel coordinates.
At inference, only events are used for monocular depth prediction.
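A minimal sketch of such cross-modal consistency follows, borrowing the standard view-synthesis formulation from self-supervised monocular depth; EMoDepth's actual warping and loss terms may differ, and the pose convention, `inverse_warp` helper, and plain L1 photometric loss are all assumptions.

```python
# Hypothetical sketch: depth is predicted from events only, while
# pixel-aligned intensity frames supply a photometric reconstruction loss.
import torch
import torch.nn.functional as F

def inverse_warp(src_img, depth, pose, K):
    """Warp a source intensity frame into the target view.

    src_img: (B, 1, H, W) source frame
    depth:   (B, 1, H, W) depth predicted from events in the target view
    pose:    (B, 4, 4) target-to-source camera transform
    K:       (B, 3, 3) camera intrinsics
    """
    B, _, H, W = depth.shape
    device = depth.device
    # Pixel grid in homogeneous coordinates, shape (1, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(1, 3, -1)
    # Back-project to 3-D, transform into the source view, re-project.
    cam = (torch.linalg.inv(K) @ pix) * depth.view(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src = (pose @ cam_h)[:, :3]
    src_pix = K @ src
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)
    # Normalise pixel coordinates to [-1, 1] for grid_sample.
    gx = 2 * src_pix[:, 0] / (W - 1) - 1
    gy = 2 * src_pix[:, 1] / (H - 1) - 1
    grid = torch.stack([gx, gy], dim=-1).view(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True)

def photometric_loss(target_img, src_img, depth, pose, K):
    # The intensity frames provide the training signal; at inference the
    # network sees events alone.
    return (inverse_warp(src_img, depth, pose, K) - target_img).abs().mean()
```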
arXiv Detail & Related papers (2024-01-14T07:16:52Z)
- An Event-Oriented Diffusion-Refinement Method for Sparse Events Completion [36.64856578682197]
Event cameras or dynamic vision sensors (DVS) record asynchronous responses to brightness changes instead of conventional intensity frames.
We propose an inventive event completion approach conforming to the unique characteristics of event data in both the processing stage and the output form.
Specifically, we treat event streams as 3D event clouds in the spatiotemporal domain, develop a diffusion-based generative model to generate dense clouds in a coarse-to-fine manner, and recover exact timestamps to maintain the temporal resolution of the raw data.
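To make the event-cloud framing concrete, the sketch below normalises events into a 3-D point cloud and shows one forward (noising) step of a standard DDPM; the paper's coarse-to-fine generator and timestamp-recovery stage are not reproduced here, and the schedule parameters are illustrative assumptions.

```python
# Hypothetical sketch: events as points (x, y, t) in a 3-D spatio-temporal
# cloud, plus the forward diffusion process a reverse model would learn
# to undo in order to densify a sparse cloud.
import torch

def events_to_cloud(events: torch.Tensor) -> torch.Tensor:
    """Normalise (N, 3) events with columns (x, y, t) into [-1, 1]^3."""
    lo = events.min(dim=0).values
    hi = events.max(dim=0).values
    return 2 * (events - lo) / (hi - lo).clamp(min=1e-9) - 1

def ddpm_forward(cloud: torch.Tensor, step: int, num_steps: int = 1000):
    """Noise a clean event cloud at diffusion step `step` under a plain
    linear beta schedule; returns the noised cloud and the noise target."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alpha_bar = torch.cumprod(1 - betas, dim=0)[step]
    noise = torch.randn_like(cloud)
    return alpha_bar.sqrt() * cloud + (1 - alpha_bar).sqrt() * noise, noise
```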
arXiv Detail & Related papers (2024-01-06T08:09:54Z)
- EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields [80.94515892378053]
EvDNeRF is a pipeline for generating event data and training an event-based dynamic NeRF.
NeRFs offer geometric-based learnable rendering, but prior work with events has only considered reconstruction of static scenes.
We show that by training on varied batch sizes of events, we can improve test-time predictions of events at fine time resolutions.
arXiv Detail & Related papers (2023-10-03T21:08:41Z)
- On the Generation of a Synthetic Event-Based Vision Dataset for Navigation and Landing [69.34740063574921]
This paper presents a methodology for generating event-based vision datasets from optimal landing trajectories.
We construct sequences of photorealistic images of the lunar surface with the Planet and Asteroid Natural Scene Generation Utility.
We demonstrate that the pipeline can generate realistic event-based representations of surface features by constructing a dataset of 500 trajectories.
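A minimal sketch of the standard contrast-threshold event simulation used to convert rendered frame sequences into events (in the spirit of simulators such as ESIM); the paper's exact simulator and threshold are not stated in the summary, so everything below is an assumption.

```python
# Hypothetical sketch: an event (x, y, t, p) fires whenever a pixel's log
# intensity has changed by more than `threshold` since its last event.
import numpy as np

def simulate_events(frames: np.ndarray, times: np.ndarray,
                    threshold: float = 0.2) -> np.ndarray:
    """Generate events from a (T, H, W) stack of rendered intensity frames."""
    log_ref = np.log(frames[0].astype(np.float64) + 1e-3)
    events = []
    for k in range(1, len(frames)):
        log_f = np.log(frames[k].astype(np.float64) + 1e-3)
        diff = log_f - log_ref
        fired = np.abs(diff) >= threshold
        ys, xs = np.nonzero(fired)
        pol = np.sign(diff[fired])
        events.extend(zip(xs, ys, [times[k]] * len(xs), pol))
        # Reset the reference only at pixels that fired an event.
        log_ref[fired] = log_f[fired]
    return np.array(events, dtype=np.float64).reshape(-1, 4)
```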
arXiv Detail & Related papers (2023-08-01T09:14:20Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras report brightness changes as a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
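To illustrate why recurrence helps (each event slice is sparse, so a hidden state must integrate scene evidence over time), here is a minimal ConvGRU cell; the paper's actual recurrent encoder-decoder is more elaborate, and every name and size below is illustrative.

```python
# Hypothetical sketch of a recurrent update over consecutive event slices.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.zr = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, 3, padding=1)
        self.h_hat = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)

    def forward(self, x, h):
        # Update (z) and reset (r) gates from the input and previous state.
        z, r = torch.sigmoid(self.zr(torch.cat([x, h], 1))).chunk(2, 1)
        h_new = torch.tanh(self.h_hat(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_new

# Usage: feed consecutive event-voxel slices, reusing the hidden state.
cell = ConvGRUCell(in_ch=5, hid_ch=32)
h = torch.zeros(1, 32, 64, 64)
for _ in range(10):                      # ten consecutive event slices
    x = torch.randn(1, 5, 64, 64)        # stand-in for an event voxel grid
    h = cell(x, h)                       # state accumulates scene evidence
depth = nn.Conv2d(32, 1, 1)(h)           # per-pixel depth from the state
```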
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Learning to Detect Objects with a 1 Megapixel Event Camera [14.949946376335305]
Event cameras encode visual information with high temporal precision, low data rate, and high dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks is still lower than that of conventional frame-based solutions.
arXiv Detail & Related papers (2020-09-28T16:03:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated content and is not responsible for any consequences of its use.