Learning Monocular Dense Depth from Events
- URL: http://arxiv.org/abs/2010.08350v2
- Date: Thu, 22 Oct 2020 08:33:43 GMT
- Title: Learning Monocular Dense Depth from Events
- Authors: Javier Hidalgo-Carrió, Daniel Gehrig and Davide Scaramuzza
- Abstract summary: Event cameras report brightness changes as a stream of asynchronous events instead of intensity frames.
Learning-based approaches have recently been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
- Score: 53.078665310545745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras are novel sensors that output brightness changes in the form of
a stream of asynchronous events instead of intensity frames. Compared to
conventional image sensors, they offer significant advantages: high temporal
resolution, high dynamic range, no motion blur, and much lower bandwidth.
Recently, learning-based approaches have been applied to event-based data, thus
unlocking their potential and making significant progress in a variety of
tasks, such as monocular depth prediction. Most existing approaches use
standard feed-forward architectures to generate network predictions, which do
not leverage the temporal consistency present in the event stream. We propose
a recurrent architecture to solve this task and show significant improvement
over standard feed-forward methods. In particular, our method generates dense
depth predictions using a monocular setup, which has not been shown previously.
We pretrain our model using a new dataset containing events and depth maps
recorded in the CARLA simulator. We test our method on the Multi Vehicle Stereo
Event Camera Dataset (MVSEC). Quantitative experiments show up to 50%
improvement in average depth error with respect to previous event-based
methods.
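The abstract describes a recurrent architecture that exploits temporal consistency in the event stream to predict dense depth from a monocular event camera. Below is a minimal sketch of that idea, assuming events are pre-binned into voxel grids; the layer sizes, the single ConvLSTM cell, and all module names are illustrative choices, not the authors' exact network.

```python
# Minimal sketch (assumption: events are pre-binned into voxel grids of shape
# [B, bins, H, W]). Layer sizes and the single ConvLSTM cell are illustrative,
# not the architecture from the paper.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell: keeps a spatial hidden state across event windows."""
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size=3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class RecurrentEventDepth(nn.Module):
    """Encoder -> ConvLSTM -> decoder producing a dense depth map per event window."""
    def __init__(self, bins=5, base=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(bins, base, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(base, 2 * base, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.rnn = ConvLSTMCell(2 * base, 2 * base)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * base, base, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(base, 1, 4, stride=2, padding=1),
        )

    def forward(self, voxel_seq):
        """voxel_seq: [B, T, bins, H, W] -> [B, T, 1, H, W] depth predictions."""
        state = None
        preds = []
        for k in range(voxel_seq.shape[1]):
            feat = self.encoder(voxel_seq[:, k])
            if state is None:  # initialize the spatial hidden state on the first window
                state = (torch.zeros_like(feat), torch.zeros_like(feat))
            feat, state = self.rnn(feat, state)              # carry temporal context across windows
            preds.append(torch.sigmoid(self.decoder(feat)))  # normalized (log-)depth in [0, 1]
        return torch.stack(preds, dim=1)

# Example: five event windows of 5-bin voxel grids at 128x128 resolution.
net = RecurrentEventDepth()
depth = net(torch.randn(2, 5, 5, 128, 128))
print(depth.shape)  # torch.Size([2, 5, 1, 128, 128])
```

The key design point is the hidden state carried across event windows, which is what distinguishes this class of recurrent models from the feed-forward baselines mentioned in the abstract.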
Related papers
- Temporal-Mapping Photography for Event Cameras [5.344756442054121]
Event cameras, or Dynamic Vision Sensors (DVS), capture brightness changes as a continuous stream of "events".
Faithfully converting sparse events to dense intensity frames has long been an ill-posed problem.
In this paper, for the first time, we realize event-to-dense-intensity-image conversion using a stationary event camera in static scenes.
arXiv Detail & Related papers (2024-03-11T05:29:46Z)
- Self-supervised Event-based Monocular Depth Estimation using Cross-modal Consistency [18.288912105820167]
We propose a self-supervised event-based monocular depth estimation framework named EMoDepth.
EMoDepth constrains the training process using cross-modal consistency with intensity frames that are aligned with events in pixel coordinates.
At inference, only events are used for monocular depth prediction.
arXiv Detail & Related papers (2024-01-14T07:16:52Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model to generate a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - Secrets of Event-Based Optical Flow [13.298845944779108]
Event cameras respond to scene dynamics and offer advantages for estimating motion.
We develop a principled method to extend the Contrast Maximization framework to estimate optical flow from events alone (a toy sketch of the contrast-maximization idea appears after this list).
Our method ranks first among unsupervised methods on the MVSEC benchmark, and is competitive on the DSEC benchmark.
arXiv Detail & Related papers (2022-07-20T16:40:38Z)
- DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods of up to 30% in terms of mean absolute depth error.
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
- EventHands: Real-Time Neural 3D Hand Reconstruction from an Event Stream [80.15360180192175]
3D hand pose estimation from monocular videos is a long-standing and challenging problem.
We address it for the first time using a single event camera, i.e., an asynchronous vision sensor reacting to brightness changes.
Our approach has characteristics previously not demonstrated with a single RGB or depth camera.
arXiv Detail & Related papers (2020-12-11T16:45:34Z)
- Learning to Detect Objects with a 1 Megapixel Event Camera [14.949946376335305]
Event cameras encode visual information with high temporal precision, low data-rate, and high-dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks is still lower than that of conventional frame-based solutions.
arXiv Detail & Related papers (2020-09-28T16:03:59Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
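As referenced in the "Secrets of Event-Based Optical Flow" entry above, the core contrast-maximization idea is to warp events along a candidate motion and score the sharpness of the resulting image of warped events. The snippet below is a toy, brute-force sketch of that objective, assuming a single global flow vector and a grid search over candidates; the paper itself estimates dense flow and extends the framework in a principled way.

```python
# Toy sketch of contrast maximization (assumptions: one global flow vector,
# brute-force grid search; not the dense, principled method of the paper).
import numpy as np

def warp_and_score(xs, ys, ts, flow, t_ref, shape):
    """Warp events to t_ref along a candidate flow and return the variance
    (contrast) of the resulting image of warped events."""
    vx, vy = flow
    xw = np.round(xs - (ts - t_ref) * vx).astype(int)
    yw = np.round(ys - (ts - t_ref) * vy).astype(int)
    h, w = shape
    valid = (xw >= 0) & (xw < w) & (yw >= 0) & (yw < h)
    img = np.zeros(shape)
    np.add.at(img, (yw[valid], xw[valid]), 1.0)  # accumulate warped event counts
    return img.var()                             # sharper image -> higher contrast

def estimate_flow(xs, ys, ts, shape, v_range=np.linspace(-50, 50, 21)):
    """Grid-search the flow (px/s) that maximizes contrast of warped events."""
    t_ref = ts.min()
    candidates = [(vx, vy) for vx in v_range for vy in v_range]
    scores = [warp_and_score(xs, ys, ts, c, t_ref, shape) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy data: 20 point features moving at (20, -10) px/s, each firing 100 events.
rng = np.random.default_rng(0)
feat_x = rng.uniform(20, 100, 20)
feat_y = rng.uniform(20, 100, 20)
ts = np.tile(rng.uniform(0.0, 0.5, 100), 20)
xs = np.repeat(feat_x, 100) + 20.0 * ts
ys = np.repeat(feat_y, 100) - 10.0 * ts
print(estimate_flow(xs, ys, ts, shape=(128, 128)))  # close to (20.0, -10.0)
```

With the correct flow candidate, events fired by the same moving structure collapse onto the same pixels, so the variance (contrast) of the warped-event image peaks; wrong candidates leave the events smeared and score lower.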