Learning Monocular Dense Depth from Events
- URL: http://arxiv.org/abs/2010.08350v2
- Date: Thu, 22 Oct 2020 08:33:43 GMT
- Title: Learning Monocular Dense Depth from Events
- Authors: Javier Hidalgo-Carrió, Daniel Gehrig and Davide Scaramuzza
- Abstract summary: Event cameras report brightness changes as a stream of asynchronous events instead of intensity frames.
Learning-based approaches have recently been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
- Score: 53.078665310545745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras are novel sensors that output brightness changes in the form of
a stream of asynchronous events instead of intensity frames. Compared to
conventional image sensors, they offer significant advantages: high temporal
resolution, high dynamic range, no motion blur, and much lower bandwidth.
Recently, learning-based approaches have been applied to event-based data, thus
unlocking their potential and making significant progress in a variety of
tasks, such as monocular depth prediction. Most existing approaches use
standard feed-forward architectures to generate network predictions, which do
not leverage the temporal consistency present in the event stream. We propose
a recurrent architecture to solve this task and show significant improvement
over standard feed-forward methods. In particular, our method generates dense
depth predictions using a monocular setup, which has not been shown previously.
We pretrain our model using a new dataset containing events and depth maps
recorded in the CARLA simulator. We test our method on the Multi Vehicle Stereo
Event Camera Dataset (MVSEC). Quantitative experiments show up to 50%
improvement in average depth error with respect to previous event-based
methods.
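The abstract describes a recurrent architecture that exploits temporal consistency in the event stream to predict dense depth from a monocular event camera. Below is a minimal sketch of that idea, assuming events are pre-binned into voxel grids; the layer sizes, the single ConvLSTM cell, and all module names are illustrative choices, not the authors' exact network.

```python
# Minimal sketch (assumption: events are pre-binned into voxel grids of shape
# [B, bins, H, W]). Layer sizes and the single ConvLSTM cell are illustrative,
# not the architecture from the paper.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell: keeps a spatial hidden state across event windows."""
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size=3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class RecurrentEventDepth(nn.Module):
    """Encoder -> ConvLSTM -> decoder producing a dense depth map per event window."""
    def __init__(self, bins=5, base=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(bins, base, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(base, 2 * base, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.rnn = ConvLSTMCell(2 * base, 2 * base)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * base, base, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(base, 1, 4, stride=2, padding=1),
        )

    def forward(self, voxel_seq):
        """voxel_seq: [B, T, bins, H, W] -> [B, T, 1, H, W] depth predictions."""
        state = None
        preds = []
        for k in range(voxel_seq.shape[1]):
            feat = self.encoder(voxel_seq[:, k])
            if state is None:  # initialize the spatial hidden state on the first window
                state = (torch.zeros_like(feat), torch.zeros_like(feat))
            feat, state = self.rnn(feat, state)              # carry temporal context across windows
            preds.append(torch.sigmoid(self.decoder(feat)))  # normalized (log-)depth in [0, 1]
        return torch.stack(preds, dim=1)

# Example: five event windows of 5-bin voxel grids at 128x128 resolution.
net = RecurrentEventDepth()
depth = net(torch.randn(2, 5, 5, 128, 128))
print(depth.shape)  # torch.Size([2, 5, 1, 128, 128])
```

The key design point is the hidden state carried across event windows, which is what distinguishes this class of recurrent models from the feed-forward baselines mentioned in the abstract.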
Related papers
- Temporal-Mapping Photography for Event Cameras [5.344756442054121]
Event cameras, or Dynamic Vision Sensors (DVS), capture brightness changes as a continuous stream of "events".
Faithfully converting sparse events to dense intensity frames has long been an ill-posed problem.
In this paper, for the first time, we realize event-to-dense-intensity-image conversion using a stationary event camera in static scenes.
arXiv Detail & Related papers (2024-03-11T05:29:46Z)
- Self-supervised Event-based Monocular Depth Estimation using Cross-modal Consistency [18.288912105820167]
We propose a self-supervised event-based monocular depth estimation framework named EMoDepth.
EMoDepth constrains the training process using cross-modal consistency with intensity frames that are aligned with events in pixel coordinates.
At inference, only events are used for monocular depth prediction.
arXiv Detail & Related papers (2024-01-14T07:16:52Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model to generate a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - Secrets of Event-Based Optical Flow [13.298845944779108]
Event cameras respond to scene dynamics and offer advantages for estimating motion.
We develop a principled method to extend the Contrast Maximization framework to estimate optical flow from events alone (a toy sketch of the contrast-maximization idea appears after this list).
Our method ranks first among unsupervised methods on the MVSEC benchmark, and is competitive on the DSEC benchmark.
arXiv Detail & Related papers (2022-07-20T16:40:38Z)
- DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods of up to 30% in terms of mean absolute depth error.
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
- EventHands: Real-Time Neural 3D Hand Reconstruction from an Event Stream [80.15360180192175]
3D hand pose estimation from monocular videos is a long-standing and challenging problem.
We address it for the first time using a single event camera, i.e., an asynchronous vision sensor reacting to brightness changes.
Our approach has characteristics previously not demonstrated with a single RGB or depth camera.
arXiv Detail & Related papers (2020-12-11T16:45:34Z)
- Learning to Detect Objects with a 1 Megapixel Event Camera [14.949946376335305]
Event cameras encode visual information with high temporal precision, low data-rate, and high-dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks is still lower than that of conventional frame-based solutions.
arXiv Detail & Related papers (2020-09-28T16:03:59Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
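As referenced in the "Secrets of Event-Based Optical Flow" entry above, the core contrast-maximization idea is to warp events along a candidate motion and score the sharpness of the resulting image of warped events. The snippet below is a toy, brute-force sketch of that objective, assuming a single global flow vector and a grid search over candidates; the paper itself estimates dense flow and extends the framework in a principled way.

```python
# Toy sketch of contrast maximization (assumptions: one global flow vector,
# brute-force grid search; not the dense, principled method of the paper).
import numpy as np

def warp_and_score(xs, ys, ts, flow, t_ref, shape):
    """Warp events to t_ref along a candidate flow and return the variance
    (contrast) of the resulting image of warped events."""
    vx, vy = flow
    xw = np.round(xs - (ts - t_ref) * vx).astype(int)
    yw = np.round(ys - (ts - t_ref) * vy).astype(int)
    h, w = shape
    valid = (xw >= 0) & (xw < w) & (yw >= 0) & (yw < h)
    img = np.zeros(shape)
    np.add.at(img, (yw[valid], xw[valid]), 1.0)  # accumulate warped event counts
    return img.var()                             # sharper image -> higher contrast

def estimate_flow(xs, ys, ts, shape, v_range=np.linspace(-50, 50, 21)):
    """Grid-search the flow (px/s) that maximizes contrast of warped events."""
    t_ref = ts.min()
    candidates = [(vx, vy) for vx in v_range for vy in v_range]
    scores = [warp_and_score(xs, ys, ts, c, t_ref, shape) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy data: 20 point features moving at (20, -10) px/s, each firing 100 events.
rng = np.random.default_rng(0)
feat_x = rng.uniform(20, 100, 20)
feat_y = rng.uniform(20, 100, 20)
ts = np.tile(rng.uniform(0.0, 0.5, 100), 20)
xs = np.repeat(feat_x, 100) + 20.0 * ts
ys = np.repeat(feat_y, 100) - 10.0 * ts
print(estimate_flow(xs, ys, ts, shape=(128, 128)))  # close to (20.0, -10.0)
```

With the correct flow candidate, events fired by the same moving structure collapse onto the same pixels, so the variance (contrast) of the warped-event image peaks; wrong candidates leave the events smeared and score lower.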