Self-supervised Event-based Monocular Depth Estimation using Cross-modal
Consistency
- URL: http://arxiv.org/abs/2401.07218v1
- Date: Sun, 14 Jan 2024 07:16:52 GMT
- Title: Self-supervised Event-based Monocular Depth Estimation using Cross-modal
Consistency
- Authors: Junyu Zhu, Lina Liu, Bofeng Jiang, Feng Wen, Hongbo Zhang, Wanlong Li,
Yong Liu
- Abstract summary: We propose a self-supervised event-based monocular depth estimation framework named EMoDepth.
EMoDepth constrains the training process using the cross-modal consistency from intensity frames that are aligned with events in pixel coordinates.
At inference, only events are used for monocular depth prediction.
- Score: 18.288912105820167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An event camera is a novel vision sensor that can capture per-pixel
brightness changes and output a stream of asynchronous "events". It has
advantages over conventional cameras in scenes with high-speed motion
and challenging lighting conditions because of its high temporal resolution,
high dynamic range, low bandwidth, low power consumption, and freedom from motion blur.
Therefore, several supervised monocular depth estimation methods from events
have been proposed to address scenes that are difficult for conventional cameras. However, depth
annotation is costly and time-consuming. In this paper, to lower the annotation
cost, we propose a self-supervised event-based monocular depth estimation
framework named EMoDepth. EMoDepth constrains the training process using the
cross-modal consistency from intensity frames that are aligned with events in
pixel coordinates. Moreover, at inference, only events are used for
monocular depth prediction. Additionally, we design a multi-scale
skip-connection architecture to effectively fuse features for depth estimation
while maintaining high inference speed. Experiments on MVSEC and DSEC datasets
demonstrate that our contributions are effective and that our method can
outperform existing supervised event-based and unsupervised frame-based
methods in accuracy.
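As a rough illustration of the cross-modal consistency idea, the sketch below shows a monodepth2-style photometric loss between an intensity frame and a view synthesized from event-predicted depth and pose. This is a generic self-supervision recipe, not EMoDepth's exact loss; all names and the SSIM/L1 weighting are assumptions.

```python
# Hypothetical sketch of a cross-modal photometric consistency loss
# (monodepth2-style); EMoDepth's exact formulation may differ.
import torch
import torch.nn.functional as F

def ssim_dist(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified SSIM distance over 3x3 windows via average pooling.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return ((1 - num / den) / 2).clamp(0, 1)

def cross_modal_loss(frame, warped_frame, alpha=0.85):
    # frame: intensity image aligned with the events in pixel coordinates.
    # warped_frame: neighboring frame warped via event-predicted depth/pose.
    l1 = (frame - warped_frame).abs()
    return (alpha * ssim_dist(frame, warped_frame) + (1 - alpha) * l1).mean()
```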
Related papers
- Learning Monocular Depth from Focus with Event Focal Stack [6.200121342586474]
We propose the EDFF Network to estimate sparse depth from the Event Focal Stack.
We use an event voxel grid to encode intensity-change information and project the event time surface into the depth domain.
A Focal-Distance-guided Cross-Modal Attention Module is presented to fuse the information mentioned above.
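For reference, the event voxel grid mentioned above is a common event representation; a minimal construction using bilinear temporal binning might look like the sketch below (the function name and signature are illustrative, not from the paper).

```python
import numpy as np

def event_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    # xs, ys: integer pixel coordinates; ts: timestamps; ps: polarities (+/-1).
    # Spread each event's polarity over the two nearest temporal bins
    # (bilinear binning), yielding a (num_bins, H, W) tensor.
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t_norm = (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9) * (num_bins - 1)
    t0 = np.floor(t_norm).astype(int)
    for shift in (0, 1):
        b = np.clip(t0 + shift, 0, num_bins - 1)
        w = np.maximum(1.0 - np.abs(t_norm - (t0 + shift)), 0.0)
        np.add.at(grid, (b, ys, xs), ps * w)
    return grid
```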
arXiv Detail & Related papers (2024-05-11T07:54:49Z) - FEDORA: Flying Event Dataset fOr Reactive behAvior [9.470870778715689]
Event-based sensors have emerged as low latency and low energy alternatives to standard frame-based cameras for capturing high-speed motion.
We present Flying Event dataset fOr Reactive behAviour (FEDORA) - a fully synthetic dataset for perception tasks.
arXiv Detail & Related papers (2023-05-22T22:59:05Z) - PL-EVIO: Robust Monocular Event-based Visual Inertial Odometry with
Point and Line Features [3.6355269783970394]
Event cameras are motion-activated sensors that capture pixel-level illumination changes instead of the intensity image with a fixed frame rate.
We propose a robust, highly accurate, and real-time optimization-based monocular event-based visual-inertial odometry (VIO) method.
arXiv Detail & Related papers (2022-09-25T06:14:12Z) - Uncertainty Guided Depth Fusion for Spike Camera [49.41822923588663]
We propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse predictions of monocular and stereo depth estimation networks for spike camera.
Our framework is motivated by the fact that stereo spike depth estimation achieves better results at close range.
In order to demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K.
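The exact fusion rule is not spelled out in this summary; a generic uncertainty-guided (inverse-variance) fusion of the two predictions would look roughly like the following sketch, with all names assumed.

```python
import torch

def fuse_depth(d_mono, var_mono, d_stereo, var_stereo, eps=1e-6):
    # Inverse-variance weighting: the prediction with lower uncertainty
    # dominates (e.g., stereo at close range, monocular farther out).
    w_m = 1.0 / (var_mono + eps)
    w_s = 1.0 / (var_stereo + eps)
    return (w_m * d_mono + w_s * d_stereo) / (w_m + w_s)
```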
arXiv Detail & Related papers (2022-08-26T13:04:01Z) - Globally-Optimal Event Camera Motion Estimation [30.79931004393174]
Event cameras are bio-inspired sensors that perform well in HDR conditions and have high temporal resolution.
Event cameras measure asynchronous pixel-level changes and return them in a highly discretised format.
arXiv Detail & Related papers (2022-03-08T08:24:22Z) - Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z) - ESL: Event-based Structured Light [62.77144631509817]
Event cameras are bio-inspired sensors providing significant advantages over standard cameras.
We propose a novel structured-light system using an event camera to tackle the problem of accurate and high-speed depth sensing.
arXiv Detail & Related papers (2021-11-30T15:47:39Z) - Event Guided Depth Sensing [50.997474285910734]
We present an efficient bio-inspired event-camera-driven depth estimation algorithm.
In our approach, we illuminate areas of interest densely, depending on the scene activity detected by the event camera.
We show the feasibility of our approach on simulated autonomous driving sequences and in real indoor environments.
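One plausible reading of that sampling strategy, sketched under assumed names and thresholds: accumulate recent events into a count image and illuminate only blocks with enough activity.

```python
import numpy as np

def illumination_mask(event_counts, block=16, min_events=8):
    # event_counts: (H, W) image of recent per-pixel event counts.
    # Blocks with sufficient activity are selected for dense depth sampling.
    h, w = event_counts.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, block):
        for x in range(0, w, block):
            if event_counts[y:y + block, x:x + block].sum() >= min_events:
                mask[y:y + block, x:x + block] = True
    return mask
```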
arXiv Detail & Related papers (2021-10-20T11:41:11Z) - Combining Events and Frames using Recurrent Asynchronous Multimodal
Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods by up to 30% in terms of mean depth absolute error.
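The essential mechanism, per the summary, is a hidden state that is updated whenever any sensor delivers data and can be decoded at an arbitrary query time; a minimal GRU-based stand-in (not the paper's architecture) could look like this.

```python
import torch
import torch.nn as nn

class AsyncRecurrentCell(nn.Module):
    # Illustrative RAM-style cell: update on each asynchronous sample,
    # query a prediction from the hidden state at any time.
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, 1)

    def update(self, h, feat):
        return self.gru(feat, h)  # feat: (B, feat_dim), h: (B, hidden_dim)

    def query(self, h):
        return self.head(h)       # prediction at an arbitrary time
```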
arXiv Detail & Related papers (2021-02-18T13:24:35Z) - Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras output brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)