Self-supervised Event-based Monocular Depth Estimation using Cross-modal
Consistency
- URL: http://arxiv.org/abs/2401.07218v1
- Date: Sun, 14 Jan 2024 07:16:52 GMT
- Title: Self-supervised Event-based Monocular Depth Estimation using Cross-modal
Consistency
- Authors: Junyu Zhu, Lina Liu, Bofeng Jiang, Feng Wen, Hongbo Zhang, Wanlong Li,
Yong Liu
- Abstract summary: We propose a self-supervised event-based monocular depth estimation framework named EMoDepth.
EMoDepth constrains the training process using the cross-modal consistency from intensity frames that are aligned with events in pixel coordinates.
At inference, only events are used for monocular depth prediction.
- Score: 18.288912105820167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An event camera is a novel vision sensor that can capture per-pixel
brightness changes and output a stream of asynchronous "events". It has
advantages over conventional cameras in scenes with high-speed motion
and challenging lighting conditions because of its high temporal resolution,
high dynamic range, low bandwidth, low power consumption, and freedom from motion blur.
Therefore, several supervised monocular depth estimation methods from events
have been proposed to address scenes that are difficult for conventional cameras. However, depth
annotation is costly and time-consuming. In this paper, to lower the annotation
cost, we propose a self-supervised event-based monocular depth estimation
framework named EMoDepth. EMoDepth constrains the training process using the
cross-modal consistency from intensity frames that are aligned with events in
pixel coordinates. Moreover, at inference, only events are used for
monocular depth prediction. Additionally, we design a multi-scale
skip-connection architecture to effectively fuse features for depth estimation
while maintaining high inference speed. Experiments on MVSEC and DSEC datasets
demonstrate that our contributions are effective and that our method can
outperform existing supervised event-based and unsupervised frame-based
methods in accuracy.
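As a rough illustration of the cross-modal consistency idea, the sketch below shows a monodepth2-style photometric loss between an intensity frame and a view synthesized from event-predicted depth and pose. This is a generic self-supervision recipe, not EMoDepth's exact loss; all names and the SSIM/L1 weighting are assumptions.

```python
# Hypothetical sketch of a cross-modal photometric consistency loss
# (monodepth2-style); EMoDepth's exact formulation may differ.
import torch
import torch.nn.functional as F

def ssim_dist(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified SSIM distance over 3x3 windows via average pooling.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return ((1 - num / den) / 2).clamp(0, 1)

def cross_modal_loss(frame, warped_frame, alpha=0.85):
    # frame: intensity image aligned with the events in pixel coordinates.
    # warped_frame: neighboring frame warped via event-predicted depth/pose.
    l1 = (frame - warped_frame).abs()
    return (alpha * ssim_dist(frame, warped_frame) + (1 - alpha) * l1).mean()
```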
Related papers
- Learning Monocular Depth from Focus with Event Focal Stack [6.200121342586474]
We propose the EDFF Network to estimate sparse depth from the Event Focal Stack.
We use an event voxel grid to encode intensity-change information and project the event time surface into the depth domain.
A Focal-Distance-guided Cross-Modal Attention Module is presented to fuse the information mentioned above.
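For reference, the event voxel grid mentioned above is a common event representation; a minimal construction using bilinear temporal binning might look like the sketch below (the function name and signature are illustrative, not from the paper).

```python
import numpy as np

def event_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    # xs, ys: integer pixel coordinates; ts: timestamps; ps: polarities (+/-1).
    # Spread each event's polarity over the two nearest temporal bins
    # (bilinear binning), yielding a (num_bins, H, W) tensor.
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t_norm = (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9) * (num_bins - 1)
    t0 = np.floor(t_norm).astype(int)
    for shift in (0, 1):
        b = np.clip(t0 + shift, 0, num_bins - 1)
        w = np.maximum(1.0 - np.abs(t_norm - (t0 + shift)), 0.0)
        np.add.at(grid, (b, ys, xs), ps * w)
    return grid
```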
arXiv Detail & Related papers (2024-05-11T07:54:49Z) - FEDORA: Flying Event Dataset fOr Reactive behAvior [9.470870778715689]
Event-based sensors have emerged as low latency and low energy alternatives to standard frame-based cameras for capturing high-speed motion.
We present Flying Event dataset fOr Reactive behAviour (FEDORA) - a fully synthetic dataset for perception tasks.
arXiv Detail & Related papers (2023-05-22T22:59:05Z) - PL-EVIO: Robust Monocular Event-based Visual Inertial Odometry with
Point and Line Features [3.6355269783970394]
Event cameras are motion-activated sensors that capture pixel-level illumination changes instead of the intensity image with a fixed frame rate.
We propose a robust, highly accurate, and real-time optimization-based monocular event-based visual-inertial odometry (VIO) method.
arXiv Detail & Related papers (2022-09-25T06:14:12Z) - Uncertainty Guided Depth Fusion for Spike Camera [49.41822923588663]
We propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse predictions of monocular and stereo depth estimation networks for spike camera.
Our framework is motivated by the fact that stereo spike depth estimation achieves better results at close range.
In order to demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K.
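The exact fusion rule is not spelled out in this summary; a generic uncertainty-guided (inverse-variance) fusion of the two predictions would look roughly like the following sketch, with all names assumed.

```python
import torch

def fuse_depth(d_mono, var_mono, d_stereo, var_stereo, eps=1e-6):
    # Inverse-variance weighting: the prediction with lower uncertainty
    # dominates (e.g., stereo at close range, monocular farther out).
    w_m = 1.0 / (var_mono + eps)
    w_s = 1.0 / (var_stereo + eps)
    return (w_m * d_mono + w_s * d_stereo) / (w_m + w_s)
```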
arXiv Detail & Related papers (2022-08-26T13:04:01Z) - Globally-Optimal Event Camera Motion Estimation [30.79931004393174]
Event cameras are bio-inspired sensors that perform well in HDR conditions and have high temporal resolution.
Event cameras measure asynchronous pixel-level changes and return them in a highly discretised format.
arXiv Detail & Related papers (2022-03-08T08:24:22Z) - Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z) - ESL: Event-based Structured Light [62.77144631509817]
Event cameras are bio-inspired sensors providing significant advantages over standard cameras.
We propose a novel structured-light system using an event camera to tackle the problem of accurate and high-speed depth sensing.
arXiv Detail & Related papers (2021-11-30T15:47:39Z) - Event Guided Depth Sensing [50.997474285910734]
We present an efficient bio-inspired event-camera-driven depth estimation algorithm.
In our approach, we illuminate areas of interest densely, depending on the scene activity detected by the event camera.
We show the feasibility of our approach on simulated autonomous driving sequences and in real indoor environments.
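One plausible reading of that sampling strategy, sketched under assumed names and thresholds: accumulate recent events into a count image and illuminate only blocks with enough activity.

```python
import numpy as np

def illumination_mask(event_counts, block=16, min_events=8):
    # event_counts: (H, W) image of recent per-pixel event counts.
    # Blocks with sufficient activity are selected for dense depth sampling.
    h, w = event_counts.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, block):
        for x in range(0, w, block):
            if event_counts[y:y + block, x:x + block].sum() >= min_events:
                mask[y:y + block, x:x + block] = True
    return mask
```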
arXiv Detail & Related papers (2021-10-20T11:41:11Z) - Combining Events and Frames using Recurrent Asynchronous Multimodal
Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods by up to 30% in terms of mean depth absolute error.
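The essential mechanism, per the summary, is a hidden state that is updated whenever any sensor delivers data and can be decoded at an arbitrary query time; a minimal GRU-based stand-in (not the paper's architecture) could look like this.

```python
import torch
import torch.nn as nn

class AsyncRecurrentCell(nn.Module):
    # Illustrative RAM-style cell: update on each asynchronous sample,
    # query a prediction from the hidden state at any time.
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, 1)

    def update(self, h, feat):
        return self.gru(feat, h)  # feat: (B, feat_dim), h: (B, hidden_dim)

    def query(self, h):
        return self.head(h)       # prediction at an arbitrary time
```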
arXiv Detail & Related papers (2021-02-18T13:24:35Z) - Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras output brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)