EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency
- URL: http://arxiv.org/abs/2408.05452v1
- Date: Sat, 10 Aug 2024 06:13:37 GMT
- Title: EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency
- Authors: Junjie Jiang, Hao Zhuang, Xinjie Huang, Delei Kong, Zheng Fang,
- Abstract summary: Event cameras have the potential to revolutionize the field of robot vision.
We propose EV-MGDispNet, a novel event-based stereo disparity estimation method.
Our method outperforms currently known state-of-the-art methods in terms of mean absolute error (MAE) and root mean square error (RMSE) metrics.
- Score: 4.849111230195686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras have the potential to revolutionize the field of robot vision, particularly in areas like stereo disparity estimation, owing to their high temporal resolution and high dynamic range. Many studies use deep learning for event camera stereo disparity estimation. However, these methods fail to fully exploit the temporal information in the event stream to acquire clear event representations. Additionally, there is room for further reduction in pixel shifts in the feature maps before constructing the cost volume. In this paper, we propose EV-MGDispNet, a novel event-based stereo disparity estimation method. Firstly, we propose an edge-aware aggregation (EAA) module, which fuses event frames and motion confidence maps to generate a novel clear event representation. Then, we propose a motion-guided attention (MGA) module, where motion confidence maps utilize deformable transformer encoders to enhance the feature map with more accurate edges. Finally, we also add a census left-right consistency loss function to enhance the left-right consistency of stereo event representation. Through conducting experiments within challenging real-world driving scenarios, we validate that our method outperforms currently known state-of-the-art methods in terms of mean absolute error (MAE) and root mean square error (RMSE) metrics.
Related papers
- Spatio-temporal Transformers for Action Unit Classification with Event Cameras [28.98336123799572]
We present FACEMORPHIC, a temporally synchronized multimodal face dataset composed of RGB videos and event streams.
We show how temporal synchronization can allow effective neuromorphic face analysis without the need to manually annotate videos.
arXiv Detail & Related papers (2024-10-29T11:23:09Z) - EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting [76.02450110026747]
Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution.
We propose Event-Aided Free-Trajectory 3DGS, which seamlessly integrates the advantages of event cameras into 3DGS.
We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS.
arXiv Detail & Related papers (2024-10-20T13:44:24Z) - ESVO2: Direct Visual-Inertial Odometry with Stereo Event Cameras [33.81592783496106]
Event-based visual odometry aims at solving tracking and mapping sub-problems in parallel.
We build an event-based stereo visual-inertial odometry system on top of our previous direct pipeline Event-based Stereo Visual Odometry.
arXiv Detail & Related papers (2024-10-12T05:35:27Z) - Temporal Event Stereo via Joint Learning with Stereoscopic Flow [44.479946706395694]
Event cameras are dynamic vision sensors inspired by the biological retina.
We propose a novel temporal event stereo framework.
We have achieved state-of-the-art performance on the MVSEC and the DSEC datasets.
arXiv Detail & Related papers (2024-07-15T15:43:08Z) - IMU-Aided Event-based Stereo Visual Odometry [7.280676899773076]
We improve our previous direct pipeline textitEvent-based Stereo Visual Odometry in terms of accuracy and efficiency.
To speed up the mapping operation, we propose an efficient strategy of edge-pixel sampling according to the local dynamics of events.
We release our pipeline as an open-source software for future research in this field.
arXiv Detail & Related papers (2024-05-07T07:19:25Z) - Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended
Reality [65.70936336240554]
Real-time Stereo Matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
arXiv Detail & Related papers (2023-09-08T07:53:58Z) - EventTransAct: A video transformer-based framework for Event-camera
based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adopt the VTN for the sparse and fine-grained nature of event data, we design Event-Contrastive Loss ($mathcalL_EC$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - Dual Memory Aggregation Network for Event-Based Object Detection with
Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO)
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z) - Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras produce brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z) - Event-based Stereo Visual Odometry [42.77238738150496]
We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig.
We seek to maximize thetemporal consistency of stereo event-based data while using a simple and efficient representation.
arXiv Detail & Related papers (2020-07-30T15:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.