Self-Supervised Intensity-Event Stereo Matching
- URL: http://arxiv.org/abs/2211.00509v1
- Date: Tue, 1 Nov 2022 14:52:25 GMT
- Title: Self-Supervised Intensity-Event Stereo Matching
- Authors: Jinjin Gu, Jinan Zhou, Ringo Sai Wo Chu, Yan Chen, Jiawei Zhang,
Xuanye Cheng, Song Zhang, Jimmy S. Ren
- Abstract summary: Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes with microsecond accuracy.
Event cameras cannot be directly applied to computational imaging tasks due to the inability to obtain high-quality intensity images and events simultaneously.
This paper aims to connect a standalone event camera and a modern intensity camera so that applications can take advantage of both sensors.
- Score: 24.851819610561517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras are novel bio-inspired vision sensors that output pixel-level
intensity changes with microsecond accuracy, a high dynamic range, and low
power consumption. Despite these advantages, event cameras cannot be directly
applied to computational imaging tasks due to the inability to obtain
high-quality intensity images and events simultaneously. This paper aims to connect a
standalone event camera and a modern intensity camera so that applications
can take advantage of both sensors. We establish this connection through a
multi-modal stereo matching task. We first convert events to a reconstructed
image and extend the existing stereo networks to this multi-modality condition.
We propose a self-supervised method to train the multi-modal stereo network
without using ground-truth disparity data. A structure loss calculated on
image gradients enables self-supervised learning on such multi-modal
data. Exploiting the internal stereo constraint between views of different
modalities, we introduce general stereo loss functions, including disparity
cross-consistency loss and internal disparity loss, leading to improved
performance and robustness compared to existing approaches. The experiments
demonstrate the effectiveness of the proposed method, especially the proposed
general stereo loss functions, on both synthetic and real datasets. Finally, we
shed light on employing the aligned events and intensity images in downstream
tasks, e.g., video interpolation.
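The abstract does not spell these losses out in formulas; as a point of reference, here is a minimal PyTorch sketch of the two ideas it names: a structure loss computed on image gradients (comparing edge structure rather than raw brightness, which differs between the event reconstruction and the intensity image) and a disparity cross-consistency loss (warping one view's disparity into the other and penalizing disagreement). The function names and warping details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def image_gradients(img):
    """Finite-difference gradients along x and y for a (B, C, H, W) tensor."""
    gx = img[:, :, :, 1:] - img[:, :, :, :-1]
    gy = img[:, :, 1:, :] - img[:, :, :-1, :]
    return gx, gy

def structure_loss(intensity, event_recon):
    """Compare gradient magnitudes instead of raw intensities, since the
    two modalities do not share absolute brightness."""
    ix, iy = image_gradients(intensity)
    ex, ey = image_gradients(event_recon)
    return F.l1_loss(ix.abs(), ex.abs()) + F.l1_loss(iy.abs(), ey.abs())

def warp_by_disparity(src, disp):
    """Sample `src` at x - disp(x, y), assuming rectified stereo.
    src: (B, C, H, W); disp: (B, 1, H, W) in pixels."""
    b, _, h, w = src.shape
    xs = torch.arange(w, device=src.device, dtype=src.dtype).view(1, 1, 1, w).expand(b, 1, h, w)
    ys = torch.arange(h, device=src.device, dtype=src.dtype).view(1, 1, h, 1).expand(b, 1, h, w)
    x_src = xs - disp  # horizontal shift by the predicted disparity
    grid = torch.stack([2 * x_src / (w - 1) - 1,   # normalize to [-1, 1]
                        2 * ys / (h - 1) - 1], dim=-1).squeeze(1)
    return F.grid_sample(src, grid, align_corners=True)

def cross_consistency_loss(disp_left, disp_right):
    """The left disparity map should agree with the right disparity map
    warped into the left view (and symmetrically for the right)."""
    disp_right_in_left = warp_by_disparity(disp_right, disp_left)
    return F.l1_loss(disp_left, disp_right_in_left)
```

The internal disparity loss mentioned in the abstract is not sketched here, as its exact form is not specified in this summary.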
Related papers
- WTCL-Dehaze: Rethinking Real-world Image Dehazing via Wavelet Transform and Contrastive Learning [17.129068060454255]
Single image dehazing is essential for applications such as autonomous driving and surveillance.
We propose an enhanced semi-supervised dehazing network that integrates Contrastive Loss and Discrete Wavelet Transform.
Our proposed algorithm achieves superior performance and improved robustness compared to state-of-the-art single image dehazing methods.
arXiv Detail & Related papers (2024-10-07T05:36:11Z)
- Temporal Event Stereo via Joint Learning with Stereoscopic Flow [44.479946706395694]
Event cameras are dynamic vision sensors inspired by the biological retina.
We propose a novel temporal event stereo framework.
We achieve state-of-the-art performance on the MVSEC and DSEC datasets.
arXiv Detail & Related papers (2024-07-15T15:43:08Z)
- Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in real time.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z)
- Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, recovers the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refinement, learns the transformation and refinement of the phase spectrum.
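Both stages operate on the Fourier spectrum of the image; as a point of reference, here is a minimal PyTorch sketch of the amplitude/phase decomposition these stages act on (the surrounding network itself is the authors' contribution and is not reproduced here):

```python
import torch

def split_spectrum(img):
    """Decompose an image batch (B, C, H, W) into Fourier amplitude and phase."""
    spec = torch.fft.fft2(img)
    return spec.abs(), spec.angle()

def merge_spectrum(amplitude, phase):
    """Recombine amplitude and phase (amplitude * exp(i * phase)), back to pixels."""
    return torch.fft.ifft2(torch.polar(amplitude, phase)).real
```

In a two-stage scheme of this kind, the first stage would predict a corrected amplitude while keeping the input phase, and the second stage would then refine the phase.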
arXiv Detail & Related papers (2023-08-14T08:23:58Z)
- Video Frame Interpolation with Stereo Event and Intensity Camera [40.07341828127157]
We propose a novel Stereo Event-based VFI network (SE-VFI-Net) to generate high-quality intermediate frames.
We exploit the fused features to accomplish accurate optical flow and disparity estimation.
Our proposed SE-VFI-Net outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-07-17T04:02:00Z)
- Recovering Continuous Scene Dynamics from A Single Blurry Image with Events [58.7185835546638]
An Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events.
A dual attention transformer is proposed to efficiently leverage merits from both modalities.
The proposed network is trained only with supervision from ground-truth images at a limited set of reference timestamps.
arXiv Detail & Related papers (2023-04-05T18:44:17Z)
- Stereo Hybrid Event-Frame (SHEF) Cameras for 3D Perception [17.585862399941544]
Event cameras address these limitations, as they report brightness changes of each pixel independently with fine temporal resolution.
Integrated hybrid event-frame sensors (e.g., DAVIS) are available, but the quality of their data is compromised by pixel-level coupling in the circuit fabrication of such cameras.
This paper proposes a stereo hybrid event-frame (SHEF) camera system that offers a sensor modality with separate high-quality pure event and pure frame cameras.
arXiv Detail & Related papers (2021-10-11T04:03:36Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
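The key idea, a bimodal mixture density as the output representation, can be sketched as a per-pixel head that predicts two disparity modes and a mixing weight, so the network can keep probability mass on both sides of a depth discontinuity instead of averaging across it. The following is an illustrative reading in PyTorch, not the authors' exact parameterization:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BimodalDisparityHead(nn.Module):
    """Per-pixel bimodal disparity distribution: two means, two scales,
    and a mixing weight (5 output channels in total)."""
    def __init__(self, in_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 5, kernel_size=1)

    def forward(self, features):
        mu1, mu2, s1, s2, w = self.conv(features).chunk(5, dim=1)
        scale1, scale2 = F.softplus(s1), F.softplus(s2)  # keep scales positive
        pi = torch.sigmoid(w)                            # weight of the first mode
        # Taking the mean of the dominant mode keeps disparities sharp at
        # discontinuities instead of blending foreground and background.
        disparity = torch.where(pi > 0.5, mu1, mu2)
        return disparity, (mu1, mu2, scale1, scale2, pi)
```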
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
- Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z)
- Event-based Stereo Visual Odometry [42.77238738150496]
We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig.
We seek to maximize the temporal consistency of stereo event-based data while using a simple and efficient representation.
arXiv Detail & Related papers (2020-07-30T15:53:28Z)
- Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
arXiv Detail & Related papers (2020-03-20T08:39:49Z)
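Several of the works above, like the intensity-event stereo paper itself, first convert the asynchronous event stream into a synchronous image-like representation. A minimal NumPy sketch of one common choice, a two-channel per-polarity event count histogram, assuming events arrive as (x, y, t, p) tuples:

```python
import numpy as np

def event_histogram(events, height, width):
    """Accumulate events into a 2-channel count image.
    events: (N, 4) array of (x, y, t, p) rows, with polarity p in {-1, +1}."""
    hist = np.zeros((2, height, width), dtype=np.float32)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    p = (events[:, 3] > 0).astype(np.int64)  # channel 0: negative, 1: positive
    np.add.at(hist, (p, y, x), 1.0)          # unbuffered scatter-add per event
    return hist
```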