CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving
- URL: http://arxiv.org/abs/2408.08500v1
- Date: Fri, 16 Aug 2024 02:55:10 GMT
- Title: CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving
- Authors: Shihan Peng, Hanyu Zhou, Hao Dong, Zhiwei Shi, Haoyue Liu, Yuxing Duan, Yi Chang, Luxin Yan
- Abstract summary: Event cameras, with their high dynamic range, have been applied to assist frame cameras in multimodal fusion.
We propose a coaxial stereo event camera (CoSEC) dataset for autonomous driving.
- Score: 15.611896480837316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The conventional frame camera is the mainstream sensor for autonomous driving scene perception, but it is limited in adverse conditions such as low light. Event cameras, with their high dynamic range, have been applied to assist frame cameras in multimodal fusion, which relies heavily on pixel-level spatial alignment between the modalities. Typically, existing multimodal datasets place the event and frame cameras in parallel and align them spatially via a warping operation. However, this parallel strategy is less effective for multimodal fusion, since the large event-frame baseline produces large disparities that exacerbate spatial misalignment. We argue that minimizing the baseline can reduce the alignment error between event and frame cameras. In this work, we introduce hybrid coaxial event-frame devices to build the multimodal system, and propose a coaxial stereo event camera (CoSEC) dataset for autonomous driving. For the multimodal system, we first use a microcontroller to achieve time synchronization, and then spatially calibrate the different sensors, performing both intra- and inter-calibration of the stereo coaxial devices. For the multimodal dataset, we filter LiDAR point clouds to generate depth and optical flow labels using reference depth, which is further improved by fusing aligned event and frame data in nighttime conditions. With the help of the coaxial devices, the proposed dataset can promote all-day pixel-level multimodal fusion. Moreover, we conduct experiments demonstrating that the proposed dataset can improve the performance and generalization of multimodal fusion.
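As a back-of-the-envelope illustration of the baseline argument (not taken from the paper), consider a pinhole stereo model in which a point at depth Z projects with disparity d = f·B/Z. Warping one view onto the other with an imperfect reference depth leaves a residual misalignment proportional to the baseline B, so shrinking B toward zero, as the coaxial design does, shrinks the alignment error. The sketch below uses illustrative numbers, not the CoSEC rig's actual parameters:

```python
# Minimal sketch (assumptions, not from the paper): under a pinhole stereo
# model, a point at depth Z projects with disparity d = f * B / Z (f: focal
# length in pixels, B: baseline in meters). Warping with an imperfect
# reference depth Z_hat leaves a residual misalignment of
# f * B * |1/Z - 1/Z_hat| pixels, which scales linearly with the baseline B.

def residual_misalignment_px(f_px: float, baseline_m: float,
                             depth_true_m: float, depth_ref_m: float) -> float:
    """Residual pixel misalignment after disparity-based warping."""
    return f_px * baseline_m * abs(1.0 / depth_true_m - 1.0 / depth_ref_m)

# Illustrative numbers (not the CoSEC rig specification):
f_px = 1000.0               # focal length in pixels
z_true, z_ref = 10.0, 12.0  # true vs. reference depth in meters

for name, baseline in [("parallel (10 cm)", 0.10), ("coaxial (~0 cm)", 0.001)]:
    err = residual_misalignment_px(f_px, baseline, z_true, z_ref)
    print(f"{name}: residual misalignment ~ {err:.2f} px")
# parallel (10 cm): residual misalignment ~ 1.67 px
# coaxial (~0 cm): residual misalignment ~ 0.02 px
```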
Related papers
- X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios [105.16073169351299]
We propose a novel framework, X-DRIVE, to model the joint distribution of point clouds and multi-view images.
Considering the distinct geometrical spaces of the two modalities, X-DRIVE conditions the synthesis of each modality on the corresponding local regions of the other modality.
X-DRIVE allows for controllable generation through multi-level input conditions, including text, bounding box, image, and point clouds.
arXiv Detail & Related papers (2024-11-02T03:52:12Z)
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
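The summary does not specify the sampling algorithm, but a minimal greedy sketch of diversity-based selection over a precomputed similarity matrix might look like the following (the paper's actual method may differ):

```python
# Hedged sketch of a diversity-based sampling step over a precomputed
# similarity matrix S (S[i, j] in [0, 1]; higher = more redundant pair).
# This greedy variant repeatedly picks the camera least similar to
# everything selected so far.
import numpy as np

def greedy_diverse_selection(S: np.ndarray, k: int) -> list[int]:
    selected = [int(np.argmin(S.sum(axis=1)))]  # start from the least redundant view
    while len(selected) < k:
        # For each candidate, its worst-case redundancy w.r.t. the current set.
        redundancy = S[:, selected].max(axis=1)
        redundancy[selected] = np.inf  # never re-pick a selected camera
        selected.append(int(np.argmin(redundancy)))
    return selected

# Toy usage with a random symmetric similarity matrix (illustrative only).
rng = np.random.default_rng(0)
A = rng.random((8, 8))
S = (A + A.T) / 2
np.fill_diagonal(S, 1.0)
print(greedy_diverse_selection(S, k=3))
```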
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- An Asynchronous Linear Filter Architecture for Hybrid Event-Frame Cameras [9.69495347826584]
We present an asynchronous linear filter architecture, fusing event and frame camera data, for HDR video reconstruction and spatial convolution.
The proposed AKF pipeline outperforms other state-of-the-art methods in both absolute intensity error (69.4% reduction) and image similarity indexes (average 35.5% improvement).
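As a rough illustration of event-frame filter fusion, one classical instance is a per-pixel complementary filter that decays toward the frame measurement between events and steps by one contrast unit per event; the sketch below is in that spirit and is not the paper's exact AKF formulation (alpha and contrast are illustrative parameters):

```python
# Hedged single-pixel sketch of an asynchronous linear (complementary)
# filter fusing a log-intensity frame estimate with an event stream.
import math

def fuse_pixel(events, frame_log_intensity, alpha=2.0, contrast=0.1):
    """events: iterable of (timestamp_s, polarity in {-1, +1})."""
    L = frame_log_intensity  # state: fused log intensity
    t_prev = 0.0
    for t, polarity in events:
        # Between events, decay the state toward the frame measurement.
        decay = math.exp(-alpha * (t - t_prev))
        L = frame_log_intensity + (L - frame_log_intensity) * decay
        # Each event contributes one contrast step of log-intensity change.
        L += polarity * contrast
        t_prev = t
    return L

print(fuse_pixel([(0.001, +1), (0.002, +1), (0.005, -1)],
                 frame_log_intensity=0.5))
```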
arXiv Detail & Related papers (2023-09-03T12:37:59Z)
- Video Frame Interpolation with Stereo Event and Intensity Camera [40.07341828127157]
We propose a novel Stereo Event-based VFI network (SE-VFI-Net) to generate high-quality intermediate frames.
We exploit the fused features to accomplish accurate optical flow and disparity estimation.
Our proposed SE-VFI-Net outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-07-17T04:02:00Z)
- Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic-range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z)
- Frame-Event Alignment and Fusion Network for High Frame Rate Tracking [37.35823883499189]
Most existing RGB-based trackers target low frame rate benchmarks of around 30 frames per second.
We propose an end-to-end network consisting of multi-modality alignment and fusion modules.
With the FE240hz dataset, our approach achieves high frame rate tracking up to 240Hz.
arXiv Detail & Related papers (2023-05-25T03:34:24Z)
- Self-Supervised Intensity-Event Stereo Matching [24.851819610561517]
Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes with microsecond accuracy.
However, event cameras cannot be directly applied to computational imaging tasks, due to the inability to obtain high-quality intensity and events simultaneously.
This paper aims to connect a standalone event camera and a modern intensity camera so that applications can take advantage of both sensors.
arXiv Detail & Related papers (2022-11-01T14:52:25Z)
- Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z)
- Stereo Hybrid Event-Frame (SHEF) Cameras for 3D Perception [17.585862399941544]
Event cameras address the limitations of frame cameras, as they report brightness changes of each pixel independently with fine temporal resolution.
Integrated hybrid event-frame sensors (e.g., DAVIS) are available, but their data quality is compromised by pixel-level coupling in the circuit fabrication of such cameras.
This paper proposes a stereo hybrid event-frame (SHEF) camera system that offers a sensor modality with separate high-quality pure event and pure frame cameras.
arXiv Detail & Related papers (2021-10-11T04:03:36Z)
- LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation [78.74202673902303]
We propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components leads to effective camera-LiDAR fusion.
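The early-fusion idea can be illustrated by the standard LiDAR-to-camera projection step, where each point is projected into the image and tagged with the sampled image feature; this hypothetical sketch only shows that projection, not LIF-Seg's full coarse-to-fine pipeline:

```python
# Hedged sketch of a generic early-fusion step for LiDAR-camera segmentation:
# project LiDAR points into the image using the calibration matrices and
# attach the sampled image feature (here: RGB) to each point.
import numpy as np

def attach_image_features(points_xyz, image, K, T_cam_from_lidar):
    """points_xyz: (N, 3) LiDAR points; image: (H, W, C); K: (3, 3) intrinsics;
    T_cam_from_lidar: (4, 4) extrinsics. Returns (M, 3 + C) fused points."""
    N = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((N, 1))])      # homogeneous coords
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]       # into camera frame
    in_front = pts_cam[:, 2] > 0.1                        # keep points ahead of camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)   # pixel coordinates
    H, W = image.shape[:2]
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    feats = image[uv[valid, 1], uv[valid, 0]]             # sample per-point features
    return np.hstack([points_xyz[in_front][valid], feats])
```

Sampling RGB here stands in for whatever image features an early-fusion network would actually attach to each point.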
arXiv Detail & Related papers (2021-08-17T08:53:11Z)
- VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows [93.54888104118822]
We propose a large-scale Visible-Event benchmark (termed VisEvent) to address the lack of a realistic and large-scale dataset for this task.
Our dataset consists of 820 video pairs captured under low illumination, high speed, and background clutter scenarios.
Based on VisEvent, we transform the event flows into event images and construct more than 30 baseline methods.
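The event-flow-to-event-image conversion mentioned here is commonly done by accumulating signed event polarities over a time window into a 2D grid; a minimal sketch follows (the exact representation used by VisEvent may differ):

```python
# Hedged sketch of the common "event flow -> event image" conversion:
# accumulate signed event polarities over a time window into a 2D grid.
import numpy as np

def events_to_image(xs, ys, polarities, height, width):
    """xs, ys: pixel coords; polarities: +1/-1 per event. Returns (H, W) map."""
    img = np.zeros((height, width), dtype=np.float32)
    np.add.at(img, (ys, xs), polarities)  # unbuffered accumulation per pixel
    return img

# Toy usage: three events on a 4x4 sensor.
img = events_to_image(np.array([0, 1, 1]), np.array([2, 3, 3]),
                      np.array([+1, -1, -1]), height=4, width=4)
print(img[3, 1])  # -2.0: two negative events accumulated at (x=1, y=3)
```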
arXiv Detail & Related papers (2021-08-11T03:55:12Z)