Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle
Perception in Bird's-Eye-View
- URL: http://arxiv.org/abs/2402.00637v1
- Date: Thu, 1 Feb 2024 14:52:16 GMT
- Title: Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle
Perception in Bird's-Eye-View
- Authors: Arindam Das, Sudarshan Paul, Niko Scholz, Akhilesh Kumar Malviya,
Ganesh Sistu, Ujjwal Bhattacharya, and Ciarán Eising
- Abstract summary: We present the first end-to-end multimodal fusion model tailored for efficient obstacle perception in a bird's-eye-view (BEV) perspective.
Fisheye cameras are frequently employed for comprehensive surround-view perception, including rear-view obstacle localization.
However, the performance of such cameras can significantly deteriorate in low-light conditions, during nighttime, or when subjected to intense sun glare.
- Score: 4.536942273206611
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate obstacle identification represents a fundamental challenge within
the scope of near-field perception for autonomous driving. Conventionally,
fisheye cameras are employed for comprehensive surround-view
perception, including rear-view obstacle localization. However, the performance
of such cameras can significantly deteriorate in low-light conditions, during
nighttime, or when subjected to intense sun glare. Conversely, cost-effective
sensors like ultrasonic sensors remain largely unaffected under these
conditions. Therefore, we present, to our knowledge, the first end-to-end
multimodal fusion model tailored for efficient obstacle perception in a
bird's-eye-view (BEV) perspective, utilizing fisheye cameras and ultrasonic
sensors. Initially, ResNeXt-50 is employed as a set of unimodal encoders to
extract features specific to each modality. Subsequently, the feature space
associated with the visible spectrum undergoes transformation into BEV. The
fusion of these two modalities is facilitated via concatenation. At the same
time, the ultrasonic spectrum-based unimodal feature maps pass through
content-aware dilated convolution, applied to mitigate the misalignment between
the two sensors in the fused feature space. Finally, the fused features are
utilized by a two-stage semantic occupancy decoder to generate grid-wise
predictions for precise obstacle perception. We conduct a systematic
investigation to determine the optimal strategy for multimodal fusion of both
sensors. We provide insights into our dataset creation procedures, annotation
guidelines, and perform a thorough data analysis to ensure adequate coverage of
all scenarios. When applied to our dataset, the experimental results underscore
the robustness and effectiveness of our proposed multimodal fusion approach.
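To make the described pipeline concrete, below is a minimal PyTorch sketch of the layout named in the abstract: two ResNeXt-50 unimodal encoders, a camera-branch transformation into BEV, fusion by concatenation, a dilated convolution on the ultrasonic branch, and a two-stage decoder producing grid-wise predictions. The class name, the 1x1-conv-plus-resize stand-in for the view transformation, the fixed dilation rate (the paper describes a content-aware variant), the decoder depths, and the 100x100 grid size are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnext50_32x4d


class BEVObstacleFusionSketch(nn.Module):
    """Illustrative layout only: two ResNeXt-50 encoders, a simplified
    camera-to-BEV projection, concatenation fusion, a plain dilated conv
    standing in for the content-aware dilated convolution, and a two-stage
    grid-wise decoder."""

    def __init__(self, bev_size=(100, 100), num_classes=2):
        super().__init__()
        # Unimodal encoders: ResNeXt-50 backbones truncated before pooling/classifier.
        cam = resnext50_32x4d(weights=None)
        uss = resnext50_32x4d(weights=None)
        self.cam_encoder = nn.Sequential(*list(cam.children())[:-2])  # (B, 2048, H/32, W/32)
        self.uss_encoder = nn.Sequential(*list(uss.children())[:-2])
        self.bev_size = bev_size

        # Placeholder camera-to-BEV transform: 1x1 conv + bilinear resize
        # (the actual view transformation is not specified in the abstract).
        self.cam_to_bev = nn.Conv2d(2048, 256, kernel_size=1)

        # Dilated conv on the ultrasonic branch to ease cross-sensor misalignment;
        # a fixed dilation rate is assumed instead of the content-aware variant.
        self.uss_align = nn.Conv2d(2048, 256, kernel_size=3, padding=2, dilation=2)

        # Two-stage semantic occupancy decoder (depths/widths are assumptions).
        self.decoder_stage1 = nn.Sequential(
            nn.Conv2d(512, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        self.decoder_stage2 = nn.Conv2d(128, num_classes, kernel_size=1)

    def forward(self, cam_img, uss_map):
        bev_cam = F.interpolate(self.cam_to_bev(self.cam_encoder(cam_img)),
                                size=self.bev_size, mode="bilinear", align_corners=False)
        bev_uss = F.interpolate(self.uss_align(self.uss_encoder(uss_map)),
                                size=self.bev_size, mode="bilinear", align_corners=False)
        fused = torch.cat([bev_cam, bev_uss], dim=1)  # fusion by concatenation
        return self.decoder_stage2(self.decoder_stage1(fused))  # grid-wise logits


# Usage: a fisheye frame and a rasterized ultrasonic map, both fed as
# image-like 3-channel tensors (the rasterization itself is an assumption).
model = BEVObstacleFusionSketch()
logits = model(torch.randn(1, 3, 512, 512), torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 2, 100, 100])
```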
Related papers
- MAROON: A Framework for the Joint Characterization of Near-Field High-Resolution Radar and Optical Depth Imaging Techniques [4.816237933371206]
We take on the unique challenge of characterizing depth imagers from both the optical and the radio-frequency domains.
We provide a comprehensive evaluation of their depth measurements with respect to distinct object materials, geometries, and object-to-sensor distances.
All object measurements will be made public in the form of a multimodal dataset called MAROON.
arXiv Detail & Related papers (2024-11-01T11:53:10Z)
- Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor [58.305341034419136]
We present the first dense SLAM system with a monocular camera and a light-weight ToF sensor.
We propose a multi-modal implicit scene representation that supports rendering both the signals from the RGB camera and light-weight ToF sensor.
Experiments demonstrate that our system effectively exploits the signals of the light-weight ToF sensor and achieves competitive results.
arXiv Detail & Related papers (2023-08-28T07:56:13Z)
- DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars [2.2166853714891057]
We propose a modular multi-modal architecture to fuse lidars, cameras and radars in different combinations for 3D object detection.
Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible.
Experimental results for lidar-camera, lidar-camera-radar and camera-radar fusion show the flexibility and effectiveness of our fusion approach.
arXiv Detail & Related papers (2022-09-26T14:33:30Z) - Drone Detection and Tracking in Real-Time by Fusion of Different Sensing
Modalities [66.4525391417921]
We design and evaluate a multi-sensor drone detection system.
Our solution also integrates a fish-eye camera to monitor a wider part of the sky and steer the other cameras towards objects of interest.
The thermal camera is shown to be a feasible solution, performing as well as the video camera even though the camera employed here has a lower resolution.
arXiv Detail & Related papers (2022-07-05T10:00:58Z)
- On Learning the Invisible in Photoacoustic Tomography with Flat Directionally Sensitive Detector [0.27074235008521236]
In this paper, we focus on the second type caused by a varying sensitivity of the sensor to the incoming wavefront direction.
The visible ranges, in image and data domains, are related by the wavefront direction mapping.
We optimally combine fast approximate operators with tailored deep neural network architectures into efficient learned reconstruction methods.
arXiv Detail & Related papers (2022-04-21T09:57:01Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in that common space, either by iterative optimization or by deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls it into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
- LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation [78.74202673902303]
We propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components leads to effective camera-LiDAR fusion.
arXiv Detail & Related papers (2021-08-17T08:53:11Z)
- GenRadar: Self-supervised Probabilistic Camera Synthesis based on Radar Frequencies [12.707035083920227]
This work combines the complementary strengths of both sensor types in a unique self-learning fusion approach for a probabilistic scene reconstruction.
A proposed algorithm exploits similarities and establishes correspondences between both domains at different feature levels during training.
These discrete tokens are finally transformed back into an instructive view of the respective surroundings, allowing potential dangers to be perceived visually.
arXiv Detail & Related papers (2021-07-19T15:00:28Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules; a hedged sketch of one such module follows this list.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
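The residual-based fusion mentioned in the EPMF/PMF entry above can be illustrated with a generic two-stream module. The summary does not describe PMF's actual module, so the class name ResidualFusionSketch, the channel width, and the conv-BN-ReLU mixing block below are illustrative assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn


class ResidualFusionSketch(nn.Module):
    """Generic residual-based fusion of two same-shaped feature streams."""

    def __init__(self, channels=256):
        super().__init__()
        # Mix the concatenated streams and project back to one stream's width.
        self.mix = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_primary, f_auxiliary):
        # Residual formulation: the primary stream is refined by a correction
        # term computed from both streams.
        correction = self.mix(torch.cat([f_primary, f_auxiliary], dim=1))
        return f_primary + correction


# Usage with two hypothetical 256-channel feature maps on a 50x50 grid.
fused = ResidualFusionSketch()(torch.randn(1, 256, 50, 50), torch.randn(1, 256, 50, 50))
print(fused.shape)  # torch.Size([1, 256, 50, 50])
```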