AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual
Vision Transformer
- URL: http://arxiv.org/abs/2402.07680v1
- Date: Mon, 12 Feb 2024 14:40:43 GMT
- Title: AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual
Vision Transformer
- Authors: Tanmoy Dam, Sanjay Bhargav Dharavath, Sameer Alam, Nimrod Lilith,
Supriyo Chakraborty and Mir Feroskhan
- Abstract summary: We introduce AYDIV, a novel framework integrating a tri-phase alignment process specifically designed to enhance long-distance detection.
AYDIV improves performance on the Waymo Open Dataset (WOD) by 1.24% in mAPH (L2 difficulty) and on the Argoverse2 dataset by 7.40% in AP.
- Score: 5.287142970575824
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Combining LiDAR and camera data has shown potential in enhancing
short-distance object detection in autonomous driving systems. Yet, the fusion
encounters difficulties with long-distance detection due to the contrast
between LiDAR's sparse data and the dense resolution of cameras. Moreover,
discrepancies between the two data representations further complicate fusion
methods. We introduce AYDIV, a novel framework integrating a tri-phase
alignment process specifically designed to enhance long-distance detection even
amidst data discrepancies. AYDIV consists of the Global Contextual Fusion
Alignment Transformer (GCFAT), which improves the extraction of camera features
and provides a deeper understanding of large-scale patterns; the Sparse Fused
Feature Attention (SFFA), which fine-tunes the fusion of LiDAR and camera
details; and the Volumetric Grid Attention (VGA) for a comprehensive spatial
data fusion. AYDIV improves performance on the Waymo Open Dataset (WOD) by 1.24%
in mAPH (L2 difficulty) and on the Argoverse2 Dataset by 7.40% in AP, demonstrating
its efficacy compared with other existing fusion-based methods. Our code is publicly
available at https://github.com/sanjay-810/AYDIV2
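As a concrete illustration of the tri-phase design described in the abstract, the PyTorch-style sketch below wires minimal GCFAT, SFFA, and VGA stand-ins into a single forward pass. All module interfaces, token shapes, and attention settings here are assumptions for illustration, not the released implementation; see the repository linked above for the actual code.

```python
# Minimal sketch of the tri-phase camera-LiDAR fusion described in the abstract.
# All interfaces, dimensions, and attention settings are illustrative assumptions;
# the real implementation lives at https://github.com/sanjay-810/AYDIV2.
import torch
import torch.nn as nn


class GCFAT(nn.Module):
    """Global Contextual Fusion Alignment Transformer (assumed interface):
    self-attention over camera tokens to capture large-scale context."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cam_tokens):                 # (B, N_cam, C)
        ctx, _ = self.attn(cam_tokens, cam_tokens, cam_tokens)
        return self.norm(cam_tokens + ctx)


class SFFA(nn.Module):
    """Sparse Fused Feature Attention (assumed interface):
    sparse LiDAR tokens cross-attend to contextualised camera tokens."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_tokens, cam_ctx):      # (B, N_pts, C), (B, N_cam, C)
        fused, _ = self.cross(lidar_tokens, cam_ctx, cam_ctx)
        return self.norm(lidar_tokens + fused)


class VGA(nn.Module):
    """Volumetric Grid Attention (assumed interface): attention over the fused
    tokens, followed by a per-token box head (7 box parameters + 1 score)."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 7 + 1)

    def forward(self, tokens):                     # (B, N, C)
        out, _ = self.attn(tokens, tokens, tokens)
        return self.head(out)                      # (B, N, 8) box proposals


class AYDIVSketch(nn.Module):
    """Tri-phase alignment: GCFAT -> SFFA -> VGA."""
    def __init__(self, dim=128):
        super().__init__()
        self.gcfat, self.sffa, self.vga = GCFAT(dim), SFFA(dim), VGA(dim)

    def forward(self, cam_tokens, lidar_tokens):
        cam_ctx = self.gcfat(cam_tokens)           # phase 1: global camera context
        fused = self.sffa(lidar_tokens, cam_ctx)   # phase 2: sparse LiDAR-camera fusion
        return self.vga(fused)                     # phase 3: grid-level aggregation


if __name__ == "__main__":
    model = AYDIVSketch(dim=128)
    cam = torch.randn(2, 256, 128)                 # dummy camera tokens
    pts = torch.randn(2, 512, 128)                 # dummy LiDAR tokens
    print(model(cam, pts).shape)                   # torch.Size([2, 512, 8])
```

The ordering mirrors the abstract: camera context is enriched first, the sparse LiDAR features are then fine-tuned against it, and a final attention stage aggregates the fused representation before the detection head.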
Related papers
- GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection [36.37236815038332]
We propose a novel multi-modality 3D object detection method, named GAFusion, with LiDAR-guided global interaction and adaptive fusion.
GAFusion achieves state-of-the-art 3D object detection results with 73.6% mAP and 74.9% NDS on the nuScenes test set.
arXiv Detail & Related papers (2024-11-01T03:40:24Z)
- Kaninfradet3D: A Road-side Camera-LiDAR Fusion 3D Perception Model based on Nonlinear Feature Extraction and Intrinsic Correlation [7.944126168010804]
With the development of AI-assisted driving, numerous methods have emerged for ego-vehicle 3D perception tasks.
With its ability to provide a global view and a broader sensing range, the roadside perspective is worth developing.
This paper proposes Kaninfradet3D, which optimizes the feature extraction and fusion modules.
arXiv Detail & Related papers (2024-10-21T09:28:42Z)
- Quantum Inverse Contextual Vision Transformers (Q-ICVT): A New Frontier in 3D Object Detection for AVs [4.378378863689719]
We develop an innovative two-stage fusion process called Quantum Inverse Contextual Vision Transformers (Q-ICVT).
This approach leverages adiabatic computing concepts from quantum computing to create a novel reversible vision transformer known as the Global Adiabatic Transformer (GAT).
Our experiments show that Q-ICVT achieves an mAPH of 82.54 at L2 difficulty on the dataset, improving by 1.88% over current state-of-the-art fusion methods.
arXiv Detail & Related papers (2024-08-20T21:36:57Z)
- Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System [0.0]
We propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems.
Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance.
Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.
arXiv Detail & Related papers (2024-04-25T12:04:31Z)
- 3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection [13.068266058374775]
We propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion.
The proposed method fuses the features of the camera-view and 3D voxel-view domain and models their interactions through deformable attention.
The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieved competitive performance on the KITTI and nuScenes datasets.
arXiv Detail & Related papers (2022-11-24T11:00:50Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve the 1st place in the leaderboard of nuScenes tracking.
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation [78.74202673902303]
We propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR semantic segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components leads to effective camera-LiDAR fusion.
arXiv Detail & Related papers (2021-08-17T08:53:11Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules (a generic sketch of this two-stream pattern appears after this list).
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation [81.08111209632501]
We propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation.
We exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space.
Our network achieves state-of-the-art performance on the KITTI and Virtual KITTI datasets.
arXiv Detail & Related papers (2021-03-24T03:24:46Z)
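Several of the related papers above share one recurring recipe, stated most explicitly in the EPMF entry: extract camera and LiDAR features in two separate streams, then merge them with residual-based fusion modules. The sketch below (referenced from the EPMF entry) is a generic, assumed illustration of that two-stream pattern, not a reproduction of any specific paper's network; the layer choices and dimensions are hypothetical.

```python
# Generic sketch of the two-stream extraction + residual fusion pattern described
# in several of the papers above (e.g. EPMF). Layer choices and dimensions are
# illustrative assumptions, not any paper's actual architecture.
import torch
import torch.nn as nn


class ResidualFusion(nn.Module):
    """Fuse a LiDAR feature map with a camera feature map via a residual branch."""
    def __init__(self, channels):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, lidar_feat, cam_feat):
        # Residual form: the camera branch refines, but cannot erase, LiDAR features.
        return lidar_feat + self.mix(torch.cat([lidar_feat, cam_feat], dim=1))


class TwoStreamFusionNet(nn.Module):
    """Two independent 2D streams (RGB image and a projected LiDAR map) + fusion."""
    def __init__(self, channels=64):
        super().__init__()
        self.cam_stream = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.lidar_stream = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.fusion = ResidualFusion(channels)

    def forward(self, image, lidar_map):
        return self.fusion(self.lidar_stream(lidar_map), self.cam_stream(image))


if __name__ == "__main__":
    net = TwoStreamFusionNet()
    img = torch.randn(1, 3, 128, 128)              # dummy RGB image
    proj = torch.randn(1, 1, 128, 128)             # dummy projected LiDAR range map
    print(net(img, proj).shape)                    # torch.Size([1, 64, 128, 128])
```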