Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D
Object Detection
- URL: http://arxiv.org/abs/2403.07372v1
- Date: Tue, 12 Mar 2024 07:16:20 GMT
- Authors: Jiahui Fu, Chen Gao, Zitian Wang, Lirong Yang, Xiaofei Wang, Beipeng
Mu, Si Liu
- Abstract summary: Recent 3D object detectors typically utilize multi-sensor data and unify multi-modal features in the shared bird's-eye view (BEV) representation space.
Previous methods have limitations in generating fusion BEV features free from cross-modal conflicts.
We propose a novel Eliminating Conflicts Fusion (ECFusion) method to explicitly eliminate the extrinsic/inherent conflicts in BEV space.
- Score: 26.75994759483174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent 3D object detectors typically utilize multi-sensor data and unify
multi-modal features in the shared bird's-eye view (BEV) representation space.
However, our empirical findings indicate that previous methods have limitations
in generating fusion BEV features free from cross-modal conflicts. These
conflicts encompass extrinsic conflicts caused by BEV feature construction and
inherent conflicts stemming from heterogeneous sensor signals. Therefore, we
propose a novel Eliminating Conflicts Fusion (ECFusion) method to explicitly
eliminate the extrinsic/inherent conflicts in BEV space and produce improved
multi-modal BEV features. Specifically, we devise a Semantic-guided Flow-based
Alignment (SFA) module to resolve extrinsic conflicts via unifying spatial
distribution in BEV space before fusion. Moreover, we design a Dissolved Query
Recovering (DQR) mechanism to remedy inherent conflicts by preserving
objectness clues that are lost in the fusion BEV feature. In general, our
method maximizes the effective information utilization of each modality and
leverages inter-modal complementarity. Our method achieves state-of-the-art
performance on the highly competitive nuScenes 3D object detection dataset. The
code is released at https://github.com/fjhzhixi/ECFusion.
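The abstract describes a two-stage idea: align the camera BEV feature to the LiDAR BEV feature before fusion (the SFA module), then recover objectness clues that are strong in a single modality but dissolved in the fused feature (the DQR mechanism). The toy sketch below illustrates only that high-level flow with NumPy arrays; the flow field, fusion operator, shapes, and threshold are all simplified stand-ins invented for illustration, not the authors' implementation.

```python
import numpy as np

# Toy stand-ins for the two stages described in the abstract:
# (1) warp the camera BEV feature toward the LiDAR BEV feature before
#     fusion (simplified stand-in for Semantic-guided Flow-based
#     Alignment, SFA), and
# (2) flag BEV cells that are strong in one modality but weak after
#     fusion, so their objectness clues can be recovered (simplified
#     stand-in for Dissolved Query Recovering, DQR).

H, W, C = 8, 8, 4
rng = np.random.default_rng(0)
lidar_bev = rng.normal(size=(H, W, C))
camera_bev = rng.normal(size=(H, W, C))

def align(cam, flow):
    """Warp the camera BEV by an integer flow field (toy SFA stand-in)."""
    out = np.zeros_like(cam)
    for y in range(H):
        for x in range(W):
            dy, dx = flow[y, x]
            out[y, x] = cam[(y + dy) % H, (x + dx) % W]
    return out

flow = np.zeros((H, W, 2), dtype=int)       # identity flow for the demo
fused = 0.5 * lidar_bev + 0.5 * align(camera_bev, flow)

# Toy "heatmaps": channel-max per BEV cell for each modality and the fusion.
heat = {name: feat.max(axis=-1) for name, feat in
        {"lidar": lidar_bev, "camera": camera_bev, "fused": fused}.items()}

# Toy DQR stand-in: cells confident in a single modality but not after fusion.
thresh = 1.5  # arbitrary demo threshold
recovered = ((heat["lidar"] > thresh) | (heat["camera"] > thresh)) \
            & (heat["fused"] <= thresh)

print(fused.shape)  # (8, 8, 4)
```

With an identity flow the fusion here reduces to a plain average; the paper's point is precisely that such naive fusion lets one modality's objectness evidence dissolve, which the recovery mask above crudely detects.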
Related papers
- ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection [21.05923528672353]
We propose a novel ContrastAlign approach to enhance the alignment of heterogeneous modalities.
Our method achieves state-of-the-art performance, with an mAP of 70.3%, surpassing BEVFusion by 1.8% on the nuScenes validation set.
arXiv Detail & Related papers (2024-05-27T06:43:12Z)
- IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection [130.394884412296]
We propose IS-Fusion, an innovative multimodal fusion framework.
It captures the Instance- and Scene-level contextual information.
IS-Fusion essentially differs from existing approaches that focus only on scene-level BEV fusion.
arXiv Detail & Related papers (2024-03-22T14:34:17Z)
- UniMODE: Unified Monocular 3D Object Detection [70.27631528933482]
We build a detector based on the bird's-eye-view (BEV) detection paradigm.
We propose an uneven BEV grid design to handle the convergence instability caused by the challenges.
A unified detector UniMODE is derived, which surpasses the previous state-of-the-art on the challenging Omni3D dataset.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird's-Eye-View (BEV) is one of the most widely used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities [7.470926069132259]
We propose an end-to-end multi-modal 3D object detection framework designed for robustness against missing modalities.
UniBEV can operate on LiDAR plus camera input, but also on LiDAR-only or camera-only input without retraining.
We compare UniBEV to state-of-the-art BEVFusion and MetaBEV on nuScenes over all sensor input combinations.
arXiv Detail & Related papers (2023-09-25T20:22:47Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, fusion for detection can be performed effectively by combining their RoI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec 3D-AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
- MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection.
In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features.
In the second stage, we use a region-of-interest (RoI)-pooled fusion module to generate enhanced local features for refinement.
arXiv Detail & Related papers (2021-08-29T15:40:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.