Robust 3D Object Detection from LiDAR-Radar Point Clouds via Cross-Modal
Feature Augmentation
- URL: http://arxiv.org/abs/2309.17336v3
- Date: Tue, 12 Mar 2024 09:24:29 GMT
- Title: Robust 3D Object Detection from LiDAR-Radar Point Clouds via Cross-Modal
Feature Augmentation
- Authors: Jianning Deng, Gabriel Chan, Hantao Zhong, and Chris Xiaoxuan Lu
- Abstract summary: This paper presents a novel framework for robust 3D object detection from point clouds via cross-modal hallucination.
We introduce multiple alignments on both spatial and feature levels to achieve simultaneous backbone refinement and hallucination generation.
Experiments on the View-of-Delft dataset show that our proposed method outperforms the state-of-the-art (SOTA) methods for both radar and LiDAR object detection.
- Score: 7.364627166256136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel framework for robust 3D object detection from
point clouds via cross-modal hallucination. Our proposed approach is agnostic
to the hallucination direction between LiDAR and 4D radar. We introduce
multiple alignments on both spatial and feature levels to achieve simultaneous
backbone refinement and hallucination generation. Specifically, spatial
alignment is proposed to deal with the geometry discrepancy for better instance
matching between LiDAR and radar. The feature alignment step further bridges
the intrinsic attribute gap between the sensing modalities and stabilizes the
training. The trained object detection models can deal with difficult detection
cases better, even though only single-modal data is used as the input during
the inference stage. Extensive experiments on the View-of-Delft (VoD) dataset
show that our proposed method outperforms the state-of-the-art (SOTA) methods
for both radar and LiDAR object detection while maintaining competitive
efficiency in runtime. Code is available at
https://github.com/DJNing/See_beyond_seeing.
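To make the idea concrete, here is a minimal sketch of cross-modal feature hallucination with a feature-alignment loss, written in PyTorch. It is not the authors' implementation: the `HallucinationBranch` module, the MSE alignment objective, and the BEV feature shapes are assumptions for illustration, and the spatial alignment step for instance matching is omitted.

```python
# Minimal sketch of cross-modal feature hallucination (not the authors' code).
# Assumptions: BEV feature maps of shape (B, C, H, W) from each backbone, an
# MSE feature-alignment loss, and a small convolutional hallucination head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HallucinationBranch(nn.Module):
    """Predicts LiDAR-like features from radar features (or vice versa)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats)

def alignment_loss(radar_feats, lidar_feats, halluc: HallucinationBranch):
    """Hallucinated features should match the other modality's features;
    the teacher side is detached so gradients only refine the
    hallucinating backbone."""
    halluc_feats = halluc(radar_feats)
    return halluc_feats, F.mse_loss(halluc_feats, lidar_feats.detach())

# Usage: at train time, add the alignment loss to the detection loss; at
# inference time, only one modality plus the hallucination branch is run.
radar_feats = torch.randn(2, 64, 128, 128)
lidar_feats = torch.randn(2, 64, 128, 128)
branch = HallucinationBranch(64)
_, loss = alignment_loss(radar_feats, lidar_feats, branch)
```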
Related papers
- Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene [22.297964850282177]
We propose LiDAR-2D Self-paced Learning (LiSe) for unsupervised 3D detection.
RGB images serve as a valuable complement to LiDAR data, offering precise 2D localization cues.
Our framework devises a self-paced learning pipeline that incorporates adaptive sampling and weak model aggregation strategies.
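As a rough illustration of the self-paced idea (not LiSe's actual pipeline), pseudo-labels can be admitted by a confidence threshold that relaxes over training, so the model learns from easy examples first. The function name and the linear schedule below are hypothetical.

```python
# Hypothetical sketch of self-paced pseudo-label selection (not LiSe's code).
# Assumption: each pseudo-box carries a confidence score; the acceptance
# threshold starts strict and relaxes as training progresses.
def select_pseudo_labels(boxes, scores, epoch, max_epoch,
                         start_thresh=0.9, end_thresh=0.5):
    # Linearly relax the threshold: easy samples first, harder ones later.
    t = start_thresh + (end_thresh - start_thresh) * (epoch / max_epoch)
    keep = [b for b, s in zip(boxes, scores) if s >= t]
    return keep, t
```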
arXiv Detail & Related papers (2024-07-11T14:58:49Z)
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
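The storage-saving idea can be pictured as a small data structure: keep a short history of box parameters per object instead of past point clouds. This is an illustrative sketch, not PTT's implementation; the `TrajectoryBank` class and the 16-frame horizon are assumptions.

```python
# Illustrative per-object trajectory memory bank (not PTT's code).
# Assumption: storing past box parameters (7 floats per frame) rather than
# past point clouds is what keeps the memory footprint small.
from collections import defaultdict, deque

class TrajectoryBank:
    def __init__(self, horizon: int = 16):
        self.tracks = defaultdict(lambda: deque(maxlen=horizon))

    def update(self, frame_boxes: dict):
        """frame_boxes maps object id -> (x, y, z, l, w, h, yaw)."""
        for obj_id, box in frame_boxes.items():
            self.tracks[obj_id].append(box)

    def history(self, obj_id):
        return list(self.tracks[obj_id])
```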
arXiv Detail & Related papers (2023-12-13T18:59:13Z)
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection [78.59426158981108]
We introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects.
We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects.
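A schematic of the bi-directional idea, assuming both modalities are rendered to equally sized BEV maps: each direction enriches one modality's features with the other's. The 1x1-convolution fusion below is a placeholder, not Bi-LRFusion's actual enrichment blocks.

```python
# Schematic bi-directional BEV fusion (an illustration, not Bi-LRFusion).
# Assumption: LiDAR and radar BEV maps share the same spatial resolution.
import torch
import torch.nn as nn

class BiDirectionalFusion(nn.Module):
    def __init__(self, c_lidar: int = 64, c_radar: int = 32):
        super().__init__()
        self.radar_to_lidar = nn.Conv2d(c_lidar + c_radar, c_lidar, 1)
        self.lidar_to_radar = nn.Conv2d(c_lidar + c_radar, c_radar, 1)

    def forward(self, f_lidar, f_radar):
        joint = torch.cat([f_lidar, f_radar], dim=1)
        # Each modality's features are refreshed using the joint context.
        return self.radar_to_lidar(joint), self.lidar_to_radar(joint)
```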
arXiv Detail & Related papers (2023-06-02T10:57:41Z)
- Fully Sparse Fusion for 3D Object Detection [69.32694845027927]
Currently prevalent multimodal 3D detection methods are built upon LiDAR-based detectors that usually use dense Bird's-Eye-View feature maps.
Fully sparse architectures are gaining attention as they are highly efficient for long-range perception.
In this paper, we study how to effectively leverage image modality in the emerging fully sparse architecture.
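One plausible way image cues can enter a sparse pipeline, sketched below under assumptions (not the paper's method): project each sparse 3D point into the image and bilinearly sample a per-point image feature.

```python
# Toy sketch of attaching image features to sparse points (an assumption
# about how image cues enter a fully sparse pipeline, not the paper's code).
import torch
import torch.nn.functional as F

def sample_image_features(points_xyz, img_feats, cam_intrinsics):
    """points_xyz: (N, 3) in the camera frame; img_feats: (1, C, H, W);
    cam_intrinsics: (3, 3). Returns (N, C) per-point image features."""
    uvw = points_xyz @ cam_intrinsics.T          # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)
    _, c, h, w = img_feats.shape
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], dim=-1)
    grid = grid * 2 - 1
    sampled = F.grid_sample(img_feats, grid.view(1, 1, -1, 2),
                            align_corners=True)   # (1, C, 1, N)
    return sampled.view(c, -1).T                  # (N, C)
```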
arXiv Detail & Related papers (2023-04-24T17:57:43Z)
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving thanks to its ease of deployment.
Most current methods still rely on 3D point cloud data to label the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method that can train the model with only 2D labels marked on images.
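The projection-consistency intuition can be sketched as follows: a predicted 3D box, projected into a calibrated view, yields a 2D rectangle that can be compared against the 2D label. The corner parameterization and helper names are illustrative, not the paper's exact loss.

```python
# Hedged sketch of projecting a 3D box into a view for 2D comparison.
import numpy as np

def box3d_corners(x, y, z, l, w, h, yaw):
    """Eight corners of an axis-defined box rotated by yaw about z."""
    dx, dy = l / 2, w / 2
    corners = np.array([[ dx,  dy, 0], [ dx, -dy, 0],
                        [-dx, -dy, 0], [-dx,  dy, 0],
                        [ dx,  dy, h], [ dx, -dy, h],
                        [-dx, -dy, h], [-dx,  dy, h]])
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return corners @ rot.T + np.array([x, y, z])

def projected_2d_box(corners_3d, P):
    """P: (3, 4) camera projection matrix; returns (x1, y1, x2, y2),
    which a consistency loss could compare against the 2D label box."""
    pts = np.hstack([corners_3d, np.ones((8, 1))]) @ P.T
    uv = pts[:, :2] / pts[:, 2:3]
    return uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()
```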
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
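A minimal sketch of this teacher-student scheme: a frozen multi-modality teacher supervises the LiDAR-only student on both features and responses during training. The MSE/KL losses and their weights below are assumptions, not the paper's exact objectives.

```python
# Minimal cross-modal distillation sketch (losses and weights assumed).
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats,
                      student_logits, teacher_logits,
                      w_feat=1.0, w_resp=1.0):
    # Feature imitation: student mimics the multimodal teacher's features.
    feat_loss = F.mse_loss(student_feats, teacher_feats.detach())
    # Response imitation: match the teacher's class distribution.
    resp_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                         F.softmax(teacher_logits.detach(), dim=-1),
                         reduction="batchmean")
    return w_feat * feat_loss + w_resp * resp_loss
```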
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- Dense Voxel Fusion for 3D Object Detection [10.717415797194896]
Dense Voxel Fusion (DVF) is a sequential fusion method that generates multi-scale dense voxel feature representations.
We train directly with ground truth 2D bounding box labels, avoiding noisy, detector-specific, 2D predictions.
We show that our proposed multi-modal training strategy results in better generalization compared to training using erroneous 2D predictions.
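The use of ground-truth 2D boxes can be pictured as a simple gating step, sketched here under assumptions (not DVF's implementation): voxel centers whose image projection falls inside a labeled 2D box are the ones that receive image evidence.

```python
# Toy illustration of training with ground-truth 2D boxes rather than noisy
# 2D detections: mask voxels whose projection lands inside a labeled box.
import numpy as np

def voxels_in_gt_boxes(voxel_centers, P, gt_boxes_2d):
    """voxel_centers: (N, 3); P: (3, 4) projection; gt_boxes_2d: (M, 4)."""
    pts = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))]) @ P.T
    uv = pts[:, :2] / np.clip(pts[:, 2:3], 1e-6, None)
    mask = np.zeros(len(voxel_centers), dtype=bool)
    for x1, y1, x2, y2 in gt_boxes_2d:
        mask |= ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
                 (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
    return mask
```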
arXiv Detail & Related papers (2022-03-02T04:51:31Z)
- Roadside Lidar Vehicle Detection and Tracking Using Range And Intensity Background Subtraction [0.0]
We present a solution for roadside LiDAR object detection using a combination of two unsupervised learning algorithms.
The method was validated against a commercial traffic data collection platform.
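Background subtraction for a fixed roadside LiDAR can be illustrated with a range-image sketch: learn the per-beam background range over many frames, then flag returns that come back noticeably closer. The max-over-time background model and the margin value are assumptions; the paper's two unsupervised algorithms are not reproduced here.

```python
# Range-based background subtraction sketch for a stationary LiDAR
# (illustrative assumptions, not the paper's exact algorithms).
import numpy as np

def build_background(range_frames):
    """range_frames: (T, rings, azimuth_bins) range images over time.
    The background is the farthest return seen in each beam direction."""
    return np.nanmax(range_frames, axis=0)

def foreground_mask(range_frame, background, margin=0.5):
    """A return closer than the learned background by `margin` meters is
    treated as a foreground object such as a passing vehicle."""
    return range_frame < (background - margin)
```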
arXiv Detail & Related papers (2022-01-13T00:54:43Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two stages: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.