GOOD: General Optimization-based Fusion for 3D Object Detection via
LiDAR-Camera Object Candidates
- URL: http://arxiv.org/abs/2303.09800v1
- Date: Fri, 17 Mar 2023 07:05:04 GMT
- Title: GOOD: General Optimization-based Fusion for 3D Object Detection via
LiDAR-Camera Object Candidates
- Authors: Bingqi Shen, Shuwei Dai, Yuyin Chen, Rong Xiong, Yue Wang, and Yanmei
Jiao
- Abstract summary: 3D object detection serves as the core basis of perception tasks in autonomous driving.
GOOD is a general optimization-based fusion framework that achieves satisfactory detection without training additional models.
Experiments on both the nuScenes and KITTI datasets show that GOOD improves mAP by 9.1% over PointPillars.
- Score: 10.534984939225014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection serves as the core basis of the perception tasks in
autonomous driving. Recent years have seen the rapid progress of multi-modal
fusion strategies for more robust and accurate 3D object detection. However,
current research on robust fusion relies entirely on learning-based frameworks,
which demand large amounts of training data and are inconvenient to deploy in new
scenes. In this paper, we propose GOOD, a general optimization-based fusion
framework that achieves satisfactory detection without training additional
models and is compatible with any combination of 2D and 3D detectors, improving
the accuracy and robustness of 3D detection. First, we apply the mutual-sided
nearest-neighbor probability model to achieve the 3D-2D data association. Then
we design an optimization pipeline that can optimize different kinds of
instances separately based on the matching result. Apart from this, the 3D MOT
method is also introduced to enhance the performance aided by previous frames.
To the best of our knowledge, this is the first optimization-based late fusion
framework for multi-modal 3D object detection, which can serve as a baseline
for subsequent research. Experiments on both nuScenes and KITTI datasets are
carried out, and the results show that GOOD improves mAP by 9.1\% over
PointPillars and achieves results competitive with the
learning-based late fusion CLOCs.
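The mutual-sided nearest-neighbor association described in the abstract can be illustrated with a simplified sketch: a 2D detection and a projected 3D candidate are paired only when each is the other's nearest neighbor in the image plane. Note this is a deterministic stand-in for illustration; the paper's actual model is probabilistic, and the function and variable names below are hypothetical.

```python
import numpy as np

def mutual_nearest_matches(centers_2d, centers_3d_proj):
    """Pair 2D detections with projected 3D candidates when each is the
    other's nearest neighbor (a simplified, deterministic stand-in for
    the paper's mutual-sided nearest-neighbor probability model)."""
    # Pairwise Euclidean distances between box centers in the image plane.
    d = np.linalg.norm(
        centers_2d[:, None, :] - centers_3d_proj[None, :, :], axis=-1
    )
    nn_of_2d = d.argmin(axis=1)  # closest 3D candidate for each 2D box
    nn_of_3d = d.argmin(axis=0)  # closest 2D box for each 3D candidate
    # Keep only mutually consistent pairs.
    return [(i, j) for i, j in enumerate(nn_of_2d) if nn_of_3d[j] == i]

# Toy example: two 2D detection centers and two projected 3D centers.
boxes_2d = np.array([[100.0, 200.0], [400.0, 220.0]])
boxes_3d = np.array([[405.0, 218.0], [98.0, 203.0]])
print(mutual_nearest_matches(boxes_2d, boxes_3d))  # [(0, 1), (1, 0)]
```

Requiring the match to hold in both directions suppresses spurious one-sided pairings in crowded scenes, which is the intuition behind a mutual-sided criterion for 3D-2D data association.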
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection [11.575945934519442]
LiDAR and camera fusion techniques are promising for achieving 3D object detection in autonomous driving.
Most multi-modal 3D object detection frameworks integrate semantic knowledge from 2D images into 3D LiDAR point clouds.
We propose a general multi-modal fusion framework, Multi-Sem Fusion (MSF), to fuse semantic information from the scene-parsing results of both 2D images and 3D point clouds.
arXiv Detail & Related papers (2022-12-10T10:54:41Z) - AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D
Object Detection [17.526914782562528]
We propose AutoAlignV2, a faster and stronger multi-modal 3D detection framework, built on top of AutoAlign.
Our best model reaches 72.4 NDS on nuScenes test leaderboard, achieving new state-of-the-art results.
arXiv Detail & Related papers (2022-07-21T06:17:23Z) - Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed as Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method yields the best performance, outperforming other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z) - Dense Voxel Fusion for 3D Object Detection [10.717415797194896]
Dense Voxel Fusion (DVF) is a sequential fusion method that generates multi-scale dense voxel feature representations.
We train directly with ground truth 2D bounding box labels, avoiding noisy, detector-specific, 2D predictions.
We show that our proposed multi-modal training strategy results in better generalization compared to training using erroneous 2D predictions.
arXiv Detail & Related papers (2022-03-02T04:51:31Z) - Joint Multi-Object Detection and Tracking with Camera-LiDAR Fusion for
Autonomous Driving [6.396288020763144]
Multi-object tracking (MOT) with camera-LiDAR fusion demands accurate results of object detection, affinity computation and data association in real time.
This paper presents an efficient multi-modal MOT framework with online joint detection and tracking schemes and robust data association for autonomous driving applications.
arXiv Detail & Related papers (2021-08-10T11:17:05Z) - Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LIDAR as well as cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z) - CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection [13.986963122264633]
We propose a novel Camera-LiDAR Object Candidates (CLOCs) fusion network.
CLOCs fusion provides a low-complexity multi-modal fusion framework.
We show that CLOCs ranks the highest among all the fusion-based methods in the official KITTI leaderboard.
arXiv Detail & Related papers (2020-09-02T02:07:00Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - PerMO: Perceiving More at Once from a Single Image for Autonomous
Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z) - Improving 3D Object Detection through Progressive Population Based
Augmentation [91.56261177665762]
We present the first attempt to automate the design of data augmentation policies for 3D object detection.
We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations.
We find that PPBA may be up to 10x more data efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.
arXiv Detail & Related papers (2020-04-02T05:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.