Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement
- URL: http://arxiv.org/abs/2408.07999v1
- Date: Thu, 15 Aug 2024 07:56:02 GMT
- Title: Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement
- Authors: Wenxuan Li, Qin Zou, Chi Chen, Bo Du, Long Chen
- Abstract summary: Co-Fix3D employs a collaborative hybrid multi-stage parallel query generation mechanism for BEV representations.
Our method incorporates the Local-Global Feature Enhancement (LGE) module, which refines BEV features to more effectively highlight weak positive samples.
Co-Fix3D achieves superior results on the stringent nuScenes benchmark.
- Score: 33.773644087620745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of autonomous driving, accurately detecting occluded or distant objects, referred to as weak positive samples, presents significant challenges. These challenges predominantly arise during query initialization, where an over-reliance on heatmap confidence often results in a high rate of false positives, consequently masking weaker detections and impairing system performance. To alleviate this issue, we propose a novel approach, Co-Fix3D, which employs a collaborative hybrid multi-stage parallel query generation mechanism for BEV representations. Our method incorporates the Local-Global Feature Enhancement (LGE) module, which refines BEV features to more effectively highlight weak positive samples. It uniquely leverages the Discrete Wavelet Transform (DWT) for accurate noise reduction and feature refinement in localized areas, and incorporates an attention mechanism to more comprehensively optimize global BEV features. Moreover, our method increases the volume of BEV queries through multi-stage parallel processing of the LGE, significantly enhancing the probability of selecting weak positive samples. This enhancement not only improves training efficiency within the decoder framework but also boosts overall system performance. Notably, Co-Fix3D achieves superior results on the stringent nuScenes benchmark, outperforming all previous models with 69.1% mAP and 72.9% NDS on the LiDAR-based benchmark, and 72.3% mAP and 74.1% NDS on the multi-modality benchmark, without relying on test-time augmentation or additional datasets. The source code will be made publicly available upon acceptance.
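A minimal PyTorch sketch of the pipeline the abstract describes: a single-level Haar DWT splits the BEV map into frequency bands and attenuates the high-frequency ones (local refinement), self-attention then refines the global map, and the top-k heatmap cells become object queries. The module names, shapes, damping factor, and query count are illustrative assumptions, not the authors' released implementation.
```python
import torch
import torch.nn as nn

def haar_dwt2d(x):
    """Single-level 2D Haar DWT of a BEV map x with shape (B, C, H, W); H, W even."""
    a, b = x[:, :, 0::2, 0::2], x[:, :, 0::2, 1::2]
    c, d = x[:, :, 1::2, 0::2], x[:, :, 1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2d(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2d."""
    a = (ll + lh + hl + hh) / 2
    b = (ll - lh + hl - hh) / 2
    c = (ll + lh - hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    B, C, H, W = ll.shape
    x = ll.new_zeros(B, C, 2 * H, 2 * W)
    x[:, :, 0::2, 0::2], x[:, :, 0::2, 1::2] = a, b
    x[:, :, 1::2, 0::2], x[:, :, 1::2, 1::2] = c, d
    return x

class LGESketch(nn.Module):
    """Hypothetical local-global enhancement: DWT denoising + global attention."""
    def __init__(self, channels, num_heads=4, damp=0.5):
        super().__init__()
        self.damp = damp  # assumed attenuation factor for high-frequency bands
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, bev):                      # bev: (B, C, H, W)
        ll, lh, hl, hh = haar_dwt2d(bev)
        # Local step: keep low-frequency structure, damp high-frequency noise.
        bev = haar_idwt2d(ll, self.damp * lh, self.damp * hl, self.damp * hh)
        # Global step: self-attention over all BEV cells.
        B, C, H, W = bev.shape
        tokens = bev.flatten(2).transpose(1, 2)  # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + out)
        return tokens.transpose(1, 2).reshape(B, C, H, W)

def select_queries(heatmap, k=200):
    """Top-k BEV cells by class-wise max heatmap score become object queries.
    Running this once per LGE stage and pooling the picks mimics the
    multi-stage parallel query generation described above (an assumption)."""
    B, _, H, W = heatmap.shape
    scores = heatmap.max(dim=1).values.flatten(1)             # (B, H*W)
    idx = scores.topk(k, dim=1).indices                       # (B, k)
    return torch.div(idx, W, rounding_mode="floor"), idx % W  # (rows, cols)

bev = torch.randn(2, 64, 128, 128)                        # toy BEV features
refined = LGESketch(channels=64)(bev)
rows, cols = select_queries(torch.rand(2, 10, 128, 128))  # e.g. 10 classes
```
Full self-attention over every BEV cell is expensive at realistic grid sizes; a windowed or deformable variant would be the practical choice, but the dense version keeps the sketch short.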
Related papers
- AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features [13.48200434855076]
AuxDepthNet is an efficient framework for real-time monocular 3D object detection.
It eliminates the reliance on external depth maps or pre-trained depth models.
It achieves state-of-the-art performance, with scores of 34.11% (Easy), 25.18% (Moderate), and 21.90% (Hard) at an IoU threshold of 0.7.
arXiv Detail & Related papers (2025-01-07T11:07:32Z) - GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection [36.37236815038332]
We propose a novel multi-modality 3D object detection method, named GAFusion, with LiDAR-guided global interaction and adaptive fusion.
GAFusion achieves state-of-the-art 3D object detection results with 73.6% mAP and 74.9% NDS on the nuScenes test set.
arXiv Detail & Related papers (2024-11-01T03:40:24Z) - SSDA3D: Semi-supervised Domain Adaptation for 3D Object Detection from Point Cloud [125.9472454212909]
We present a novel Semi-Supervised Domain Adaptation method for 3D object detection (SSDA3D).
SSDA3D includes an Inter-domain Adaptation stage and an Intra-domain Generalization stage.
Experiments show that, with only 10% labeled target data, SSDA3D can surpass the fully-supervised oracle model trained with 100% target labels.
arXiv Detail & Related papers (2022-12-06T09:32:44Z) - 3DLG-Detector: 3D Object Detection via Simultaneous Local-Global Feature Learning [15.995277437128452]
Capturing both local and global features of irregular point clouds is essential to 3D object detection (3OD).
This paper explores new modules to simultaneously learn local-global features of scene point clouds that benefit 3OD.
We propose an effective 3OD network via simultaneous local-global feature learning (dubbed 3DLG-Detector).
arXiv Detail & Related papers (2022-08-31T12:23:40Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via instance-level augmentation.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which receives a reward only after several steps, so we adopt reinforcement learning to optimize it (see the sketch after this list).
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
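A toy sketch of the step-wise refinement idea from the Reinforced Axial Refinement entry above: a policy picks one 3D box parameter to nudge per step, and the delayed reward is the change in overlap with the ground truth after the episode. The 7-parameter box, fixed step size, and distance-based reward proxy are illustrative assumptions, not the paper's actual formulation.
```python
import random

PARAMS = ["x", "y", "z", "w", "h", "l", "yaw"]  # assumed 7-DoF box parametrization
STEP = 0.1                                       # assumed fixed per-step adjustment

def overlap(box, gt):
    # Placeholder proxy for 3D IoU: negative L1 distance between parameter
    # vectors (a real implementation would compute rotated-box IoU).
    return -sum(abs(box[p] - gt[p]) for p in PARAMS)

def refine(box, gt, policy, horizon=20):
    """Refine `box` towards `gt`, changing one 3D parameter per step.
    `policy(box)` returns (param_name, direction); the episode reward is the
    improvement in overlap, which an RL method would use to train the policy."""
    start = overlap(box, gt)
    for _ in range(horizon):
        p, direction = policy(box)
        box[p] += direction * STEP               # exactly one parameter per step
    return overlap(box, gt) - start              # delayed episode reward

def random_policy(box):
    # Random stand-in for the learned policy in this sketch.
    return random.choice(PARAMS), random.choice([-1.0, 1.0])

box = {p: 0.0 for p in PARAMS}
gt = {"x": 1.0, "y": -0.5, "z": 0.2, "w": 1.8, "h": 1.5, "l": 4.2, "yaw": 0.3}
print("episode reward:", refine(box, gt, random_policy))
```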