Unleash the Potential of Image Branch for Cross-modal 3D Object
Detection
- URL: http://arxiv.org/abs/2301.09077v3
- Date: Thu, 19 Oct 2023 09:41:49 GMT
- Title: Unleash the Potential of Image Branch for Cross-modal 3D Object
Detection
- Authors: Yifan Zhang, Qijian Zhang, Junhui Hou, Yixuan Yuan, and Guoliang Xing
- Abstract summary: We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
- Score: 67.94357336206136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To achieve reliable and precise scene understanding, autonomous vehicles
typically incorporate multiple sensing modalities to capitalize on their
complementary attributes. However, existing cross-modal 3D detectors do not
fully utilize the image domain information to address the bottleneck issues of
the LiDAR-based detectors. This paper presents a new cross-modal 3D object
detector, namely UPIDet, which aims to unleash the potential of the image
branch from two aspects. First, UPIDet introduces a new 2D auxiliary task
called normalized local coordinate map estimation. This approach enables the
learning of local spatial-aware features from the image modality to supplement
sparse point clouds. Second, we discover that the representational capability
of the point cloud backbone can be enhanced through the gradients
backpropagated from the training objectives of the image branch, utilizing a
succinct and effective point-to-pixel module. Extensive experiments and
ablation studies validate the effectiveness of our method. Notably, we achieved
the top rank in the highly competitive cyclist class of the KITTI benchmark at
the time of submission. The source code is available at
https://github.com/Eaphan/UPIDet.
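The point-to-pixel correspondence underlying the module described in the abstract can be illustrated with a minimal sketch: project each LiDAR point into the image plane with the camera intrinsics, then gather the image feature at the projected pixel. This is only an assumed, non-learnable illustration of the correspondence step — the function names are hypothetical, nearest-pixel lookup stands in for whatever sampling UPIDet actually uses, and the paper's module is differentiable and trained end-to-end so that image-branch gradients reach the point cloud backbone.

```python
import numpy as np

def project_points(points_xyz, K):
    """Project 3D points (camera coordinates) onto the image plane (pinhole model)."""
    uvw = points_xyz @ K.T            # (N, 3) homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide -> (N, 2) pixel (u, v)

def point_to_pixel_gather(points_xyz, image_feats, K):
    """Gather one image feature vector per 3D point via nearest-pixel lookup.

    points_xyz:  (N, 3) points in front of the camera (z > 0)
    image_feats: (H, W, C) feature map from the image branch
    K:           (3, 3) camera intrinsic matrix
    """
    H, W, _ = image_feats.shape
    uv = project_points(points_xyz, K)
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return image_feats[v, u]          # (N, C) image features aligned to the points
```

In a trainable version, the nearest-pixel lookup would be replaced by differentiable (e.g. bilinear) sampling so that losses on the fused point features also backpropagate into the image backbone, which is the effect the abstract attributes to the point-to-pixel module.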
Related papers
- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371] (2024-03-28T17:05:04Z)
  OV-Uni3DETR achieves state-of-the-art performance across various scenarios, surpassing existing methods by more than 6% on average. Code and pre-trained models will be released later.
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984] (2024-01-23T02:41:06Z)
  This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes. We propose a learnable transformation alignment to bridge the domain gap between image and point cloud data, and establish dense 2D-3D correspondences to estimate the rigid pose.
- AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection [17.526914782562528] (2022-07-21T06:17:23Z)
  We propose AutoAlignV2, a faster and stronger multi-modal 3D detection framework built on top of AutoAlign. Our best model reaches 72.4 NDS on the nuScenes test leaderboard, achieving new state-of-the-art results.
- Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network [70.53093934205057] (2022-07-12T12:35:34Z)
  The 3D object detection task from LiDAR or camera sensors is essential for autonomous driving. We propose a novel semantic passing framework, named SPNet, to boost the performance of existing LiDAR-based 3D detection models.
- DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection [29.722784254501768] (2022-03-17T17:58:00Z)
  DetMatch is a flexible framework for joint semi-supervised learning on 2D and 3D modalities. By identifying objects detected in both sensors, our pipeline generates a cleaner, more robust set of pseudo-labels. We leverage the richer semantics of RGB images to rectify incorrect 3D class predictions and improve the localization of 3D boxes.
- Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646] (2020-12-20T18:43:41Z)
  We propose a novel 3D object detector that can exploit both LiDAR and cameras to perform highly accurate localization. We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LiDAR feature maps at different levels of resolution.
- Cross-Modality 3D Object Detection [63.29935886648709] (2020-08-16T11:01:20Z)
  We present a novel two-stage multi-modal fusion network for 3D object detection. The whole architecture facilitates two-stage fusion. Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network learn better representations.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.