Multimodal Transformer for Automatic 3D Annotation and Object Detection
- URL: http://arxiv.org/abs/2207.09805v1
- Date: Wed, 20 Jul 2022 10:38:29 GMT
- Title: Multimodal Transformer for Automatic 3D Annotation and Object Detection
- Authors: Chang Liu, Xiaoyan Qian, Binxiao Huang, Xiaojuan Qi, Edmund Lam,
Siew-Chong Tan, Ngai Wong
- Abstract summary: We propose an end-to-end multimodal transformer (MTrans) autolabeler to generate precise 3D box annotations from weak 2D bounding boxes.
With a multi-task design, MTrans segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes simultaneously.
By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively.
- Score: 27.92241487946078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite a growing number of datasets being collected for training 3D object
detection models, significant human effort is still required to annotate 3D
boxes on LiDAR scans. To automate the annotation and facilitate the production
of various customized datasets, we propose an end-to-end multimodal transformer
(MTrans) autolabeler, which leverages both LiDAR scans and images to generate
precise 3D box annotations from weak 2D bounding boxes. To alleviate the
pervasive sparsity problem that hinders existing autolabelers, MTrans densifies
the sparse point clouds by generating new 3D points based on 2D image
information. With a multi-task design, MTrans segments the
foreground/background, densifies LiDAR point clouds, and regresses 3D boxes
simultaneously. Experimental results verify the effectiveness of MTrans in
improving the quality of the generated labels. By enriching the sparse point
clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate
and hard samples, respectively, versus the state-of-the-art autolabeler. MTrans
can also be extended to improve the accuracy of 3D object detection, resulting
in a remarkable 89.45% AP on KITTI hard samples. Code is available at
https://github.com/Cliu2/MTrans.
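As a rough illustration of how a weak 2D bounding box can be tied to a LiDAR scan, the sketch below keeps only the points whose image projection falls inside the box. It is not taken from the MTrans code base: the pinhole intrinsics (fx, fy, cx, cy), the function names, and the assumption that points are already expressed in camera coordinates are all illustrative choices.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]        # (x, y, z) in camera coordinates, z points forward
Box2D = Tuple[float, float, float, float]   # (u_min, v_min, u_max, v_max) in pixels


def project_point(p: Point3D, fx: float, fy: float,
                  cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point onto the image plane."""
    x, y, z = p
    return fx * x / z + cx, fy * y / z + cy


def points_in_box_frustum(points: List[Point3D], box: Box2D,
                          fx: float, fy: float,
                          cx: float, cy: float) -> List[Point3D]:
    """Keep the points that project inside the 2D box (hypothetical helper)."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for p in points:
        if p[2] <= 0.0:
            # Points behind the camera cannot appear in the image.
            continue
        u, v = project_point(p, fx, fy, cx, cy)
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append(p)
    return kept


# Toy data: four points and one 2D box around the image centre.
points = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0), (0.0, 0.0, -3.0), (0.5, -0.2, 20.0)]
box = (600.0, 150.0, 680.0, 220.0)
selected = points_in_box_frustum(points, box, fx=700.0, fy=700.0, cx=640.0, cy=180.0)
```

With these toy values, the first and last points are kept, the second projects outside the box, and the third lies behind the camera. The selected points are the kind of sparse per-object input an autolabeler would then have to segment and fit a 3D box to.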
Related papers
- Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data [57.53523870705433]
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det.
OVM3D-Det does not require high-precision LiDAR or 3D sensor data for either input or generating 3D bounding boxes.
It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
arXiv Detail & Related papers (2024-11-23T21:37:21Z)
- Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection [33.58208166717537]
3D object detection is essential for understanding 3D scenes.
Recent developments in semi-supervised methods seek to reduce the need for labeled data by employing a teacher-student framework to generate pseudo-labels for unlabeled point clouds.
We introduce an Agent-based Diffusion Model for Semi-supervised 3D Object Detection (Diff3DETR).
arXiv Detail & Related papers (2024-08-01T05:04:22Z)
- Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts [50.181870446016376]
This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts.
Unlike prior art, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset.
arXiv Detail & Related papers (2024-07-16T04:53:28Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving [0.0]
We propose MSF3DDETR, a Multi-Sensor Fusion 3D Detection Transformer architecture that fuses image and LiDAR features to improve detection accuracy.
Our end-to-end, single-stage, anchor-free and NMS-free network takes in multi-view images and LiDAR point clouds and predicts 3D bounding boxes.
The MSF3DDETR network is trained end-to-end on the nuScenes dataset using Hungarian-algorithm-based bipartite matching and a set-to-set loss inspired by DETR.
arXiv Detail & Related papers (2022-10-27T10:55:15Z)
- MAP-Gen: An Automated 3D-Box Annotation Flow with Multimodal Attention Point Generator [33.354908372755325]
This work proposes a novel autolabeler, called multimodal attention point generator (MAP-Gen), that generates high-quality 3D labels from weak 2D boxes.
Using MAP-Gen, object detection networks that are weakly supervised by 2D boxes can achieve 94-99% of the performance of those fully supervised by 3D annotations.
arXiv Detail & Related papers (2022-03-29T16:02:16Z)
- Frustum Fusion: Pseudo-LiDAR and LiDAR Fusion for 3D Detection [0.0]
We propose a novel data fusion algorithm to combine accurate point clouds with dense but less accurate point clouds obtained from stereo pairs.
We train multiple 3D object detection methods and show that our fusion strategy consistently improves the performance of detectors.
arXiv Detail & Related papers (2021-11-08T19:29:59Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection [78.71826145162092]
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds.
ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2021-03-09T10:51:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.