Multimodal Transformer for Automatic 3D Annotation and Object Detection
- URL: http://arxiv.org/abs/2207.09805v1
- Date: Wed, 20 Jul 2022 10:38:29 GMT
- Title: Multimodal Transformer for Automatic 3D Annotation and Object Detection
- Authors: Chang Liu, Xiaoyan Qian, Binxiao Huang, Xiaojuan Qi, Edmund Lam,
Siew-Chong Tan, Ngai Wong
- Abstract summary: We propose an end-to-end multimodal transformer (MTrans) autolabeler to generate precise 3D box annotations from weak 2D bounding boxes.
With a multi-task design, MTrans segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes simultaneously.
By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively.
- Score: 27.92241487946078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite a growing number of datasets being collected for training 3D object
detection models, significant human effort is still required to annotate 3D
boxes on LiDAR scans. To automate the annotation and facilitate the production
of various customized datasets, we propose an end-to-end multimodal transformer
(MTrans) autolabeler, which leverages both LiDAR scans and images to generate
precise 3D box annotations from weak 2D bounding boxes. To alleviate the
pervasive sparsity problem that hinders existing autolabelers, MTrans densifies
the sparse point clouds by generating new 3D points based on 2D image
information. With a multi-task design, MTrans segments the
foreground/background, densifies LiDAR point clouds, and regresses 3D boxes
simultaneously. Experimental results verify the effectiveness of the MTrans for
improving the quality of the generated labels. By enriching the sparse point
clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate
and hard samples, respectively, versus the state-of-the-art autolabeler. MTrans
can also be extended to improve the accuracy of 3D object detection, resulting
in a remarkable 89.45% AP on KITTI hard samples. Code is available at
https://github.com/Cliu2/MTrans.
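To make the multi-task design concrete, below is a minimal PyTorch sketch of an MTrans-style autolabeler head; all module names, dimensions, and the pooling choice are illustrative assumptions, not the released MTrans code.

```python
import torch
import torch.nn as nn

class MultiTaskAutolabeler(nn.Module):
    """Illustrative sketch of an MTrans-style multi-task head (not the official code).

    A shared transformer encoder consumes per-point features fused from LiDAR
    and image cues; three heads then (1) classify foreground vs. background
    points, (2) regress new 3D points to densify the cloud, and (3) regress
    a 3D box.
    """
    def __init__(self, feat_dim: int = 256, num_layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.seg_head = nn.Linear(feat_dim, 1)     # foreground/background logit per point
        self.point_head = nn.Linear(feat_dim, 3)   # generated (x, y, z) per token
        self.box_head = nn.Linear(feat_dim, 7)     # (x, y, z, l, w, h, yaw)

    def forward(self, fused_feats: torch.Tensor):
        # fused_feats: (B, N, feat_dim), one token per real or virtual point
        h = self.encoder(fused_feats)
        seg_logits = self.seg_head(h).squeeze(-1)  # (B, N)
        new_points = self.point_head(h)            # (B, N, 3)
        box = self.box_head(h.mean(dim=1))         # (B, 7), pooled box regression
        return seg_logits, new_points, box

# Joint training sums the three task losses (weights are illustrative):
# loss = bce(seg_logits, seg_gt) + l1(new_points, dense_gt) + l1(box, box_gt)
```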
Related papers
- Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection [33.58208166717537]
3D object detection is essential for understanding 3D scenes, but annotating point clouds is costly.
Recent semi-supervised methods seek to mitigate this problem by employing a teacher-student framework to generate pseudo-labels for unlabeled point clouds.
We introduce an Agent-based Diffusion Model for Semi-supervised 3D Object Detection (Diff3DETR).
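The teacher-student machinery behind such pseudo-labeling can be sketched as follows; the EMA update is a standard ingredient of this framework family, not code from Diff3DETR itself, and the detector interface in the comments is hypothetical.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.999):
    """Move the teacher's weights toward the student's by exponential moving average."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

# Typical loop: the teacher (initialized via copy.deepcopy(student)) labels
# unlabeled scans, the student trains on labeled + pseudo-labeled data, and
# the teacher is refreshed by EMA after each step:
# pseudo_boxes = teacher(points)                     # hypothetical detector call
# loss = supervised_loss + consistency_loss(student(points), pseudo_boxes)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
# ema_update(teacher, student)
```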
arXiv Detail & Related papers (2024-08-01T05:04:22Z)
- Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts [50.181870446016376]
This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts.
Unlike prior work, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset.
arXiv Detail & Related papers (2024-07-16T04:53:28Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework for leveraging constraints between the 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint that aligns LiDAR and image features based on object-aware regions.
Second, an output-level constraint enforces overlap between 2D boxes and projected 3D box estimates.
Third, a training-level constraint produces accurate and consistent 3D pseudo-labels that align with the visual data.
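As a concrete illustration of the output-level constraint, here is a hypothetical NumPy sketch that projects the eight corners of a predicted 3D box into the image and scores the overlap against the 2D box; the pinhole camera model and corner layout are assumptions, not the paper's exact formulation.

```python
import numpy as np

def box_overlap_constraint(corners_3d: np.ndarray, box_2d: np.ndarray, K: np.ndarray) -> float:
    """IoU between a 2D box and the image-plane hull of projected 3D corners.

    corners_3d: (8, 3) box corners in camera coordinates (z > 0).
    box_2d:     (4,) as (x1, y1, x2, y2) in pixels.
    K:          (3, 3) camera intrinsic matrix.
    """
    proj = (K @ corners_3d.T).T                # (8, 3) homogeneous pixel coords
    uv = proj[:, :2] / proj[:, 2:3]            # perspective divide
    hull = np.array([uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()])

    # Axis-aligned IoU between the projected hull and the 2D box.
    ix1, iy1 = max(hull[0], box_2d[0]), max(hull[1], box_2d[1])
    ix2, iy2 = min(hull[2], box_2d[2]), min(hull[3], box_2d[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(hull) + area(box_2d) - inter + 1e-9)
```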
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving thanks to its ease of deployment.
Most current methods still rely on 3D point cloud data to label the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method that can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving [0.0]
We propose MSF3DDETR, a Multi-Sensor Fusion 3D Detection Transformer architecture that fuses image and LiDAR features to improve detection accuracy.
Our end-to-end, single-stage, anchor-free and NMS-free network takes in multi-view images and LiDAR point clouds and predicts 3D bounding boxes.
The MSF3DDETR network is trained end-to-end on the nuScenes dataset using Hungarian-algorithm-based bipartite matching and a set-to-set loss inspired by DETR.
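DETR-style set prediction first matches ground-truth boxes to predictions one-to-one with the Hungarian algorithm before any loss is computed. A simplified SciPy sketch follows; the cost terms here are illustrative assumptions, not the paper's exact matching cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_boxes: np.ndarray, pred_probs: np.ndarray,
                    gt_boxes: np.ndarray, gt_labels: np.ndarray):
    """One-to-one assignment between predictions and ground truths.

    pred_boxes: (Q, 7) box parameters, pred_probs: (Q, C) class probabilities,
    gt_boxes: (G, 7), gt_labels: (G,) class indices, with Q >= G.
    """
    # Cost = L1 box distance minus the probability of the true class.
    l1 = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (Q, G)
    cls = -pred_probs[:, gt_labels]                                     # (Q, G)
    rows, cols = linear_sum_assignment(l1 + cls)
    return rows, cols   # matched (prediction index, ground-truth index) pairs
```

Unmatched queries are typically supervised toward a "no object" class, which is what makes the network NMS-free.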
arXiv Detail & Related papers (2022-10-27T10:55:15Z)
- MAP-Gen: An Automated 3D-Box Annotation Flow with Multimodal Attention Point Generator [33.354908372755325]
This work proposes a novel autolabeler, called multimodal attention point generator (MAP-Gen), that generates high-quality 3D labels from weak 2D boxes.
Using MAP-Gen, object detection networks that are weakly supervised by 2D boxes can achieve 94-99% of the performance of networks fully supervised by 3D annotations.
arXiv Detail & Related papers (2022-03-29T16:02:16Z)
- Frustum Fusion: Pseudo-LiDAR and LiDAR Fusion for 3D Detection [0.0]
We propose a novel data fusion algorithm to combine accurate point clouds with dense but less accurate point clouds obtained from stereo pairs.
We train multiple 3D object detection methods and show that our fusion strategy consistently improves the performance of detectors.
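The core of such a fusion strategy is merging sparse-but-precise LiDAR points with dense-but-noisy stereo pseudo-LiDAR points per object. A simplified sketch under the assumption of a fixed per-frustum point budget; the paper's exact selection rule may differ.

```python
import numpy as np

def fuse_frustum_points(lidar_pts: np.ndarray, pseudo_pts: np.ndarray,
                        max_points: int = 4096) -> np.ndarray:
    """Merge LiDAR and stereo pseudo-LiDAR points from the same frustum.

    All LiDAR points are kept; pseudo-LiDAR points fill the remaining budget
    by random subsampling, so the denser but noisier source cannot dominate
    the accurate one.
    """
    budget = max(0, max_points - len(lidar_pts))
    if len(pseudo_pts) > budget:
        keep = np.random.choice(len(pseudo_pts), budget, replace=False)
        pseudo_pts = pseudo_pts[keep]
    return np.concatenate([lidar_pts, pseudo_pts], axis=0)
```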
arXiv Detail & Related papers (2021-11-08T19:29:59Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
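Stage one of a frustum-based pipeline begins by lifting the 2D box into 3D: keep only the points whose image projection falls inside the box. A minimal sketch assuming a pinhole camera with intrinsics K; the exact calibration handling in FGR may differ.

```python
import numpy as np

def points_in_frustum(points_cam: np.ndarray, box_2d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Select points (camera frame, z forward) that project inside a 2D box.

    points_cam: (N, 3); box_2d: (x1, y1, x2, y2) in pixels; K: (3, 3) intrinsics.
    """
    pts = points_cam[points_cam[:, 2] > 0]        # keep points in front of the camera
    proj = (K @ pts.T).T                          # (M, 3) homogeneous pixel coords
    uv = proj[:, :2] / proj[:, 2:3]               # perspective divide
    inside = ((uv[:, 0] >= box_2d[0]) & (uv[:, 0] <= box_2d[2]) &
              (uv[:, 1] >= box_2d[1]) & (uv[:, 1] <= box_2d[3]))
    return pts[inside]
```

The coarse segmentation and box estimation stages then operate only on this frustum subset.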
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection [78.71826145162092]
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds.
Our ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection benchmark.
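Self-training of this kind alternates between harvesting confident target-domain detections as pseudo-labels and retraining on them. A schematic sketch with a hypothetical detector interface; ST3D itself adds refinements such as a quality-aware pseudo-label memory, which are omitted here.

```python
def harvest_pseudo_labels(detector, target_scans, score_thresh: float = 0.7):
    """Keep only high-confidence target-domain detections as pseudo-labels."""
    pseudo_set = []
    for scan in target_scans:
        boxes, scores = detector(scan)    # hypothetical detector interface
        keep = scores >= score_thresh
        pseudo_set.append((scan, boxes[keep]))
    return pseudo_set

# A self-training round retrains the detector on pseudo_set, then repeats:
# harvest -> retrain -> harvest, until target-domain accuracy saturates.
```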
arXiv Detail & Related papers (2021-03-09T10:51:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.