MAP-Gen: An Automated 3D-Box Annotation Flow with Multimodal Attention
Point Generator
- URL: http://arxiv.org/abs/2203.15700v1
- Date: Tue, 29 Mar 2022 16:02:16 GMT
- Title: MAP-Gen: An Automated 3D-Box Annotation Flow with Multimodal Attention
Point Generator
- Authors: Chang Liu, Xiaoyan Qian, Xiaojuan Qi, Edmund Y. Lam, Siew-Chong Tan,
Ngai Wong
- Abstract summary: This work proposes a novel autolabeler, called multimodal attention point generator (MAP-Gen), that generates high-quality 3D labels from weak 2D boxes.
Using MAP-Gen, object detection networks that are weakly supervised by 2D boxes can achieve 94~99% performance of those fully supervised by 3D annotations.
- Score: 33.354908372755325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Manually annotating 3D point clouds is laborious and costly, limiting the
training data preparation for deep learning in real-world object detection.
While a few previous studies tried to automatically generate 3D bounding boxes
from weak labels such as 2D boxes, the quality is sub-optimal compared to human
annotators. This work proposes a novel autolabeler, called multimodal attention
point generator (MAP-Gen), that generates high-quality 3D labels from weak 2D
boxes. It leverages dense image information to tackle the sparsity issue of 3D
point clouds, thus improving label quality. For each 2D pixel, MAP-Gen predicts
its corresponding 3D coordinates by referencing context points based on their
2D semantic or geometric relationships. The generated 3D points densify the
original sparse point clouds, followed by an encoder to regress 3D bounding
boxes. Using MAP-Gen, object detection networks that are weakly supervised by
2D boxes can achieve 94~99% performance of those fully supervised by 3D
annotations. It is hoped that this newly proposed MAP-Gen autolabeling flow can
shed new light on utilizing multimodal information to enrich sparse point
clouds.
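A minimal sketch of the per-pixel coordinate prediction idea described in the abstract: each image pixel inside a weak 2D box attends to sparse LiDAR context points, and the attended features are decoded into a 3D coordinate, densifying the cloud before box regression. This is not the authors' released code; the module names, single-head attention, and feature dimensions are illustrative assumptions.

```python
# Illustrative sketch of a cross-attention "point generator" in the spirit of
# MAP-Gen's pixel-to-point prediction. All names and dimensions are assumptions.
import torch
import torch.nn as nn


class AttentionPointGenerator(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # query: per-pixel image feature
        self.k = nn.Linear(dim, dim)   # key:   context LiDAR point feature
        self.v = nn.Linear(dim, dim)   # value: context LiDAR point feature
        self.to_xyz = nn.Sequential(   # decode attended feature to 3D coords
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 3)
        )

    def forward(self, pixel_feat, point_feat):
        # pixel_feat: (N_pix, C) features of 2D pixels inside a weak 2D box
        # point_feat: (N_pts, C) features of sparse LiDAR context points
        q = self.q(pixel_feat)                              # (N_pix, C)
        k = self.k(point_feat)                              # (N_pts, C)
        v = self.v(point_feat)                              # (N_pts, C)
        attn = torch.softmax(q @ k.t() / q.shape[-1] ** 0.5, dim=-1)
        ctx = attn @ v                                      # (N_pix, C)
        return self.to_xyz(ctx)                             # (x, y, z) per pixel


# Usage sketch: the generated points densify the sparse cloud before an
# encoder regresses the 3D box (box head not shown).
gen = AttentionPointGenerator(dim=128)
pixels = torch.randn(500, 128)   # hypothetical pixel features
points = torch.randn(60, 128)    # hypothetical sparse-point features
new_xyz = gen(pixels, points)    # (500, 3) generated 3D points
```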
Related papers
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
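As a concrete illustration of the output-level constraint just described (not that paper's implementation), one can project the eight corners of an estimated 3D box into the image with the camera intrinsics, take the axis-aligned bounding box of the projections, and penalize low overlap with the 2D detection. The corner layout, intrinsics, and example numbers below are assumptions.

```python
# Hedged sketch of a 2D/3D output-consistency check: project a 3D box into the
# image and compare it with the 2D box via IoU. All parameters are illustrative.
import numpy as np

def box3d_corners(center, size, yaw):
    """Eight corners of a 3D box (camera coords: x right, y down, z forward)."""
    l, h, w = size
    x = np.array([ l,  l,  l,  l, -l, -l, -l, -l]) / 2
    y = np.array([ h,  h, -h, -h,  h,  h, -h, -h]) / 2
    z = np.array([ w, -w,  w, -w,  w, -w,  w, -w]) / 2
    rot = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                    [ 0,           1, 0          ],
                    [-np.sin(yaw), 0, np.cos(yaw)]])
    return (rot @ np.stack([x, y, z])).T + center      # (8, 3)

def project_to_2d_box(corners, K):
    """Axis-aligned 2D bounding box of the projected 3D corners."""
    uvw = (K @ corners.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                       # perspective divide
    return np.array([uv[:, 0].min(), uv[:, 1].min(),
                     uv[:, 0].max(), uv[:, 1].max()])

def iou_2d(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

# Illustrative KITTI-like intrinsics and boxes (not taken from the paper).
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0,   0.0,   1.0]])
corners = box3d_corners(center=np.array([2.0, 1.5, 20.0]),
                        size=(3.9, 1.5, 1.6), yaw=0.3)
proj_box = project_to_2d_box(corners, K)
print(iou_2d(proj_box, np.array([640.0, 160.0, 760.0, 250.0])))
```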
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation [16.232803881159022]
We propose a flexible framework of Points-to-3D to bridge the gap between sparse yet freely available 3D points and realistic shape-controllable 3D generation.
The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide the text-to-3D generation.
arXiv Detail & Related papers (2023-07-26T02:16:55Z)
- Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training [65.75399500494343]
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.
arXiv Detail & Related papers (2023-02-27T17:56:18Z)
- Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection [85.08249413137558]
LiDAR-produced point clouds are the major source for most state-of-the-art 3D object detectors.
Small, distant, and incomplete objects with sparse or few points are often hard to detect.
We present Sparse2Dense, a new framework to efficiently boost 3D detection performance by learning to densify point clouds in latent space.
arXiv Detail & Related papers (2022-11-23T16:01:06Z)
- Multimodal Transformer for Automatic 3D Annotation and Object Detection [27.92241487946078]
We propose an end-to-end multimodal transformer (MTrans) autolabeler to generate precise 3D box annotations from weak 2D bounding boxes.
With a multi-task design, MTrans segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes simultaneously.
By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively.
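A small sketch of a multi-task objective in the spirit of the MTrans design described above, combining foreground/background segmentation, point densification, and 3D box regression. The specific loss functions and weights are assumptions, not the paper's settings.

```python
# Hedged multi-task loss sketch: segmentation + point generation + box regression.
import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, seg_gt, xyz_pred, xyz_gt, box_pred, box_gt,
                   w_seg=1.0, w_pts=1.0, w_box=2.0):
    # Foreground/background classification of points or pixels.
    l_seg = F.binary_cross_entropy_with_logits(seg_logits, seg_gt)
    # Coordinates of generated (densified) points against a reference.
    l_pts = F.smooth_l1_loss(xyz_pred, xyz_gt)
    # 3D box parameters (e.g., center, size, yaw encoding).
    l_box = F.smooth_l1_loss(box_pred, box_gt)
    return w_seg * l_seg + w_pts * l_pts + w_box * l_box

# Dummy tensors just to show the expected shapes.
loss = multitask_loss(torch.randn(100), torch.randint(0, 2, (100,)).float(),
                      torch.randn(100, 3), torch.randn(100, 3),
                      torch.randn(7), torch.randn(7))
print(loss.item())
```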
arXiv Detail & Related papers (2022-07-20T10:38:29Z)
- Deep Hybrid Self-Prior for Full 3D Mesh Generation [57.78562932397173]
We propose to exploit a novel hybrid 2D-3D self-prior in deep neural networks to significantly improve the geometry quality.
In particular, we first generate an initial mesh using a 3D convolutional neural network with 3D self-prior, and then encode both 3D information and color information in the 2D UV atlas.
Our method recovers the 3D textured mesh model of high quality from sparse input, and outperforms the state-of-the-art methods in terms of both the geometry and texture quality.
arXiv Detail & Related papers (2021-08-18T07:44:21Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
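A minimal, assumption-laden sketch of the frustum step such weakly supervised pipelines build on: keep only the LiDAR points whose image projections fall inside the weak 2D box, yielding the candidate set that coarse 3D segmentation and box fitting then operate on. The intrinsics and inputs are placeholders.

```python
# Frustum point selection sketch: filter points by projecting them into the
# image and keeping those inside the 2D box. All values are placeholders.
import numpy as np

def points_in_frustum(points_cam, box2d, K):
    """points_cam: (N, 3) points in camera coords; box2d: [x1, y1, x2, y2]."""
    in_front = points_cam[:, 2] > 0.1                 # keep points ahead of camera
    uvw = (K @ points_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                     # perspective projection
    x1, y1, x2, y2 = box2d
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
             (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return points_cam[in_front & inside]

K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0,   0.0,   1.0]])
pts = np.random.uniform([-10, -2, 1], [10, 3, 40], size=(2000, 3))
frustum_pts = points_in_frustum(pts, box2d=[600.0, 150.0, 760.0, 260.0], K=K)
print(frustum_pts.shape)  # candidate points for coarse 3D segmentation
```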
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.