Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels
- URL: http://arxiv.org/abs/2010.03506v1
- Date: Wed, 7 Oct 2020 16:24:46 GMT
- Title: Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels
- Authors: L. Koestler and N. Yang and R. Wang and D. Cremers
- Abstract summary: Training of 3D object detectors requires datasets with 3D bounding box labels for supervision that have to be generated by hand-labeling.
We propose a network architecture and training procedure for learning monocular 3D object detection without 3D bounding box labels.
We evaluate the proposed algorithm on the real-world KITTI dataset and achieve promising performance in comparison to state-of-the-art methods requiring 3D bounding box labels for training.
- Score: 0.09558392439655011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The training of deep-learning-based 3D object detectors requires large
datasets with 3D bounding box labels for supervision that have to be generated
by hand-labeling. We propose a network architecture and training procedure for
learning monocular 3D object detection without 3D bounding box labels. By
representing the objects as triangular meshes and employing differentiable
shape rendering, we define loss functions based on depth maps, segmentation
masks, and ego- and object-motion, which are generated by pre-trained,
off-the-shelf networks. We evaluate the proposed algorithm on the real-world
KITTI dataset and achieve promising performance in comparison to
state-of-the-art methods requiring 3D bounding box labels for training and
superior performance to conventional baseline methods.
Related papers
- Learning 3D Representations from Procedural 3D Programs [6.915871213703219]
Self-supervised learning has emerged as a promising approach for acquiring transferable 3D representations from unlabeled 3D point clouds.
We propose learning 3D representations from procedural 3D programs that automatically generate 3D shapes using simple primitives and augmentations.
arXiv Detail & Related papers (2024-11-25T18:59:57Z) - Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
densely labeling 3D point clouds to employ fully-supervised training remains too labor intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z) - VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection [11.061100776969383]
Monocular 3D object detection poses a significant challenge in 3D scene understanding.
Existing methods heavily rely on supervised learning using abundant 3D labels.
We propose a novel weakly supervised 3D object detection framework named VSRD.
arXiv Detail & Related papers (2024-03-29T20:43:55Z) - Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
Specifically, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - SL3D: Self-supervised-Self-labeled 3D Recognition [89.19932178712065]
We propose a Self-supervised-Self-Labeled 3D Recognition (SL3D) framework.
SL3D simultaneously solves two coupled objectives, i.e., clustering and learning feature representation.
It can be applied to solve different 3D recognition tasks, including classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2022-10-30T11:08:25Z) - FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle
Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z) - 3D Spatial Recognition without Spatially Labeled 3D [127.6254240158249]
We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition.
We show that WyPR can detect and segment objects in point cloud data without access to any spatial labels at training time.
arXiv Detail & Related papers (2021-05-13T17:58:07Z) - ST3D: Self-training for Unsupervised Domain Adaptation on 3D
ObjectDetection [78.71826145162092]
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds.
Our ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2021-03-09T10:51:24Z) - Weakly Supervised 3D Object Detection from Point Clouds [27.70180601788613]
3D object detection aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Existing 3D object detectors rely on annotated 3D bounding boxes during training, while these annotations could be expensive to obtain and only accessible in limited scenarios.
We propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.
arXiv Detail & Related papers (2020-07-28T03:30:11Z) - SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint
Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite of its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.