Weakly Supervised 3D Object Detection from Point Clouds
- URL: http://arxiv.org/abs/2007.13970v1
- Date: Tue, 28 Jul 2020 03:30:11 GMT
- Title: Weakly Supervised 3D Object Detection from Point Clouds
- Authors: Zengyi Qin, Jinglu Wang, Yan Lu
- Abstract summary: 3D object detection aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Existing 3D object detectors rely on annotated 3D bounding boxes during training, while these annotations could be expensive to obtain and only accessible in limited scenarios.
We propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.
- Score: 27.70180601788613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A crucial task in scene understanding is 3D object detection, which aims to
detect and localize the 3D bounding boxes of objects belonging to specific
classes. Existing 3D object detectors heavily rely on annotated 3D bounding
boxes during training, while these annotations could be expensive to obtain and
only accessible in limited scenarios. Weakly supervised learning is a promising
approach to reducing the annotation requirement, but existing weakly supervised
object detectors are mostly for 2D detection rather than 3D. In this work, we
propose VS3D, a framework for weakly supervised 3D object detection from point
clouds without using any ground truth 3D bounding box for training. First, we
introduce an unsupervised 3D proposal module that generates object proposals by
leveraging normalized point cloud densities. Second, we present a cross-modal
knowledge distillation strategy, where a convolutional neural network learns to
predict the final results from the 3D object proposals by querying a teacher
network pretrained on image datasets. Comprehensive experiments on the
challenging KITTI dataset demonstrate the superior performance of our VS3D in
diverse evaluation settings. The source code and pretrained models are publicly
available at
https://github.com/Zengyi-Qin/Weakly-Supervised-3D-Object-Detection.
Related papers
- Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data [57.53523870705433]
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det.
OVM3D-Det does not require high-precision LiDAR or 3D sensor data for either input or generating 3D bounding boxes.
It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
arXiv Detail & Related papers (2024-11-23T21:37:21Z) - STONE: A Submodular Optimization Framework for Active 3D Object Detection [20.54906045954377]
Key requirement for training an accurate 3D object detector is the availability of a large amount of LiDAR-based point cloud data.
This paper proposes a unified active 3D object detection framework, for greatly reducing the labeling cost of training 3D object detectors.
arXiv Detail & Related papers (2024-10-04T20:45:33Z) - General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z) - Attention-Based Depth Distillation with 3D-Aware Positional Encoding for
Monocular 3D Object Detection [10.84784828447741]
ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding.
Credit to our teacher design, our framework is seamless, domain-gap free, easily implementable, and is compatible with object-wise ground-truth depth.
We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost.
arXiv Detail & Related papers (2022-11-30T06:39:25Z) - CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z) - FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle
Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z) - CoCoNets: Continuous Contrastive 3D Scene Representations [21.906643302668716]
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos.
We show the resulting 3D visual feature representations effectively scale across objects and scenes, imagine information occluded or missing from the input viewpoints, track objects over time, align semantically related objects in 3D, and improve 3D object detection.
arXiv Detail & Related papers (2021-04-08T15:50:47Z) - D3Feat: Joint Learning of Dense Detection and Description of 3D Local
Features [51.04841465193678]
We leverage a 3D fully convolutional network for 3D point clouds.
We propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point.
Our method achieves state-of-the-art results in both indoor and outdoor scenarios.
arXiv Detail & Related papers (2020-03-06T12:51:09Z) - SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.