Related papers: Weakly Supervised 3D Object Detection from Point Clouds

Weakly Supervised 3D Object Detection from Point Clouds

URL: http://arxiv.org/abs/2007.13970v1
Date: Tue, 28 Jul 2020 03:30:11 GMT
Title: Weakly Supervised 3D Object Detection from Point Clouds
Authors: Zengyi Qin, Jinglu Wang, Yan Lu
Abstract summary: 3D object detection aims to detect and localize the 3D bounding boxes of objects belonging to specific classes. Existing 3D object detectors rely on annotated 3D bounding boxes during training, while these annotations could be expensive to obtain and only accessible in limited scenarios. We propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.
Score: 27.70180601788613
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A crucial task in scene understanding is 3D object detection, which aims to detect and localize the 3D bounding boxes of objects belonging to specific classes. Existing 3D object detectors heavily rely on annotated 3D bounding boxes during training, while these annotations could be expensive to obtain and only accessible in limited scenarios. Weakly supervised learning is a promising approach to reducing the annotation requirement, but existing weakly supervised object detectors are mostly for 2D detection rather than 3D. In this work, we propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training. First, we introduce an unsupervised 3D proposal module that generates object proposals by leveraging normalized point cloud densities. Second, we present a cross-modal knowledge distillation strategy, where a convolutional neural network learns to predict the final results from the 3D object proposals by querying a teacher network pretrained on image datasets. Comprehensive experiments on the challenging KITTI dataset demonstrate the superior performance of our VS3D in diverse evaluation settings. The source code and pretrained models are publicly available at https://github.com/Zengyi-Qin/Weakly-Supervised-3D-Object-Detection.

Related papers

3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection [58.78881632019072]
We introduce the first end-to-end 3D Monocular Open-set Object Detector (3D-MOOD)<n>We lift the open-set 2D detection into 3D space through our designed 3D bounding box head.<n>We condition the object queries with geometry prior and overcome the generalization for 3D estimation across diverse scenes.
arXiv Detail & Related papers (2025-07-31T13:56:41Z)
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D [68.23391872643268]
LOCATE 3D is a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp" It operates directly on sensor observation streams (posed RGB-D frames), enabling real-world deployment on robots and AR devices.
arXiv Detail & Related papers (2025-04-19T02:51:24Z)
Weak Cube R-CNN: Weakly Supervised 3D Detection using only 2D Bounding Boxes [5.492174268132387]
3D object detectors are typically trained in a fully supervised way, relying extensively on 3D labeled data. This work focuses on weakly-supervised 3D detection to reduce data needs using a monocular method. We propose a general model Weak Cube R-CNN, which can predict objects in 3D at inference time.
arXiv Detail & Related papers (2025-04-17T19:13:42Z)
Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data [57.53523870705433]
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det. OVM3D-Det does not require high-precision LiDAR or 3D sensor data for either input or generating 3D bounding boxes. It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
arXiv Detail & Related papers (2024-11-23T21:37:21Z)
STONE: A Submodular Optimization Framework for Active 3D Object Detection [20.54906045954377]
Key requirement for training an accurate 3D object detector is the availability of a large amount of LiDAR-based point cloud data. This paper proposes a unified active 3D object detection framework, for greatly reducing the labeling cost of training 3D object detectors.
arXiv Detail & Related papers (2024-10-04T20:45:33Z)
General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes. Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection [10.84784828447741]
ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding. Credit to our teacher design, our framework is seamless, domain-gap free, easily implementable, and is compatible with object-wise ground-truth depth. We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost.
arXiv Detail & Related papers (2022-11-30T06:39:25Z)
CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework. Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene. In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations. Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation. It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
CoCoNets: Continuous Contrastive 3D Scene Representations [21.906643302668716]
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos. We show the resulting 3D visual feature representations effectively scale across objects and scenes, imagine information occluded or missing from the input viewpoints, track objects over time, align semantically related objects in 3D, and improve 3D object detection.
arXiv Detail & Related papers (2021-04-08T15:50:47Z)
D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features [51.04841465193678]
We leverage a 3D fully convolutional network for 3D point clouds. We propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. Our method achieves state-of-the-art results in both indoor and outdoor scenarios.
arXiv Detail & Related papers (2020-03-06T12:51:09Z)
SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data. Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.