WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2203.08332v1
- Date: Wed, 16 Mar 2022 00:37:08 GMT
- Title: WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection
- Authors: Liang Peng, Senbo Yan, Boxi Wu, Zheng Yang, Xiaofei He, Deng Cai
- Abstract summary: Existing monocular 3D detection methods rely on 3D box labels manually annotated on LiDAR point clouds.
In this paper, we explore weakly supervised monocular 3D detection. Specifically, we first detect 2D boxes on the image and then use these 2D boxes to select the corresponding RoI LiDAR points as weak supervision.
A network is then trained to predict 3D boxes that tightly align with the associated RoI LiDAR points, by minimizing our newly proposed 3D alignment loss between the 3D box estimates and the RoI LiDAR points.
- Score: 29.616568669869206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection is one of the most challenging tasks in 3D
scene understanding. Due to the ill-posed nature of monocular imagery, existing
monocular 3D detection methods rely heavily on training with 3D box labels
manually annotated on LiDAR point clouds. This annotation process is very
laborious and expensive. To dispense with the reliance on 3D box labels, in
this paper we explore weakly supervised monocular 3D detection. Specifically,
we first detect 2D boxes on the image. Then, we use the generated 2D boxes to
select the corresponding RoI LiDAR points as weak supervision. Eventually, we
train a network to predict 3D boxes that tightly align with the associated RoI
LiDAR points. This network is learned by minimizing our newly proposed 3D
alignment loss between the 3D box estimates and the corresponding RoI LiDAR
points. We illustrate the potential challenges of this learning problem and
resolve them by introducing several effective designs into our method. Code
will be available at https://github.com/SPengLiang/WeakM3D.
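As a rough illustration of the pipeline described in the abstract, the sketch below selects RoI LiDAR points with a 2D box and scores how well a predicted 3D box wraps those points in bird's-eye view. The function names, the axis-aligned (yaw-ignoring) box, and the simple point-to-surface distance are simplifying assumptions made here for illustration; they are not the paper's actual alignment loss, which lives in the linked repository.

```python
import torch

def select_roi_points(lidar_xyz, box_2d, K):
    """Keep LiDAR points whose image projection falls inside a 2D box.

    lidar_xyz: (N, 3) points in the camera frame (x right, y down, z forward).
    box_2d:    (4,) pixel box [x1, y1, x2, y2] from an off-the-shelf 2D detector.
    K:         (3, 3) camera intrinsic matrix.
    """
    in_front = lidar_xyz[:, 2] > 0.1                      # only points in front of the camera
    uvw = lidar_xyz @ K.T                                  # pinhole projection to the image plane
    uv = uvw[:, :2] / uvw[:, 2:3]
    x1, y1, x2, y2 = box_2d
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return lidar_xyz[in_front & inside]                    # weak supervision for this object

def alignment_loss(box_pred, roi_points):
    """Toy stand-in for the paper's 3D alignment loss: mean bird's-eye-view
    distance from each RoI point to the nearest face of the predicted box.

    box_pred: (7,) tensor [x, y, z, l, w, h, yaw]; yaw is ignored here
    (axis-aligned box) to keep the sketch short.
    """
    center = box_pred[[0, 2]]                              # BEV box center (x, z)
    half_lw = box_pred[[3, 4]] / 2                         # half length / width
    q = (roi_points[:, [0, 2]] - center).abs() - half_lw   # offsets relative to the box faces
    outside = q.clamp(min=0).norm(dim=1)                   # > 0 only for points outside the box
    inside = (-q.max(dim=1).values).clamp(min=0)           # > 0 only for points inside the box
    return (outside + inside).mean()                       # distance to the nearest box face
```

In training, a loss of this kind would be minimized over all objects while the 2D detector's boxes stay fixed; the paper's actual formulation additionally addresses the challenges the abstract alludes to (e.g., object orientation) through its own designs.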
Related papers
- Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data [57.53523870705433]
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det.
OVM3D-Det does not require high-precision LiDAR or 3D sensor data for either input or generating 3D bounding boxes.
It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
arXiv Detail & Related papers (2024-11-23T21:37:21Z)
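For context on the pseudo-LiDAR step mentioned in the OVM3D-Det entry above, the standard construction back-projects a predicted per-pixel depth map into a 3D point cloud using the camera intrinsics. The sketch below shows only that back-projection; the depth network, the open-vocabulary 2D detector, and OVM3D-Det's auto-labeling logic are not reproduced here, and the function name is illustrative.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """Back-project a per-pixel depth map into a pseudo-LiDAR point cloud.

    depth: (H, W) array of metric depths predicted by a monocular depth network.
    K:     (3, 3) camera intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Returns (H*W, 3) points in the camera frame (x right, y down, z forward).
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel coordinate grid
    z = depth
    x = (u - cx) * z / fx                            # invert the pinhole projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```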
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection [11.061100776969383]
Monocular 3D object detection poses a significant challenge in 3D scene understanding.
Existing methods heavily rely on supervised learning using abundant 3D labels.
We propose a novel weakly supervised 3D object detection framework named VSRD.
arXiv Detail & Related papers (2024-03-29T20:43:55Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
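The output-level constraint in the entry above enforces overlap between a detected 2D box and the image projection of the estimated 3D box. Below is a minimal sketch of how such an overlap term could be computed; the corner parameterization (bottom-center origin, yaw about the camera y-axis) and the plain 2D IoU are assumptions made for illustration, not that paper's exact formulation.

```python
import numpy as np

def project_box_3d(center, dims, yaw, K):
    """Project the 8 corners of a 3D box (camera frame) and take their 2D extent."""
    l, w, h = dims
    # Corner offsets around the bottom-center of the box, before rotation.
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2
    y = np.array([ 0,  0,  0,  0, -h, -h, -h, -h])   # y points down, so the top is at -h
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2
    R = np.array([[ np.cos(yaw), 0, np.sin(yaw)],    # yaw rotation about the camera y-axis
                  [ 0,           1, 0          ],
                  [-np.sin(yaw), 0, np.cos(yaw)]])
    corners = R @ np.stack([x, y, z]) + np.asarray(center).reshape(3, 1)
    uvw = K @ corners                                 # pinhole projection of all 8 corners
    uv = uvw[:2] / uvw[2]
    return np.array([uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()])

def iou_2d(a, b):
    """IoU of two pixel boxes [x1, y1, x2, y2]; 1 - IoU can serve as an output-level penalty."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / max(union, 1e-6)
```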
- OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection [24.9579490539696]
Monocular 3D object detection has attracted much attention due to its simple configuration.
In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity.
We propose a plug-and-play module, One Bounding Box Multiple Objects (OBMO).
arXiv Detail & Related papers (2022-12-20T07:46:49Z)
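The depth ambiguity noted in the OBMO entry above can be made concrete with a pinhole camera: sliding an object's center along its viewing ray changes the depth but not the projected pixel (and scaling the box dimensions by the same factor keeps its 2D extent unchanged too), so 2D evidence alone cannot pin down depth. The intrinsics and object position below are made-up illustrative values.

```python
import numpy as np

# Pinhole projection: a point (x, y, z) maps to (fx * x / z + cx, fy * y / z + cy).
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0,   0.0,   1.0]])          # KITTI-like intrinsics, for illustration only

def project(p):
    uvw = K @ p
    return uvw[:2] / uvw[2]

center = np.array([2.0, 1.5, 20.0])          # hypothetical object center, 20 m ahead
for s in (0.9, 1.0, 1.1):                    # slide the center along the viewing ray
    # Depth changes with s, but the projected pixel stays exactly the same.
    print(s * center[2], project(s * center))
```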
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- Weakly Supervised 3D Object Detection from Point Clouds [27.70180601788613]
3D object detection aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Existing 3D object detectors rely on annotated 3D bounding boxes during training, while these annotations could be expensive to obtain and only accessible in limited scenarios.
We propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.
arXiv Detail & Related papers (2020-07-28T03:30:11Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)