WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2203.08332v1
- Date: Wed, 16 Mar 2022 00:37:08 GMT
- Title: WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection
- Authors: Liang Peng, Senbo Yan, Boxi Wu, Zheng Yang, Xiaofei He, Deng Cai
- Abstract summary: Existing monocular 3D detection methods rely on 3D box labels manually annotated on LiDAR point clouds.
In this paper, we explore weakly supervised monocular 3D detection. Specifically, we first detect 2D boxes on the image and then use these 2D boxes to select the corresponding RoI LiDAR points as weak supervision.
A network is then trained to predict 3D boxes that tightly align with the associated RoI LiDAR points, by minimizing our newly proposed 3D alignment loss between the 3D box estimates and the RoI LiDAR points.
- Score: 29.616568669869206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection is one of the most challenging tasks in 3D
scene understanding. Due to the ill-posed nature of monocular imagery, existing
monocular 3D detection methods rely heavily on training with 3D box labels
manually annotated on LiDAR point clouds. This annotation process is very
laborious and expensive. To dispense with the reliance on 3D box labels, in
this paper we explore weakly supervised monocular 3D detection. Specifically,
we first detect 2D boxes on the image. Then, we use the generated 2D boxes to
select the corresponding RoI LiDAR points as weak supervision. Eventually, we
train a network to predict 3D boxes that tightly align with the associated RoI
LiDAR points. This network is learned by minimizing our newly proposed 3D
alignment loss between the 3D box estimates and the corresponding RoI LiDAR
points. We illustrate the potential challenges of this learning problem and
resolve them by introducing several effective designs into our method. Code
will be available at https://github.com/SPengLiang/WeakM3D.
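As a rough illustration of the pipeline described in the abstract, the sketch below selects RoI LiDAR points with a 2D box and scores how well a predicted 3D box wraps those points in bird's-eye view. The function names, the axis-aligned (yaw-ignoring) box, and the simple point-to-surface distance are simplifying assumptions made here for illustration; they are not the paper's actual alignment loss, which lives in the linked repository.

```python
import torch

def select_roi_points(lidar_xyz, box_2d, K):
    """Keep LiDAR points whose image projection falls inside a 2D box.

    lidar_xyz: (N, 3) points in the camera frame (x right, y down, z forward).
    box_2d:    (4,) pixel box [x1, y1, x2, y2] from an off-the-shelf 2D detector.
    K:         (3, 3) camera intrinsic matrix.
    """
    in_front = lidar_xyz[:, 2] > 0.1                      # only points in front of the camera
    uvw = lidar_xyz @ K.T                                  # pinhole projection to the image plane
    uv = uvw[:, :2] / uvw[:, 2:3]
    x1, y1, x2, y2 = box_2d
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return lidar_xyz[in_front & inside]                    # weak supervision for this object

def alignment_loss(box_pred, roi_points):
    """Toy stand-in for the paper's 3D alignment loss: mean bird's-eye-view
    distance from each RoI point to the nearest face of the predicted box.

    box_pred: (7,) tensor [x, y, z, l, w, h, yaw]; yaw is ignored here
    (axis-aligned box) to keep the sketch short.
    """
    center = box_pred[[0, 2]]                              # BEV box center (x, z)
    half_lw = box_pred[[3, 4]] / 2                         # half length / width
    q = (roi_points[:, [0, 2]] - center).abs() - half_lw   # offsets relative to the box faces
    outside = q.clamp(min=0).norm(dim=1)                   # > 0 only for points outside the box
    inside = (-q.max(dim=1).values).clamp(min=0)           # > 0 only for points inside the box
    return (outside + inside).mean()                       # distance to the nearest box face
```

In training, a loss of this kind would be minimized over all objects while the 2D detector's boxes stay fixed; the paper's actual formulation additionally addresses the challenges the abstract alludes to (e.g., object orientation) through its own designs.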
Related papers
- Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data [57.53523870705433]
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det.
OVM3D-Det does not require high-precision LiDAR or 3D sensor data for either input or generating 3D bounding boxes.
It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
arXiv Detail & Related papers (2024-11-23T21:37:21Z)
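For context on the pseudo-LiDAR step mentioned in the OVM3D-Det entry above, the standard construction back-projects a predicted per-pixel depth map into a 3D point cloud using the camera intrinsics. The sketch below shows only that back-projection; the depth network, the open-vocabulary 2D detector, and OVM3D-Det's auto-labeling logic are not reproduced here, and the function name is illustrative.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """Back-project a per-pixel depth map into a pseudo-LiDAR point cloud.

    depth: (H, W) array of metric depths predicted by a monocular depth network.
    K:     (3, 3) camera intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Returns (H*W, 3) points in the camera frame (x right, y down, z forward).
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel coordinate grid
    z = depth
    x = (u - cx) * z / fx                            # invert the pinhole projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```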
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection [11.061100776969383]
Monocular 3D object detection poses a significant challenge in 3D scene understanding.
Existing methods heavily rely on supervised learning using abundant 3D labels.
We propose a novel weakly supervised 3D object detection framework named VSRD.
arXiv Detail & Related papers (2024-03-29T20:43:55Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
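The output-level constraint in the entry above enforces overlap between a detected 2D box and the image projection of the estimated 3D box. Below is a minimal sketch of how such an overlap term could be computed; the corner parameterization (bottom-center origin, yaw about the camera y-axis) and the plain 2D IoU are assumptions made for illustration, not that paper's exact formulation.

```python
import numpy as np

def project_box_3d(center, dims, yaw, K):
    """Project the 8 corners of a 3D box (camera frame) and take their 2D extent."""
    l, w, h = dims
    # Corner offsets around the bottom-center of the box, before rotation.
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2
    y = np.array([ 0,  0,  0,  0, -h, -h, -h, -h])   # y points down, so the top is at -h
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2
    R = np.array([[ np.cos(yaw), 0, np.sin(yaw)],    # yaw rotation about the camera y-axis
                  [ 0,           1, 0          ],
                  [-np.sin(yaw), 0, np.cos(yaw)]])
    corners = R @ np.stack([x, y, z]) + np.asarray(center).reshape(3, 1)
    uvw = K @ corners                                 # pinhole projection of all 8 corners
    uv = uvw[:2] / uvw[2]
    return np.array([uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()])

def iou_2d(a, b):
    """IoU of two pixel boxes [x1, y1, x2, y2]; 1 - IoU can serve as an output-level penalty."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / max(union, 1e-6)
```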
- OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection [24.9579490539696]
Monocular 3D object detection has attracted much attention due to its simple configuration.
In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity.
We propose a plug-and-play module, One Bounding Box Multiple Objects (OBMO).
arXiv Detail & Related papers (2022-12-20T07:46:49Z)
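The depth ambiguity noted in the OBMO entry above can be made concrete with a pinhole camera: sliding an object's center along its viewing ray changes the depth but not the projected pixel (and scaling the box dimensions by the same factor keeps its 2D extent unchanged too), so 2D evidence alone cannot pin down depth. The intrinsics and object position below are made-up illustrative values.

```python
import numpy as np

# Pinhole projection: a point (x, y, z) maps to (fx * x / z + cx, fy * y / z + cy).
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0,   0.0,   1.0]])          # KITTI-like intrinsics, for illustration only

def project(p):
    uvw = K @ p
    return uvw[:2] / uvw[2]

center = np.array([2.0, 1.5, 20.0])          # hypothetical object center, 20 m ahead
for s in (0.9, 1.0, 1.1):                    # slide the center along the viewing ray
    # Depth changes with s, but the projected pixel stays exactly the same.
    print(s * center[2], project(s * center))
```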
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- Weakly Supervised 3D Object Detection from Point Clouds [27.70180601788613]
3D object detection aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Existing 3D object detectors rely on annotated 3D bounding boxes during training, while these annotations could be expensive to obtain and only accessible in limited scenarios.
We propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.
arXiv Detail & Related papers (2020-07-28T03:30:11Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)