Towards 3D Object Detection with 2D Supervision
- URL: http://arxiv.org/abs/2211.08287v1
- Date: Tue, 15 Nov 2022 16:40:11 GMT
- Title: Towards 3D Object Detection with 2D Supervision
- Authors: Jinrong Yang, Tiancai Wang, Zheng Ge, Weixin Mao, Xiaoping Li, Xiangyu
Zhang
- Abstract summary: We introduce a hybrid training framework, enabling us to learn a visual 3D object detector with massive 2D labels.
We propose a temporal 2D transformation to bridge the 3D predictions with temporal 2D labels.
Experiments conducted on the nuScenes dataset show strong results (nearly 90% of its fully-supervised performance) with only 25% 3D annotations.
- Score: 13.444432119639822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The great progress of 3D object detectors relies on large-scale data and 3D
annotations. The annotation cost for 3D bounding boxes is extremely expensive
while the 2D ones are easier and cheaper to collect. In this paper, we
introduce a hybrid training framework, enabling us to learn a visual 3D object
detector with massive 2D (pseudo) labels, even without 3D annotations. To break
through the information bottleneck of 2D clues, we explore a new perspective:
Temporal 2D Supervision. We propose a temporal 2D transformation to bridge the
3D predictions with temporal 2D labels. Two steps, homography warping and 2D
box deduction, transform the 3D predictions into 2D ones for supervision.
for supervision. Experiments conducted on the nuScenes dataset show strong
results (nearly 90% of its fully-supervised performance) with only 25% 3D
annotations. We hope our findings can provide new insights for using a large
number of 2D annotations for 3D perception.
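The 2D box deduction step described in the abstract can be illustrated with a short sketch: the eight corners of a predicted 3D box are projected into the image plane, and the tightest axis-aligned 2D box around them serves as the prediction compared against 2D labels. The pinhole camera model, coordinate conventions, and function names below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def box3d_corners(center, size, yaw):
    """Return the 8 corners of a 3D box (camera coords: x right, y down, z forward)."""
    l, w, h = size
    # Corner offsets in the box frame (before rotation/translation).
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2
    y = np.array([ h, -h,  h, -h,  h, -h,  h, -h]) / 2
    z = np.array([ w,  w,  w,  w, -w, -w, -w, -w]) / 2
    # Rotate around the vertical (y) axis by yaw, then translate to the center.
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return (R @ np.stack([x, y, z])).T + np.asarray(center)

def deduce_2d_box(corners, K):
    """Project 3D corners with pinhole intrinsics K; take the tight 2D box."""
    uvw = (K @ corners.T).T            # (8, 3) homogeneous image points
    uv = uvw[:, :2] / uvw[:, 2:3]      # perspective divide
    x1, y1 = uv.min(axis=0)
    x2, y2 = uv.max(axis=0)
    return np.array([x1, y1, x2, y2])

# Toy example: a 4m x 2m x 1.5m box, 10m ahead of a 1000px-focal camera.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
box2d = deduce_2d_box(box3d_corners([0, 0, 10], (4, 2, 1.5), 0.1), K)
```

The paper's temporal variant additionally warps boxes between frames via a homography before this projection; that step is omitted here for brevity.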
Related papers
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- Roadside Monocular 3D Detection via 2D Detection Prompting [11.511202614683388]
We present a novel and simple method by prompting the 3D detector using 2D detections.
Our method builds on a key insight: compared with 3D detectors, a 2D detector is much easier to train and performs significantly better at detection on the 2D image plane.
arXiv Detail & Related papers (2024-04-01T11:57:34Z)
- Improving Distant 3D Object Detection Using 2D Box Supervision [97.80225758259147]
We propose LR3D, a framework that learns to recover the missing depth of distant objects.
Our framework is general and could benefit a wide range of 3D detection methods.
arXiv Detail & Related papers (2024-03-14T09:54:31Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
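The output-level constraint above enforces overlap between a 2D box estimate and a projected 3D box, both expressed as 2D boxes. A minimal sketch of such an overlap term, using a standard axis-aligned IoU (helper names are hypothetical, not from the paper):

```python
def iou_2d(a, b):
    """Axis-aligned IoU between two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# An overlap-based constraint could then penalize low IoU, e.g. 1 - IoU
# between a 2D estimate and the image-plane projection of a 3D box.
loss = 1.0 - iou_2d((0, 0, 10, 10), (5, 5, 15, 15))
```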
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- Tracking Objects with 3D Representation from Videos [57.641129788552675]
We propose a new 2D Multiple Object Tracking (MOT) paradigm, called P3DTrack, which learns 3D object representations from pseudo 3D object labels in monocular videos.
arXiv Detail & Related papers (2023-06-08T17:58:45Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place among all vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- Learning from 2D: Pixel-to-Point Knowledge Transfer for 3D Pretraining [21.878815180924832]
We present a novel 3D pretraining method by leveraging 2D networks learned from rich 2D datasets.
Our experiments show that the 3D models pretrained with 2D knowledge boost performance across various real-world 3D downstream tasks.
arXiv Detail & Related papers (2021-04-10T05:40:42Z)
- 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that leverages 3D features extracted from a large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
arXiv Detail & Related papers (2021-04-06T02:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.