Recursive Cross-View: Use Only 2D Detectors to Achieve 3D Object
Detection without 3D Annotations
- URL: http://arxiv.org/abs/2211.07108v3
- Date: Tue, 12 Sep 2023 00:55:32 GMT
- Title: Recursive Cross-View: Use Only 2D Detectors to Achieve 3D Object
Detection without 3D Annotations
- Authors: Shun Gui and Yan Luximon
- Abstract summary: We propose a method that does not demand any 3D annotations, while being able to predict fully oriented 3D bounding boxes.
Our method, called Recursive Cross-View (RCV), utilizes the three-view principle to convert 3D detection into multiple 2D detection tasks.
RCV is the first 3D detection method that yields fully oriented 3D boxes without consuming 3D labels.
- Score: 0.5439020425819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heavily relying on 3D annotations limits the real-world application of 3D
object detection. In this paper, we propose a method that does not demand any
3D annotation, while being able to predict fully oriented 3D bounding boxes.
Our method, called Recursive Cross-View (RCV), utilizes the three-view
principle to convert 3D detection into multiple 2D detection tasks, requiring
only a subset of 2D labels. We propose a recursive paradigm, in which instance
segmentation and 3D bounding box generation by Cross-View are implemented
recursively until convergence. Specifically, our proposed method involves the
use of a frustum for each 2D bounding box, which is then followed by the
recursive paradigm that ultimately generates a fully oriented 3D box, along
with its corresponding class and score. Note that class and score are given by
the 2D detector. Evaluated on the SUN RGB-D and KITTI datasets, our method
outperforms existing image-based approaches. To demonstrate that our method can
be quickly adapted to new tasks, we apply it to two real-world scenarios, namely
3D human detection and 3D hand detection. As a result, two new 3D annotated
datasets are obtained, which means that RCV can be viewed as a (semi-)
automatic 3D annotator. Furthermore, we deploy RCV on a depth sensor, which
achieves detection at 7 fps on a live RGB-D stream. RCV is the first 3D
detection method that yields fully oriented 3D boxes without consuming 3D
labels.
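The recursive paradigm described above (a frustum per 2D box, then alternating segmentation and three-view box fitting until convergence) can be sketched in miniature. This is a hedged illustration, not the paper's implementation: the learned 2D instance segmenter is replaced by a caller-supplied `segment` predicate, and the three 2D detections are replaced by axis-aligned bounds on orthographic projections of a raw point set.

```python
# Hedged sketch of the Recursive Cross-View (RCV) loop. All helper names are
# hypothetical; the real method runs learned 2D detectors/segmenters on
# rendered views, which simple geometric stand-ins replace here.

def view_bounds_2d(points, axes):
    """Axis-aligned 2D box of the points projected onto two coordinate axes."""
    us = [p[axes[0]] for p in points]
    vs = [p[axes[1]] for p in points]
    return (min(us), min(vs), max(us), max(vs))

def box_from_three_views(points):
    """Three-view principle: intersect front/side/top 2D boxes into a 3D box."""
    front = view_bounds_2d(points, (0, 2))   # x-z view
    side = view_bounds_2d(points, (1, 2))    # y-z view
    top = view_bounds_2d(points, (0, 1))     # x-y view
    # Each 3D extent is constrained by two of the views; intersect them.
    xmin, xmax = max(front[0], top[0]), min(front[2], top[2])
    ymin, ymax = max(side[0], top[1]), min(side[2], top[3])
    zmin, zmax = max(front[1], side[1]), min(front[3], side[3])
    return (xmin, ymin, zmin, xmax, ymax, zmax)

def recursive_cross_view(points, segment, max_iters=10):
    """Alternate (pseudo-)segmentation and three-view box fitting until the
    box stops changing, mirroring the paper's recursive paradigm."""
    box = box_from_three_views(points)
    for _ in range(max_iters):
        # Stand-in for 2D instance segmentation in each view.
        points = [p for p in points if segment(p, box)]
        new_box = box_from_three_views(points)
        if new_box == box:  # converged: the fixed point is the output box
            break
        box = new_box
    return box
```

In the actual method, each iteration renders the current point set into three views, runs a 2D detector or segmenter on each, and back-projects the results; the fixed point of this loop yields the reported 3D box.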
Related papers
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
Specifically, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general 2D detector work in this 3D task.
In this technical report, we study this problem with a practical solution built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection over union threshold of 70%, as required by the official KITTI benchmark; outperforming previous state-of-the-art single RGB only methods by a large margin.
arXiv Detail & Related papers (2020-11-19T17:23:30Z)
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes [54.239416488865565]
We propose a fast single-stage 3D object detection method for LIDAR data.
The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes.
We find that our proposed method achieves state-of-the-art results by 5% on object detection in ScanNet scenes, and top results by 3.4% on the Waymo Open Dataset.
arXiv Detail & Related papers (2020-04-02T17:48:50Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
- RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving [26.216609821525676]
Most successful 3D detectors take the projection constraint from the 3D bounding box to the 2D box as an important component.
Our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then utilizes the geometric relationship between the 3D and 2D perspectives to recover the dimension, location, and orientation in 3D space.
Our method is the first real-time system for monocular image 3D detection, while achieving state-of-the-art performance on the KITTI benchmark.
arXiv Detail & Related papers (2020-01-10T08:29:20Z)
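For a concrete sense of the nine-keypoint formulation above, here is a minimal sketch of the forward projection model such a geometric fit relies on: box parameters mapped to the 8 projected corners plus the projected center. The function names and toy intrinsics are hypothetical; RTM3D recovers the parameters by minimizing reprojection error under a model like this, and that inverse step is not shown.

```python
import math

def box_corners(loc, dims, yaw):
    """8 corners of an oriented 3D box in camera coordinates
    (x right, y down, z forward). dims = (h, w, l); yaw rotates about y."""
    cx, cy, cz = loc
    h, w, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # Rotate the local offset about the vertical axis, then translate.
                x = cx + c * dx + s * dz
                z = cz - s * dx + c * dz
                corners.append((x, cy + dy, z))
    return corners

def project(focal, principal, pt):
    """Pinhole projection with focal lengths (fx, fy) and principal point (px, py)."""
    x, y, z = pt
    return (focal[0] * x / z + principal[0], focal[1] * y / z + principal[1])

def nine_keypoints(loc, dims, yaw, focal=(1000.0, 1000.0), principal=(0.0, 0.0)):
    """The 9 perspective keypoints RTM3D predicts: 8 projected corners
    plus the projected box center."""
    pts = box_corners(loc, dims, yaw) + [loc]
    return [project(focal, principal, p) for p in pts]
```

Given predicted image keypoints, the recovery step searches for (dims, loc, yaw) whose `nine_keypoints` output best matches them, which turns monocular detection into a small geometric optimization.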
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.