FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle
Detection
- URL: http://arxiv.org/abs/2105.07647v1
- Date: Mon, 17 May 2021 07:29:55 GMT
- Title: FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle
Detection
- Authors: Yi Wei, Shang Su, Jiwen Lu, Jie Zhou
- Abstract summary: We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
- Score: 81.79171905308827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the problem of weakly supervised 3D vehicle
detection. Conventional methods for 3D object detection require vast amounts of
manually labelled 3D data as supervision signals. However, annotating large
datasets demands enormous human effort, especially in the 3D domain. To tackle this
problem, we propose frustum-aware geometric reasoning (FGR) to detect vehicles
in point clouds without any 3D annotations. Our method consists of two stages:
coarse 3D segmentation and 3D bounding box estimation. For the first stage, a
context-aware adaptive region growing algorithm is designed to segment objects
based on 2D bounding boxes. Leveraging predicted segmentation masks, we develop
an anti-noise approach to estimate 3D bounding boxes in the second stage.
Finally, the 3D pseudo labels generated by our method are used to train a 3D
detector. Without relying on any 3D ground truth, FGR achieves performance
comparable to fully supervised methods on the KITTI dataset. The findings indicate that
it is able to accurately detect objects in 3D space with only 2D bounding boxes
and sparse point clouds.
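The two stages described above (region growing for coarse segmentation, then bird's-eye-view box fitting) can be sketched roughly as follows. This is a minimal illustration in BEV (x, z) coordinates, not the authors' actual algorithm: FGR's region growing is context-aware with an adaptive threshold, and its box estimation includes anti-noise handling, both of which are omitted here.

```python
import math

def region_grow(points, seed_idx, radius):
    """Cluster BEV points by iteratively absorbing any point within
    `radius` of a point already in the cluster (breadth-first growth)."""
    cluster = {seed_idx}
    frontier = [seed_idx]
    while frontier:
        i = frontier.pop()
        for j, p in enumerate(points):
            if j not in cluster and math.dist(points[i], p) <= radius:
                cluster.add(j)
                frontier.append(j)
    return sorted(cluster)

def fit_bev_box(points, n_angles=90):
    """Estimate box yaw by searching rotations for the minimum-area
    axis-aligned bounding rectangle in bird's-eye view."""
    best_area, best_yaw = float("inf"), 0.0
    for k in range(n_angles):
        yaw = (math.pi / 2) * k / n_angles
        c, s = math.cos(yaw), math.sin(yaw)
        # Rotate points by -yaw and measure the axis-aligned extent.
        xs = [c * x + s * z for x, z in points]
        zs = [-s * x + c * z for x, z in points]
        area = (max(xs) - min(xs)) * (max(zs) - min(zs))
        if area < best_area:
            best_area, best_yaw = area, yaw
    return best_yaw, best_area
```

In a pipeline like FGR's, `region_grow` would run on the points inside the frustum lifted from a 2D box, and the fitted BEV rectangle plus a height estimate would form the 3D pseudo label.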
Related papers
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z) - VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection [11.061100776969383]
Monocular 3D object detection poses a significant challenge in 3D scene understanding.
Existing methods heavily rely on supervised learning using abundant 3D labels.
We propose a novel weakly supervised 3D object detection framework named VSRD.
arXiv Detail & Related papers (2024-03-29T20:43:55Z) - Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
Specifically, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
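The output-level constraint above (overlap between the detected 2D box and the projected 3D box) can be sketched as follows; the pinhole intrinsics and helper names are illustrative assumptions, not the paper's implementation:

```python
def project_corners(corners_3d, fx, fy, cx, cy):
    # Pinhole projection of 3D box corners (camera frame, z pointing forward).
    return [(fx * x / z + cx, fy * y / z + cy) for x, y, z in corners_3d]

def enclosing_box(pts):
    # Tightest axis-aligned 2D box (x1, y1, x2, y2) around projected corners.
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)

def iou_2d(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

A training objective of this kind would then penalize something like `1 - iou_2d(detected_2d_box, enclosing_box(project_corners(...)))`.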
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - Recursive Cross-View: Use Only 2D Detectors to Achieve 3D Object
Detection without 3D Annotations [0.5439020425819]
We propose a method that does not demand any 3D annotations, while being able to predict fully oriented 3D bounding boxes.
Our method, called Recursive Cross-View (RCV), utilizes the three-view principle to convert 3D detection into multiple 2D detection tasks.
RCV is the first 3D detection method that yields fully oriented 3D boxes without consuming 3D labels.
arXiv Detail & Related papers (2022-11-14T04:51:05Z) - Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed, which exploits both 2D and 3D information.
Our method outperforms other state-of-the-art approaches by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z) - Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection over union threshold of 70%, as required by the official KITTI benchmark; outperforming previous state-of-the-art single RGB only methods by a large margin.
arXiv Detail & Related papers (2020-11-19T17:23:30Z) - Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic
Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a 3D cylinder partition and a 3D cylinder convolution based framework, termed as Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
arXiv Detail & Related papers (2020-08-04T13:56:19Z) - SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint
Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z) - DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.