Learning Occupancy for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2305.15694v1
- Date: Thu, 25 May 2023 04:03:46 GMT
- Title: Learning Occupancy for Monocular 3D Object Detection
- Authors: Liang Peng, Junkai Xu, Haoran Cheng, Zheng Yang, Xiaopei Wu, Wei Qian,
Wenxiao Wang, Boxi Wu, Deng Cai
- Abstract summary: We propose OccupancyM3D, a method of learning occupancy for monocular 3D detection.
It directly learns occupancy in frustum and 3D space, leading to more discriminative and informative 3D features and representations.
Experiments on the KITTI and Waymo open datasets demonstrate that the proposed method achieves a new state of the art and surpasses other methods by a significant margin.
- Score: 25.56336546513198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D detection is a challenging task due to the lack of accurate 3D
information. Existing approaches typically rely on geometry constraints and
dense depth estimates to facilitate the learning, but often fail to fully
exploit the benefits of three-dimensional feature extraction in frustum and 3D
space. In this paper, we propose OccupancyM3D, a method of learning
occupancy for monocular 3D detection. It directly learns occupancy in frustum
and 3D space, leading to more discriminative and informative 3D features and
representations. Specifically, by using synchronized raw sparse LiDAR point
clouds, we define the space status and generate voxel-based occupancy labels.
We formulate occupancy prediction as a simple classification problem and design
associated occupancy losses. Resulting occupancy estimates are employed to
enhance original frustum/3D features. As a result, experiments on KITTI and
Waymo open datasets demonstrate that the proposed method achieves a new state
of the art and surpasses other methods by a significant margin. Codes and
pre-trained models will be available at
https://github.com/SPengLiang/OccupancyM3D.
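The label-generation and classification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cubic-voxel grid, and the rule "a voxel is occupied if at least one LiDAR point falls inside it" are assumptions made for illustration. The abstract does not spell out how the space status is defined, and this sketch does not separate free space from unobserved space.

```python
import math


def voxel_occupancy_labels(points, grid_min, voxel_size, grid_shape):
    """Return the set of voxel indices that contain at least one point.

    Hypothetical helper, not taken from the paper.
    points: iterable of (x, y, z) coordinates.
    grid_min: (x, y, z) of the grid's minimum corner.
    voxel_size: edge length of a cubic voxel, in the same units as the points.
    grid_shape: (nx, ny, nz) number of voxels along each axis.
    """
    occupied = set()
    for point in points:
        # Map the continuous coordinate to an integer voxel index per axis.
        idx = tuple(
            math.floor((c - m) / voxel_size) for c, m in zip(point, grid_min)
        )
        # Discard points that fall outside the grid.
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            occupied.add(idx)
    return occupied


def occupancy_bce(pred_prob, label, eps=1e-7):
    """Binary cross-entropy for one voxel (occupancy as classification)."""
    p = min(max(pred_prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Toy point cloud: two points share a voxel, one lies outside the grid.
points = [(0.2, 0.1, 0.3), (0.25, 0.15, 0.35), (1.7, 0.4, 0.1), (9.0, 9.0, 9.0)]
labels = voxel_occupancy_labels(
    points, grid_min=(0.0, 0.0, 0.0), voxel_size=0.5, grid_shape=(4, 4, 4)
)
print(sorted(labels))  # [(0, 0, 0), (3, 0, 0)]
```

Voxels missing from the returned set simply contain no point. A real pipeline would still have to decide which of them are free and which are unknown before applying the loss.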
Related papers
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
Densely labeling 3D point clouds for fully supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
Specifically, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - Attention-Based Depth Distillation with 3D-Aware Positional Encoding for
Monocular 3D Object Detection [10.84784828447741]
ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding.
Thanks to our teacher design, our framework is seamless, free of domain gaps, easy to implement, and compatible with object-wise ground-truth depth.
We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost.
arXiv Detail & Related papers (2022-11-30T06:39:25Z) - Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed, which exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z) - FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle
Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z) - D3Feat: Joint Learning of Dense Detection and Description of 3D Local
Features [51.04841465193678]
We leverage a 3D fully convolutional network for 3D point clouds.
We propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point.
Our method achieves state-of-the-art results in both indoor and outdoor scenarios.
arXiv Detail & Related papers (2020-03-06T12:51:09Z) - SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint
Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z) - SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)