Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D
Object Detection
- URL: http://arxiv.org/abs/2211.01556v1
- Date: Thu, 3 Nov 2022 02:21:35 GMT
- Title: Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D
Object Detection
- Authors: Fan Yang, Xinhao Xu, Hui Chen, Yuchen Guo, Jungong Han, Kai Ni,
Guiguang Ding
- Abstract summary: The ground plane prior is a very informative geometry clue in monocular 3D object detection (M3OD).
We propose a Ground Plane Enhanced Network (GPENet) which resolves both issues at one go.
Our GPENet can outperform other methods and achieve state-of-the-art performance, well demonstrating the effectiveness and the superiority of the proposed approach.
- Score: 92.75961303269548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ground plane prior is a very informative geometry clue in monocular 3D
object detection (M3OD). However, it has been neglected by most mainstream
methods. In this paper, we identify two key factors that limit the
applicability of ground plane prior: the projection point localization issue
and the ground plane tilt issue. To pick up the ground plane prior for M3OD, we
propose a Ground Plane Enhanced Network (GPENet) which resolves both issues at
one go. For the projection point localization issue, instead of using the
bottom vertices or bottom center of the 3D bounding box (BBox), we leverage the
object's ground contact points, which are explicit pixels in the image and easy
for the neural network to detect. For the ground plane tilt problem, our GPENet
estimates the horizon line in the image and derives a novel mathematical
expression to accurately estimate the ground plane equation. An unsupervised
vertical edge mining algorithm is also proposed to address the occlusion of the
horizon line. Furthermore, we design a novel 3D bounding box deduction method
based on a dynamic back projection algorithm, which could take advantage of the
accurate contact points and the ground plane equation. Additionally, using only
M3OD labels, contact point and horizon line pseudo labels can be easily
generated with NO extra data collection and label annotation cost. Extensive
experiments on the popular KITTI benchmark show that our GPENet can outperform
other methods and achieve state-of-the-art performance, well demonstrating the
effectiveness and the superiority of the proposed approach. Moreover, our
GPENet works better than other methods in cross-dataset evaluation on the
nuScenes dataset. Our code and models will be published.
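The abstract mentions two geometric steps: deriving the ground plane from a horizon line, and back-projecting ground contact points onto that plane. The sketch below illustrates the textbook pinhole-camera versions of both steps. It is not the GPENet derivation or its dynamic back projection algorithm, which the abstract does not spell out. The function names, the intrinsics, the camera height of 1.65 m, and the camera frame convention (x right, y down, z forward) are all assumptions chosen for illustration.

```python
import numpy as np


def ground_normal_from_horizon(K, horizon):
    """Unit ground-plane normal in the camera frame from a horizon line.

    horizon is (a, b, c) with a*u + b*v + c = 0 in pixel coordinates.
    Uses the standard projective relation n ~ K^T l for a vanishing line l.
    """
    n = K.T @ np.asarray(horizon, dtype=float)
    n /= np.linalg.norm(n)
    # A line is defined only up to sign; pick the normal pointing toward +y
    # (downward in the assumed x-right, y-down, z-forward camera frame).
    return n if n[1] > 0 else -n


def backproject_to_ground(K, pixel, normal, cam_height):
    """Intersect the viewing ray of `pixel` with the plane normal . X = cam_height."""
    u, v = pixel
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))  # ray direction K^-1 [u, v, 1]
    denom = normal @ ray
    if denom <= 1e-9:
        raise ValueError("pixel lies on or above the horizon; the ray misses the ground")
    return (cam_height / denom) * ray


# Hypothetical intrinsics and a level horizon at image row v = 180.
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])
normal = ground_normal_from_horizon(K, (0.0, 1.0, -180.0))

# A contact point 70 pixels below the horizon, with an assumed camera height.
point = backproject_to_ground(K, (600.0, 250.0), normal, cam_height=1.65)
print(point)  # approximately [0, 1.65, 16.5]: 16.5 m ahead, on the ground
```

A tilted horizon, such as `(0.02, 1.0, -185.0)`, yields a tilted normal, and the same back-projection then places the contact point on the tilted plane rather than on a fixed flat ground.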
Related papers
- MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps [51.44887282336391]
The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection.
Previous method relies on NeRF for geometry reasoning.
We propose MVSDet which utilizes plane sweep for geometry-aware 3D object detection.
arXiv Detail & Related papers (2024-10-28T21:58:41Z)
- AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings [26.845588648999417]
We tackle the problem of estimating the planar surfaces in a 3D scene from posed images.
We propose a method that predicts multi-view consistent plane embeddings that complement geometry when clustering points into planes.
We show through extensive evaluation on the ScanNetV2 dataset that our new method outperforms existing approaches.
arXiv Detail & Related papers (2024-06-13T09:49:31Z)
- MonoGround: Detecting Monocular 3D Objects from the Ground [14.225093154566439]
We propose to introduce the ground plane as a prior in monocular 3D object detection.
The ground plane prior serves as an additional geometric condition to the ill-posed mapping and an extra source in depth estimation.
Our method could achieve state-of-the-art results compared with other methods while maintaining a very fast speed.
arXiv Detail & Related papers (2022-06-15T08:27:46Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors.
We use the resulting data to train a 3D detector with the same architecture as the baseline model.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
- Monocular Road Planar Parallax Estimation [25.36368935789501]
Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving.
We propose Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences.
RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a $\gamma$ map for 3D reconstruction.
arXiv Detail & Related papers (2021-11-22T10:03:41Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation [4.202461384355329]
We propose MonoRUn, a novel 3D object detection framework that learns dense correspondences and geometry in a self-supervised manner.
Our proposed approach outperforms current state-of-the-art methods on KITTI benchmark.
arXiv Detail & Related papers (2021-03-23T15:03:08Z)
- Depth Completion using Piecewise Planar Model [94.0808155168311]
A depth map can be represented by a set of learned bases and can be efficiently solved in a closed form solution.
However, one issue with this method is that it may create artifacts when colour boundaries are inconsistent with depth boundaries.
We enforce a more strict model in depth recovery: a piece-wise planar model.
arXiv Detail & Related papers (2020-12-06T07:11:46Z)
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
arXiv Detail & Related papers (2020-07-20T02:11:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.