Geometry-based Distance Decomposition for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2104.03775v1
- Date: Thu, 8 Apr 2021 13:57:30 GMT
- Title: Geometry-based Distance Decomposition for Monocular 3D Object Detection
- Authors: Xuepeng Shi, Qi Ye, Xiaozhi Chen, Chuangrong Chen, Zhixiang Chen,
Tae-Kyun Kim
- Abstract summary: We propose a novel geometry-based distance decomposition to recover the distance by its factors.
The decomposition factors the distance of objects into the most representative and stable variables.
Our method directly predicts 3D bounding boxes from RGB images with a compact architecture.
- Score: 48.63934632884799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection is of great significance for autonomous driving
but remains challenging. The core challenge is to predict the distance of
objects in the absence of explicit depth information. Unlike regressing the
distance as a single variable in most existing methods, we propose a novel
geometry-based distance decomposition to recover the distance by its factors.
The decomposition factors the distance of objects into the most representative
and stable variables, i.e. the physical height and the projected visual height
in the image plane. Moreover, the decomposition maintains the self-consistency
between the two heights, leading to robust distance prediction even when both
predicted heights are inaccurate. The decomposition also enables us to trace
the cause of the distance uncertainty for different scenarios. Such
decomposition makes the distance prediction interpretable, accurate, and
robust. Our method directly predicts 3D bounding boxes from RGB images with a
compact architecture, making the training and inference simple and efficient.
The experimental results show that our method achieves the state-of-the-art
performance on the monocular 3D object detection and Bird's Eye View tasks on
the KITTI dataset, and can generalize to images with different camera
intrinsics.
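The decomposition rests on the standard pinhole projection relation: an object's distance Z can be recovered from its physical height H and its projected visual height h as Z = f * H / h, where f is the focal length in pixels. A minimal sketch of this recovery, with illustrative numbers (a roughly KITTI-like focal length; not values reported by the paper):

```python
# Sketch of geometry-based distance recovery under a pinhole camera model.
# Z = f * H / h: distance from the focal length (px), the physical height
# (m), and the projected visual height (px). Numbers are illustrative only.

def decompose_distance(focal_px: float, physical_height_m: float,
                       visual_height_px: float) -> float:
    """Recover the object's distance (meters) from its height factors."""
    return focal_px * physical_height_m / visual_height_px

# A car about 1.5 m tall projecting to 30 px under a ~721 px focal length.
z = decompose_distance(721.0, 1.5, 30.0)
print(round(z, 2))  # → 36.05
```

Because the same physical height appears in both the 3D box and its projection, predicting the two heights jointly keeps them self-consistent, which is the robustness property the abstract highlights.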
Related papers
- FocalPose++: Focal Length and Object Pose Estimation via Render and Compare [35.388094104164175]
We introduce FocalPose++, a neural render-and-compare method for jointly estimating the camera-object 6D pose and camera focal length.
We show results on three challenging benchmark datasets that depict known 3D models in uncontrolled settings.
arXiv Detail & Related papers (2023-11-15T13:28:02Z)
- Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection [35.85544715234846]
We propose a dynamic sparse graph pipeline named Explicit3D based on object geometry and semantics features.
Our experimental results on the SUN RGB-D dataset demonstrate that our Explicit3D achieves a better performance balance than the state of the art.
arXiv Detail & Related papers (2023-02-13T16:19:54Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed as Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method yields the best performance, surpassing other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation [4.202461384355329]
We propose MonoRUn, a novel 3D object detection framework that learns dense correspondences and geometry in a self-supervised manner.
Our proposed approach outperforms current state-of-the-art methods on the KITTI benchmark.
arXiv Detail & Related papers (2021-03-23T15:03:08Z)
- Anchor Distance for 3D Multi-Object Distance Estimation from 2D Single Shot [15.815583594196488]
We present a real time approach for estimating the distances to multiple objects in a scene using only a single-shot image.
We let the predictors catch the distance prior using anchor distance and train the network based on the distance.
The proposed method achieves about 30 FPS speed, and shows the lowest RMSE compared to the existing methods.
arXiv Detail & Related papers (2021-01-25T20:33:05Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
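The stepwise refinement idea in the last entry can be illustrated with a greedy loop that changes one 3D parameter per step. This is a hypothetical sketch only: the actual method learns the step policy with reinforcement learning, whereas here the ground truth stands in for the learned reward signal.

```python
# Illustrative-only sketch of axial refinement: nudge one 3D parameter at
# a time toward lower error. The real method learns which axis to change
# via reinforcement learning; here the ground truth guides the choice.

def refine(pred, target, step=0.1, n_steps=20):
    pred = list(pred)
    for _ in range(n_steps):
        # Pick the axis with the largest remaining error and move it by
        # at most `step` toward the target value.
        axis = max(range(len(pred)), key=lambda i: abs(target[i] - pred[i]))
        delta = target[axis] - pred[axis]
        pred[axis] += max(-step, min(step, delta))
    return pred

# Initial 3D position vs. ground truth; refinement shrinks the total error.
refined = refine([10.0, 1.0, 30.0], [10.5, 1.2, 28.0])
```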
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.