Delving into Localization Errors for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2103.16237v1
- Date: Tue, 30 Mar 2021 10:38:01 GMT
- Title: Delving into Localization Errors for Monocular 3D Object Detection
- Authors: Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li,
Wanli Ouyang
- Abstract summary: Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find that the `localization error' is the vital factor restricting monocular 3D detection.
- Score: 85.77319416168362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating 3D bounding boxes from monocular images is an essential component
in autonomous driving, while accurate 3D object detection from this kind of
data is very challenging. In this work, through intensive diagnostic experiments,
we quantify the impact introduced by each sub-task and find that the
`localization error' is the vital factor restricting monocular 3D detection. Besides, we
also investigate the underlying reasons behind localization errors, analyze the
issues they might bring, and propose three strategies. First, we revisit the
misalignment between the center of the 2D bounding box and the projected center
of the 3D object, which is a vital factor leading to low localization accuracy.
Second, we observe that accurately localizing distant objects with existing
technologies is almost impossible, and such samples can mislead the learned
network. To this end, we propose to remove such samples from the training set
to improve the overall performance of the detector. Lastly, we also propose
a novel 3D IoU oriented loss for the size estimation of the object, which is
not affected by `localization error'. We conduct extensive experiments on the
KITTI dataset, where the proposed method achieves real-time detection and
outperforms previous methods by a large margin. The code will be made available
at: https://github.com/xinzhuma/monodle.
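The second and third strategies above are concrete enough to sketch. The following is an illustrative sketch only, not the authors' implementation: the 60 m depth cutoff and the label-dict layout are assumptions, and the loss shown is a simple 1 - IoU over co-centered axis-aligned boxes, which captures why an IoU-oriented size loss is unaffected by localization error.

```python
import numpy as np

def filter_distant_samples(labels, max_depth=60.0):
    """Drop training labels whose 3D center lies beyond max_depth meters.

    `labels` is assumed to be a list of dicts with a 'z' key (forward
    distance in the camera frame); the 60 m cutoff is an illustrative
    choice, not necessarily the paper's threshold.
    """
    return [lab for lab in labels if lab["z"] <= max_depth]

def iou_size_loss(pred_dims, gt_dims):
    """1 - IoU of two axis-aligned 3D boxes sharing the same center.

    Because the boxes are co-centered, the loss depends only on the
    predicted dimensions (h, w, l) and is therefore decoupled from
    any error in the estimated 3D location.
    """
    pred = np.asarray(pred_dims, dtype=float)
    gt = np.asarray(gt_dims, dtype=float)
    inter = np.prod(np.minimum(pred, gt))   # overlap volume (co-centered)
    union = np.prod(pred) + np.prod(gt) - inter
    return 1.0 - inter / union
```

As a sanity check, the loss is zero when the predicted dimensions match the ground truth exactly and grows toward one as the size estimate degrades, regardless of where the box center was placed.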
Related papers
- Uncertainty Estimation for 3D Object Detection via Evidential Learning [63.61283174146648]
We introduce a framework for quantifying uncertainty in 3D object detection by leveraging an evidential learning loss on Bird's Eye View representations in the 3D detector.
We demonstrate both the efficacy and importance of these uncertainty estimates on identifying out-of-distribution scenes, poorly localized objects, and missing (false negative) detections.
arXiv Detail & Related papers (2024-10-31T13:13:32Z) - Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed the Homography Loss, is proposed to achieve this goal, exploiting both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - Categorical Depth Distribution Network for Monocular 3D Object Detection [7.0405916639906785]
A key challenge in monocular 3D detection is accurately predicting object depth.
Many methods attempt to directly estimate depth to assist in 3D detection, but show limited performance as a result of depth inaccuracy.
We propose Categorical Depth Distribution Network (CaDDN) to project rich contextual feature information to the appropriate depth interval in 3D space.
We validate our approach on the KITTI 3D object detection benchmark, where we rank 1st among published monocular methods.
arXiv Detail & Related papers (2021-03-01T16:08:29Z) - PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point-cloud representation is computed from the depth estimates; then, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.