Geometry Uncertainty Projection Network for Monocular 3D Object
Detection
- URL: http://arxiv.org/abs/2107.13774v1
- Date: Thu, 29 Jul 2021 06:59:07 GMT
- Title: Geometry Uncertainty Projection Network for Monocular 3D Object
Detection
- Authors: Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Junjie
Yan and Wanli Ouyang
- Abstract summary: We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth.
At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
- Score: 138.24798140338095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Geometry Projection is a powerful depth estimation method in monocular 3D
object detection. It estimates depth from object heights, which introduces
mathematical priors into the deep model. But the projection process also introduces
the error amplification problem, in which an error in the estimated height
is amplified and strongly reflected in the output depth. This property
leads to uncontrollable depth inferences and also damages the training
efficiency. In this paper, we propose a Geometry Uncertainty Projection Network
(GUP Net) to tackle the error amplification problem at both inference and
training stages. Specifically, a GUP module is proposed to obtain the
geometry-guided uncertainty of the inferred depth, which not only provides highly
reliable confidence for each depth but also benefits depth learning.
Furthermore, at the training stage, we propose a Hierarchical Task Learning
strategy to reduce the instability caused by error amplification. This learning
algorithm monitors the learning situation of each task with a proposed indicator
and adaptively assigns proper loss weights to different tasks according to
the situation of their pre-tasks. Based on that, each task starts learning only when
its pre-tasks are learned well, which can significantly improve the stability
and efficiency of the training process. Extensive experiments demonstrate the
effectiveness of the proposed method. The overall model can infer more reliable
object depth than existing methods and outperforms the state-of-the-art
image-based monocular 3D detectors by 3.74% and 4.7% AP40 on the car and
pedestrian categories of the KITTI benchmark.
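The error amplification described in the abstract can be illustrated with a minimal sketch. It assumes the standard pinhole relation depth = f * H_3d / h_2d, which the abstract does not spell out. All function names and numeric values below are hypothetical and are not taken from the authors' code. The last helper assumes that the projected uncertainty and a learned residual uncertainty are independent, which is also an assumption made only for illustration.

```python
import math


def project_depth(focal_px, height_3d_m, height_2d_px):
    """Pinhole geometry projection (assumed form): depth = f * H_3d / h_2d."""
    return focal_px * height_3d_m / height_2d_px


def project_depth_std(focal_px, height_3d_std_m, height_2d_px):
    """Scaling a random variable by f / h_2d scales its standard deviation
    by the same factor, so height uncertainty maps linearly to depth."""
    return focal_px * height_3d_std_m / height_2d_px


def combine_std(projected_std, residual_std):
    """Combine two uncertainties assumed independent: variances add."""
    return math.sqrt(projected_std ** 2 + residual_std ** 2)


if __name__ == "__main__":
    focal_px = 700.0      # hypothetical focal length in pixels
    height_2d_px = 30.0   # hypothetical 2D box height of a distant object
    true_height_m = 1.5   # hypothetical true 3D height

    depth = project_depth(focal_px, true_height_m, height_2d_px)
    # A 0.1 m error in the estimated 3D height shifts the depth by f / h_2d * 0.1.
    shifted = project_depth(focal_px, true_height_m + 0.1, height_2d_px)
    print(f"depth: {depth:.2f} m, depth error from 0.1 m height error: {shifted - depth:.2f} m")

    depth_std = combine_std(project_depth_std(focal_px, 0.1, height_2d_px), 0.5)
    print(f"combined depth std: {depth_std:.2f} m")
```

Because the factor f / h_2d grows as the 2D box shrinks, the same height error produces a larger depth error for distant objects, which is why a per-object depth uncertainty is useful as a confidence signal.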
Related papers
- Toward Accurate Camera-based 3D Object Detection via Cascade Depth
Estimation and Calibration [20.82054596017465]
Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces.
This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization.
arXiv Detail & Related papers (2024-02-07T14:21:26Z)
- Depth-discriminative Metric Learning for Monocular 3D Object Detection [14.554132525651868]
We introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes.
Our method consistently improves the performance of various baselines by 23.51% and 5.78% on average.
arXiv Detail & Related papers (2024-01-02T07:34:09Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D
Object Detection [95.8940731298518]
We propose a novel Geometry Uncertainty Propagation Network (GUPNet++).
It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.
Experiments show that the proposed approach not only obtains state-of-the-art (SOTA) performance in image-based monocular 3D detection but also demonstrates superiority in efficacy with a simplified framework.
arXiv Detail & Related papers (2023-10-24T08:45:15Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust
Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
- Variational Monocular Depth Estimation for Reliability Prediction [12.951621755732544]
Self-supervised learning for monocular depth estimation is widely investigated as an alternative to the supervised learning approach.
Previous works have successfully improved the accuracy of depth estimation by modifying the model structure.
In this paper, we theoretically formulate a variational model for the monocular depth estimation to predict the reliability of the estimated depth image.
arXiv Detail & Related papers (2020-11-24T06:23:51Z)
- Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.