MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation
- URL: http://arxiv.org/abs/2103.12605v2
- Date: Wed, 24 Mar 2021 12:28:15 GMT
- Title: MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation
- Authors: Hansheng Chen, Yuyao Huang, Wei Tian, Zhong Gao, Lu Xiong
- Abstract summary: We propose MonoRUn, a novel 3D object detection framework that learns dense correspondences and geometry in a self-supervised manner.
Our proposed approach outperforms current state-of-the-art methods on the KITTI benchmark.
- Score: 4.202461384355329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object localization in 3D space is a challenging aspect of monocular 3D object detection. Recent advances in 6DoF pose estimation have shown that predicting dense 2D-3D correspondence maps between the image and the object's 3D model, and then estimating the object pose via the Perspective-n-Point (PnP) algorithm, can achieve remarkable localization accuracy. Yet these methods rely on training with ground-truth object geometry, which is difficult to acquire in real outdoor scenes. To address this issue, we propose MonoRUn, a novel detection framework that learns dense correspondences and geometry in a self-supervised manner, with simple 3D bounding box annotations. To regress the pixel-related 3D object coordinates, we employ a regional reconstruction network with uncertainty awareness. For self-supervised training, the predicted 3D coordinates are projected back to the image plane. A Robust KL loss is proposed to minimize the uncertainty-weighted reprojection error. During the testing phase, we exploit the network uncertainty by propagating it through all downstream modules. More specifically, the uncertainty-driven PnP algorithm is leveraged to estimate the object pose and its covariance. Extensive experiments demonstrate that our proposed approach outperforms current state-of-the-art methods on the KITTI benchmark.
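To make the self-supervised objective concrete, here is a minimal PyTorch sketch of an uncertainty-weighted reprojection loss. The Laplacian negative-log-likelihood form and all names (`project_points`, `log_sigma`, etc.) are our illustrative assumptions; the paper's Robust KL loss has its own exact formulation.

```python
import torch

def project_points(X_cam, K):
    """Pinhole projection: (N, 3) camera-frame points and intrinsics K (3, 3)
    -> (N, 2) pixel coordinates."""
    uvw = X_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)

def uncertainty_weighted_reproj_loss(X_cam, uv_obs, log_sigma, K):
    """X_cam:     (N, 3) predicted 3D object coordinates in the camera frame.
    uv_obs:    (N, 2) pixel locations they should reproject to.
    log_sigma: (N, 2) predicted log standard deviations.

    A large predicted sigma down-weights the residual but is penalized by
    the log term, so the network cannot explain away every error with
    high uncertainty."""
    resid = project_points(X_cam, K) - uv_obs
    return (resid.abs() / log_sigma.exp() + log_sigma).mean()
```

At test time, the same per-point uncertainties can weight the PnP residuals, which is what makes an uncertainty-driven PnP and a pose covariance estimate possible.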
Related papers
- GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection [95.8940731298518]
We propose a novel Geometry Uncertainty Propagation Network (GUPNet++).
It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of end-to-end model learning.
Experiments show that the proposed approach not only obtains state-of-the-art (SOTA) performance in image-based monocular 3D detection but also demonstrates superior efficacy with a simplified framework.
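Roughly, geometry uncertainty propagation can be pictured as pushing a belief over object height through the projective depth relation; the Gaussian assumption and the function below are our illustration, not the paper's exact formulation.

```python
import torch

def depth_with_uncertainty(f, h2d, mu_H, sigma_H):
    """Push a Gaussian belief over 3D height, N(mu_H, sigma_H^2), through
    the projective relation d = f * H3d / h2d. Since d is linear in H3d
    for a given 2D box height h2d, the result is again Gaussian."""
    mu_d = f * mu_H / h2d        # mean depth
    sigma_d = f * sigma_H / h2d  # std scaled by the same linear factor
    return mu_d, sigma_d

# e.g. focal length 720 px, 96 px box, height belief N(1.5 m, 0.1^2):
mu_d, sigma_d = depth_with_uncertainty(720.0, 96.0, torch.tensor(1.5),
                                       torch.tensor(0.1))
```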
arXiv Detail & Related papers (2023-10-24T08:45:15Z)
- Multi-view 3D Object Reconstruction and Uncertainty Modelling with Neural Shape Prior [9.716201630968433]
3D object reconstruction is important for semantic scene understanding.
It is challenging to reconstruct detailed 3D shapes from monocular images directly due to a lack of depth information, occlusion and noise.
We tackle this problem by leveraging a neural object representation which learns an object shape distribution from a large dataset of 3D object models and maps it into a latent space.
We propose a method to model uncertainty as part of the representation and define an uncertainty-aware encoder which generates latent codes with uncertainty directly from individual input images.
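A common realization of such an uncertainty-aware encoder (a generic sketch, not this paper's architecture) predicts a mean and log-variance over the latent shape code and samples with the reparameterization trick:

```python
import torch
import torch.nn as nn

class UncertaintyAwareEncoder(nn.Module):
    """Maps an image feature vector to a Gaussian belief over a latent
    shape code: z ~ N(mu, diag(sigma^2))."""
    def __init__(self, feat_dim=256, latent_dim=64):
        super().__init__()
        self.mu_head = nn.Linear(feat_dim, latent_dim)
        self.logvar_head = nn.Linear(feat_dim, latent_dim)

    def forward(self, feat):
        mu, logvar = self.mu_head(feat), self.logvar_head(feat)
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # reparameterize
        return z, mu, logvar
```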
arXiv Detail & Related papers (2023-06-17T03:25:13Z)
- Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
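Regressing object coordinates at frustum query points instead of at pixels might look like the following generic sketch (the MLP, the feature conditioning, and the foreground logit are our assumptions):

```python
import torch
import torch.nn as nn

class QueryPointCoordNet(nn.Module):
    """For each 3D query point sampled in the camera frustum, predict the
    corresponding 3D object coordinate plus a foreground logit,
    conditioned on an image feature gathered for that query."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 4))  # (x, y, z) object coordinate + logit

    def forward(self, query_xyz, query_feat):
        out = self.mlp(torch.cat([query_xyz, query_feat], dim=-1))
        return out[..., :3], out[..., 3]
```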
arXiv Detail & Related papers (2022-07-30T01:48:23Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method improves the detection performance of the state-of-the-art monocular-based method by 2.80% on the moderate test setting, without using extra data.
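The projective relation behind this kind of geometry-guided depth recovery can be written in standard pinhole notation (the paper's full formulation is richer, handling offsets and multiple depth predictions):

```latex
% An object of physical height H at depth z, seen with focal length f,
% spans h = f H / z pixels in the image, so depth follows from height:
z = \frac{f \, H}{h}
```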
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- MonoGRNet: A General Framework for Monocular 3D Object Detection [23.59839921644492]
We propose MonoGRNet for amodal 3D object detection from a monocular image via geometric reasoning.
MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks including 2D object detection, instance-level depth estimation, projected 3D center estimation and local corner regression.
Experiments are conducted on the KITTI, Cityscapes and MS COCO datasets.
arXiv Detail & Related papers (2021-04-18T10:07:52Z)
- Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find that localization error is the vital factor restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z)
- Monocular Differentiable Rendering for Self-Supervised 3D Object Detection [21.825158925459732]
3D object detection from monocular images is an ill-posed problem due to the projective entanglement of depth and scale.
We present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects.
Our method predicts the 3D location and meshes of each object in an image using differentiable rendering and a self-supervised objective.
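As a crude stand-in for a full differentiable renderer (the method renders textured meshes; this Gaussian-splat silhouette is only our illustration of how an image-space loss can send gradients back to pose):

```python
import torch

def soft_silhouette(verts_cam, K, H, W, sigma=2.0):
    """'Render' mesh vertices as a differentiable soft silhouette.
    verts_cam: (N, 3) vertices in the camera frame (pose-dependent)."""
    uvw = verts_cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)        # (N, 2) pixels
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # (H*W, 2)
    d2 = ((grid[:, None] - uv[None]) ** 2).sum(-1)       # (H*W, N)
    hit = torch.exp(-d2 / (2 * sigma ** 2))
    return (1.0 - (1.0 - hit).prod(dim=1)).reshape(H, W) # soft union

# Self-supervised objective against an observed instance mask:
# loss = (soft_silhouette(verts @ R.T + t, K, H, W) - mask).abs().mean()
```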
arXiv Detail & Related papers (2020-09-30T09:21:43Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of drawing effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, changing only one 3D parameter in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
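A toy REINFORCE sketch of the stepwise, one-parameter-at-a-time refinement; the stand-in reward (distance to a known target box) replaces the paper's detection-quality reward, and the policy here sees only the box parameters rather than image features:

```python
import torch
import torch.nn as nn

STEP, N_PARAMS = 0.1, 7  # box = (x, y, z, w, h, l, yaw)
policy = nn.Sequential(nn.Linear(N_PARAMS, 64), nn.ReLU(),
                       nn.Linear(64, N_PARAMS * 2))  # (param, +/-) logits

def episode(box, target, steps=10):
    logps = []
    for _ in range(steps):
        pi = torch.distributions.Categorical(logits=policy(box))
        a = pi.sample()
        logps.append(pi.log_prob(a))
        box = box.detach().clone()
        box[a // 2] += STEP if a % 2 == 0 else -STEP  # move one axis
    return -torch.dist(box, target), torch.stack(logps)  # delayed reward

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
target = torch.tensor([0.3, -0.2, 0.5, 0.1, 0.0, 0.4, -0.1])
for _ in range(200):  # REINFORCE on the episode-level reward
    reward, logps = episode(torch.zeros(N_PARAMS), target)
    loss = -(reward.detach() * logps.sum())
    opt.zero_grad(); loss.backward(); opt.step()
```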
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion, together with the predicted 3D orientation and dimensions, the 3D bounding boxes of objects can be detected from a single RGB image.
arXiv Detail & Related papers (2020-07-20T02:11:18Z)
- Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation [41.29145717658494]
This paper proposes a novel unified framework which decomposes the detection problem into a structured polygon prediction task and a depth recovery task.
Compared to the widely-used 3D bounding box proposals, it is shown to be a better representation for 3D detection.
Experiments are conducted on the challenging KITTI benchmark, in which our method achieves state-of-the-art detection accuracy.
arXiv Detail & Related papers (2020-02-05T03:25:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.