MonoGRNet: A General Framework for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2104.08797v1
- Date: Sun, 18 Apr 2021 10:07:52 GMT
- Title: MonoGRNet: A General Framework for Monocular 3D Object Detection
- Authors: Zengyi Qin, Jinglu Wang, Yan Lu
- Abstract summary: We propose MonoGRNet for the amodal 3D object detection from a monocular image via geometric reasoning.
MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks including 2D object detection, instance-level depth estimation, projected 3D center estimation and local corner regression.
Experiments are conducted on KITTI, Cityscapes and MS COCO datasets.
- Score: 23.59839921644492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting and localizing objects in the real 3D space, which plays a crucial
role in scene understanding, is particularly challenging given only a monocular
image due to the geometric information loss during imagery projection. We
propose MonoGRNet for the amodal 3D object detection from a monocular image via
geometric reasoning in both the observed 2D projection and the unobserved depth
dimension. MonoGRNet decomposes the monocular 3D object detection task into
four sub-tasks including 2D object detection, instance-level depth estimation,
projected 3D center estimation and local corner regression. The task
decomposition significantly facilitates the monocular 3D object detection,
allowing the target 3D bounding boxes to be efficiently predicted in a single
forward pass, without using object proposals, post-processing or the
computationally expensive pixel-level depth estimation utilized by previous
methods. In addition, MonoGRNet flexibly adapts to both fully and weakly
supervised learning, which improves the feasibility of our framework in diverse
settings. Experiments are conducted on KITTI, Cityscapes and MS COCO datasets.
Results demonstrate the promising performance of our framework in various
scenarios.
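The abstract's four sub-tasks combine through standard pinhole geometry: the projected 3D center and the instance depth give the 3D center, and the local corners are added to it. The sketch below only illustrates that composition and is not the authors' implementation. The names `Intrinsics`, `back_project`, `assemble_box` and `project`, the camera parameters, and the box dimensions are all hypothetical, and the local corner offsets are assumed to be expressed along the camera axes.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Pinhole camera parameters in pixels (illustrative values only)."""
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(u: float, v: float, depth: float, K: Intrinsics) -> Point3D:
    """Lift a pixel (u, v) with known depth Z to camera coordinates."""
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


def assemble_box(center_uv: Tuple[float, float], depth: float,
                 local_corners: List[Point3D], K: Intrinsics) -> List[Point3D]:
    """Combine projected center, instance depth and local corner offsets.

    Assumption: offsets are given along the camera axes, relative to the
    3D center. The 2D detection sub-task is taken as already done.
    """
    cx3d, cy3d, cz3d = back_project(center_uv[0], center_uv[1], depth, K)
    return [(cx3d + dx, cy3d + dy, cz3d + dz) for dx, dy, dz in local_corners]


def project(point: Point3D, K: Intrinsics) -> Tuple[float, float]:
    """Project a camera-frame point back to the image plane."""
    x, y, z = point
    return (K.fx * x / z + K.cx, K.fy * y / z + K.cy)


# Hypothetical inputs standing in for network predictions.
K = Intrinsics(fx=700.0, fy=700.0, cx=600.0, cy=180.0)
center_uv = (670.0, 215.0)   # projected 3D center in pixels
depth = 20.0                 # instance-level depth in metres

# Eight offsets for a box with half-extents 2.0 x 0.75 x 0.8 metres.
local_corners = [(sx * 2.0, sy * 0.75, sz * 0.8)
                 for sx, sy, sz in product((-1.0, 1.0), repeat=3)]

center = back_project(center_uv[0], center_uv[1], depth, K)
corners = assemble_box(center_uv, depth, local_corners, K)
reprojected = project(center, K)
```

Because depth is needed only once per object in this composition, a single instance-level depth value is enough to place the whole box, which is consistent with the abstract's point about avoiding pixel-level depth estimation.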
Related papers
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- Categorical Depth Distribution Network for Monocular 3D Object Detection [7.0405916639906785]
A key challenge in monocular 3D detection is accurately predicting object depth.
Many methods attempt to directly estimate depth to assist in 3D detection, but show limited performance as a result of depth inaccuracy.
We propose Categorical Depth Distribution Network (CaDDN) to project rich contextual feature information to the appropriate depth interval in 3D space.
We validate our approach on the KITTI 3D object detection benchmark, where we rank 1st among published monocular methods.
arXiv Detail & Related papers (2021-03-01T16:08:29Z)
- Monocular Differentiable Rendering for Self-Supervised 3D Object Detection [21.825158925459732]
3D object detection from monocular images is an ill-posed problem due to the projective entanglement of depth and scale.
We present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects.
Our method predicts the 3D location and meshes of each object in an image using differentiable rendering and a self-supervised objective.
arXiv Detail & Related papers (2020-09-30T09:21:43Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
arXiv Detail & Related papers (2020-07-20T02:11:18Z)
- Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation [41.29145717658494]
This paper proposes a novel unified framework which decomposes the detection problem into a structured polygon prediction task and a depth recovery task.
Compared to the widely-used 3D bounding box proposals, it is shown to be a better representation for 3D detection.
Experiments are conducted on the challenging KITTI benchmark, in which our method achieves state-of-the-art detection accuracy.
arXiv Detail & Related papers (2020-02-05T03:25:02Z)