Object-Aware Centroid Voting for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2007.09836v1
- Date: Mon, 20 Jul 2020 02:11:18 GMT
- Title: Object-Aware Centroid Voting for Monocular 3D Object Detection
- Authors: Wentao Bao and Qi Yu and Yu Kong
- Abstract summary: We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
- Score: 30.59728753059457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection aims to detect objects in a 3D physical world
from a single camera. However, recent approaches either rely on expensive LiDAR
devices, or resort to dense pixel-wise depth estimation that causes prohibitive
computational cost. In this paper, we propose an end-to-end trainable monocular
3D object detector without learning the dense depth. Specifically, the grid
coordinates of a 2D box are first projected back to 3D space with the pinhole
model as 3D centroid proposals. Then, a novel object-aware voting approach is
introduced, which considers both the region-wise appearance attention and the
geometric projection distribution, to vote the 3D centroid proposals for 3D
object localization. With the late fusion and the predicted 3D orientation and
dimension, the 3D bounding boxes of objects can be detected from a single RGB
image. The method is straightforward yet significantly superior to other
monocular-based methods. Extensive experimental results on the challenging
KITTI benchmark validate the effectiveness of the proposed method.
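
The two geometric steps named in the abstract, back-projecting grid points of a 2D box with the pinhole model and voting over the resulting 3D centroid proposals, can be illustrated with a minimal sketch. This is not the authors' implementation. The intrinsics, the box, the depth value, and all function names are hypothetical. The sketch also assumes that a single depth is already available for the object and that the voting weights are a softmax over per-proposal scores, which only stand in for the appearance attention and geometric projection distribution mentioned above.

```python
import math

def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z into camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def grid_points(box, n=3):
    """Return an n x n grid of cell-center pixel coordinates inside a 2D box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    us = [x1 + (x2 - x1) * (i + 0.5) / n for i in range(n)]
    vs = [y1 + (y2 - y1) * (j + 0.5) / n for j in range(n)]
    return [(u, v) for v in vs for u in us]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def vote_centroid(proposals, scores):
    """Weighted average of 3D proposals; weights are a softmax over the scores (an assumption)."""
    weights = softmax(scores)
    return tuple(
        sum(w * p[k] for w, p in zip(weights, proposals)) for k in range(3)
    )

# Hypothetical camera intrinsics, 2D box, and object depth (illustrative values only).
fx = fy = 720.0
cx, cy = 620.0, 187.0
box = (600.0, 170.0, 680.0, 230.0)
z = 20.0

pixels = grid_points(box, n=3)
proposals = [backproject(u, v, z, fx, fy, cx, cy) for u, v in pixels]

# Uniform scores reduce the vote to a plain mean of the proposals.
scores = [0.0] * len(proposals)
centroid = vote_centroid(proposals, scores)
print(centroid)
```

With uniform scores the vote coincides with the back-projection of the box center; non-uniform scores shift the estimate toward the higher-weighted grid cells.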
Related papers
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver [45.16079927526731]
The main challenge of monocular 3D object detection is the accurate localization of the 3D center.
We propose a stage-wise approach, which combines the information flow from 2D-to-3D and 3D-to-2D.
Our method, named MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detector.
arXiv Detail & Related papers (2023-04-03T18:24:46Z)
- Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
arXiv Detail & Related papers (2022-07-30T01:48:23Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- MonoGRNet: A General Framework for Monocular 3D Object Detection [23.59839921644492]
We propose MonoGRNet for the amodal 3D object detection from a monocular image via geometric reasoning.
MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks including 2D object detection, instance-level depth estimation, projected 3D center estimation and local corner regression.
Experiments are conducted on KITTI, Cityscapes and MS COCO datasets.
arXiv Detail & Related papers (2021-04-18T10:07:52Z)
- OCM3D: Object-Centric Monocular 3D Object Detection [35.804542148335706]
We propose a novel object-centric voxel representation tailored for monocular 3D object detection.
Specifically, voxels are built on each object proposal, and their sizes are adaptively determined by the 3D spatial distribution of the points.
Our method outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-04-13T09:15:40Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than other monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
- Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation [41.29145717658494]
This paper proposes a novel unified framework which decomposes the detection problem into a structured polygon prediction task and a depth recovery task.
Compared to the widely used 3D bounding box proposals, the structured polygon is shown to be a better representation for 3D detection.
Experiments are conducted on the challenging KITTI benchmark, in which our method achieves state-of-the-art detection accuracy.
arXiv Detail & Related papers (2020-02-05T03:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.