Center3D: Center-based Monocular 3D Object Detection with Joint Depth
Understanding
- URL: http://arxiv.org/abs/2005.13423v1
- Date: Wed, 27 May 2020 15:29:09 GMT
- Title: Center3D: Center-based Monocular 3D Object Detection with Joint Depth
Understanding
- Authors: Yunlei Tang, Sebastian Dorn and Chiragkumar Savani
- Abstract summary: Center3D is a one-stage anchor-free approach to efficiently estimate 3D location and depth.
By exploiting the difference between 2D and 3D centers, we are able to estimate depth consistently.
Compared with state-of-the-art detectors, Center3D achieves the best speed-accuracy trade-off in real-time monocular object detection.
- Score: 3.4806267677524896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Localizing objects in 3D space and understanding their associated 3D
properties is challenging given only monocular RGB images. The situation is
compounded by the loss of depth information during perspective projection. We
present Center3D, a one-stage anchor-free approach, to efficiently estimate 3D
location and depth using only monocular RGB images. By exploiting the
difference between 2D and 3D centers, we are able to estimate depth
consistently. Center3D uses a combination of classification and regression to
understand the hidden depth information more robustly than each method alone.
Our method employs two joint approaches: (1) LID: a classification-dominated
approach with sequential Linear Increasing Discretization. (2) DepJoint: a
regression-dominated approach with multiple Eigen's transformations for depth
estimation. Evaluated on the KITTI dataset for moderate objects, Center3D improved
the AP in BEV from $29.7\%$ to $42.8\%$ and the AP in 3D from $18.6\%$ to
$39.1\%$. Compared with state-of-the-art detectors, Center3D achieves the
best speed-accuracy trade-off in real-time monocular object detection.
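To make the two ideas above concrete, here is a minimal NumPy sketch of (a) a linear increasing discretization (LID) of the depth range, in which bin widths grow linearly with depth, and (b) recovering a 3D center from its projected image location plus an estimated depth via the pinhole model. The bin parameterization, default ranges, and helper names (`lid_bin_index`, `lid_bin_center`, `unproject_center`) are illustrative assumptions, not Center3D's published formulation.

```python
import numpy as np

# --- Linear Increasing Discretization (LID), sketch --------------------------
# The depth range [d_min, d_max] is split into num_bins bins whose widths grow
# linearly with depth, so far (harder) depths get coarser bins. This exact
# parameterization is an assumption for illustration.

def lid_bin_edge(i, d_min=1.0, d_max=80.0, num_bins=80):
    """Left edge of bin i (edge(0) = d_min, edge(num_bins) = d_max)."""
    return d_min + (d_max - d_min) * i * (i + 1) / (num_bins * (num_bins + 1))

def lid_bin_index(depth, d_min=1.0, d_max=80.0, num_bins=80):
    """Map a continuous depth to its LID bin index in [0, num_bins - 1]."""
    x = (depth - d_min) * num_bins * (num_bins + 1) / (d_max - d_min)
    idx = np.floor(-0.5 + 0.5 * np.sqrt(1.0 + 4.0 * x))   # solves i * (i + 1) = x
    return int(np.clip(idx, 0, num_bins - 1))

def lid_bin_center(idx, **kw):
    """Continuous depth represented by bin `idx` (its center), used for decoding."""
    return 0.5 * (lid_bin_edge(idx, **kw) + lid_bin_edge(idx + 1, **kw))

# --- Recovering the 3D center from its image projection plus depth -----------
def unproject_center(u, v, z, fx, fy, cx, cy):
    """Back-project the projected 3D center (u, v) with depth z into camera coordinates."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

if __name__ == "__main__":
    k = lid_bin_index(23.7)
    print(k, lid_bin_center(k))   # bin index and its decoded (re-quantized) depth
    # KITTI-like intrinsics, used here purely as example numbers.
    print(unproject_center(650.0, 180.0, lid_bin_center(k),
                           fx=721.5, fy=721.5, cx=609.6, cy=172.9))
```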
Related papers
- Toward Accurate Camera-based 3D Object Detection via Cascade Depth
Estimation and Calibration [20.82054596017465]
Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces.
This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization.
arXiv Detail & Related papers (2024-02-07T14:21:26Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
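The Depth from Motion entry above hinges on camera ego-motion and two-view geometry to recover object depth. As a generic illustration of that underlying geometry only (DfM itself lifts dense image features with learned cost volumes rather than triangulating individual correspondences), the sketch below triangulates a single pixel correspondence given a known relative pose:

```python
import numpy as np

def triangulate(uv1, uv2, K, R, t):
    """Linear (DLT) triangulation of one correspondence across two frames.

    uv1, uv2 : pixel coordinates of the same point in frame 1 and frame 2.
    K        : 3x3 camera intrinsics (shared by both frames).
    R, t     : ego-motion from frame 1 to frame 2 (x2 = R @ x1 + t).
    Returns the 3D point in frame-1 camera coordinates; its z component is the depth.
    """
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # frame-1 projection matrix
    P2 = K @ np.hstack([R, t.reshape(3, 1)])            # frame-2 projection matrix
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    X = np.linalg.svd(A)[2][-1]          # null vector of A (homogeneous 3D point)
    return X[:3] / X[3]

if __name__ == "__main__":
    K = np.array([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.0, 0.0, -1.0])   # ego-motion: camera moved 1 m forward
    X_true = np.array([2.0, 0.5, 20.0])            # ground-truth point in frame 1
    p1, p2 = K @ X_true, K @ (R @ X_true + t)
    print(triangulate(p1[:2] / p1[2], p2[:2] / p2[2], K, R, t))   # ~[2.0, 0.5, 20.0]
```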
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed to achieve this goal by exploiting both 2D and 3D information.
Our method yields the best performance, surpassing other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
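One widely used instance of the projective 2D-3D relation this entry refers to, shown here only as a generic sketch and not as the paper's actual formula, is recovering object depth from its physical height and the height of its 2D box under a pinhole camera:

```python
def depth_from_height(h2d_pixels: float, h3d_meters: float, fy: float) -> float:
    """Pinhole geometry: a vertical extent H at depth z projects to about fy * H / z pixels,
    so the depth can be recovered as z = fy * H / h2d. A generic geometric prior, not
    necessarily the formulation used in the paper above."""
    return fy * h3d_meters / h2d_pixels

# Example: a ~1.5 m tall car spanning 54 pixels with fy ~ 721.5 sits at roughly 20 m.
print(depth_from_height(54.0, 1.5, 721.5))   # ~20.04
```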
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- Stereo Object Matching Network [78.35697025102334]
This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information.
We present two novel strategies to handle 3D objectness in the cost volume space: selective sampling (RoISelect) and 2D-3D fusion.
arXiv Detail & Related papers (2021-03-23T12:54:43Z)
- Stereo CenterNet based 3D Object Detection for Autonomous Driving [2.508414661327797]
We propose a 3D object detection method using geometric information in stereo images, called Stereo CenterNet.
Stereo CenterNet predicts four semantic keypoints of the object's 3D bounding box and uses the 2D left and right boxes, 3D dimensions, orientation, and keypoints to recover the object's bounding box in 3D space.
Experiments conducted on the KITTI dataset show that our method achieves the best speed-accuracy trade-off compared with the state-of-the-art methods based on stereo geometry.
arXiv Detail & Related papers (2021-03-20T02:18:49Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
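The first step of the pseudo-LiDAR pipeline described above, back-projecting a dense depth map into a 3D point cloud with the camera intrinsics, can be sketched as follows; the helper name `depth_map_to_pseudo_lidar` and the toy inputs are assumptions for illustration:

```python
import numpy as np

def depth_map_to_pseudo_lidar(depth, K):
    """Back-project a dense depth map of shape (H, W) into an (H*W, 3) pseudo-LiDAR
    point cloud in camera coordinates, using the 3x3 intrinsics K."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))     # pixel coordinate grids, each (H, W)
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    return np.stack([x, y, z], axis=1)

if __name__ == "__main__":
    K = np.array([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
    depth = np.full((375, 1242), 15.0)                 # toy depth map: 15 m everywhere
    print(depth_map_to_pseudo_lidar(depth, K).shape)   # (465750, 3)
```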
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
arXiv Detail & Related papers (2020-07-20T02:11:18Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
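Center-based detectors such as SMOKE (and Center3D itself) read objects off as peaks of a per-class heatmap and then regress the 3D properties at those peak locations. A minimal sketch of that keypoint-decoding step (3x3 local-maximum suppression followed by top-k selection), written here as a generic illustration rather than SMOKE's actual implementation:

```python
import numpy as np

def topk_centers(heatmap, k=10):
    """Return up to k (row, col, score) peaks of a single-class heatmap of shape (H, W)."""
    h, w = heatmap.shape
    # 3x3 local-maximum suppression: keep a cell only if it is >= all of its neighbours.
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    neighbourhood = np.stack([padded[dy:dy + h, dx:dx + w]
                              for dy in range(3) for dx in range(3)]).max(axis=0)
    scores = np.where(heatmap >= neighbourhood, heatmap, 0.0).reshape(-1)
    order = np.argsort(scores)[::-1][:k]
    return [(int(i // w), int(i % w), float(scores[i])) for i in order if scores[i] > 0]

if __name__ == "__main__":
    hm = np.zeros((96, 320))
    hm[40, 100], hm[70, 250] = 0.9, 0.7
    print(topk_centers(hm, k=5))   # peaks at (40, 100) and (70, 250)
```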