Depth Estimation Matters Most: Improving Per-Object Depth Estimation for
Monocular 3D Detection and Tracking
- URL: http://arxiv.org/abs/2206.03666v1
- Date: Wed, 8 Jun 2022 03:37:59 GMT
- Authors: Longlong Jing, Ruichi Yu, Henrik Kretzschmar, Kang Li, Charles R. Qi,
Hang Zhao, Alper Ayvaci, Xu Chen, Dillon Cower, Yingwei Li, Yurong You, Han
Deng, Congcong Li, Dragomir Anguelov
- Abstract summary: Approaches to monocular 3D perception, including detection and tracking, often yield inferior performance compared to LiDAR-based techniques.
We propose a multi-level fusion method that combines different representations (RGB and pseudo-LiDAR) and temporal information across multiple frames for objects (tracklets) to enhance per-object depth estimation.
- Score: 47.59619420444781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular image-based 3D perception has become an active research area in
recent years owing to its applications in autonomous driving. Approaches to
monocular 3D perception, including detection and tracking, however, often yield
inferior performance when compared to LiDAR-based techniques. Through
systematic analysis, we identified that per-object depth estimation accuracy is
a major factor bounding the performance. Motivated by this observation, we
propose a multi-level fusion method that combines different representations
(RGB and pseudo-LiDAR) and temporal information across multiple frames for
objects (tracklets) to enhance per-object depth estimation. Our proposed fusion
method achieves the state-of-the-art performance of per-object depth estimation
on the Waymo Open Dataset, the KITTI detection dataset, and the KITTI MOT
dataset. We further demonstrate that by simply replacing estimated depth with
fusion-enhanced depth, we can achieve significant improvements in monocular 3D
perception tasks, including detection and tracking.
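To make the fusion idea concrete, below is a minimal sketch of per-object depth estimation via multi-level fusion, assuming PyTorch. The module name (PerObjectDepthFusion), the fixed 256-dimensional per-frame crop features, and the GRU used for temporal aggregation over a tracklet are illustrative assumptions, not the paper's exact architecture; a real pseudo-LiDAR branch would back-project estimated depth into 3D points and encode them with a point-cloud backbone.

```python
# Minimal sketch of per-object depth via multi-level fusion (RGB + pseudo-LiDAR
# representations, fused across a tracklet's frames). Illustrative only.
import torch
import torch.nn as nn


class PerObjectDepthFusion(nn.Module):
    """Fuses an RGB crop embedding with a pseudo-LiDAR (back-projected point)
    embedding per object and frame, then aggregates the fused features over
    the tracklet's frames to regress a single per-object depth."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Per-frame encoders for the two representations (stand-ins for real
        # image / point-cloud backbones).
        self.rgb_encoder = nn.Sequential(nn.Linear(256, feat_dim), nn.ReLU())
        self.plidar_encoder = nn.Sequential(nn.Linear(256, feat_dim), nn.ReLU())
        # Representation-level fusion.
        self.fuse = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        # Temporal fusion across the frames of a tracklet.
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Per-object depth head.
        self.depth_head = nn.Linear(feat_dim, 1)

    def forward(self, rgb_feat: torch.Tensor, plidar_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, plidar_feat: (num_objects, num_frames, 256)
        fused = self.fuse(torch.cat([self.rgb_encoder(rgb_feat),
                                     self.plidar_encoder(plidar_feat)], dim=-1))
        _, last_hidden = self.temporal(fused)           # (1, num_objects, feat_dim)
        return self.depth_head(last_hidden.squeeze(0))  # (num_objects, 1) depth


if __name__ == "__main__":
    model = PerObjectDepthFusion()
    rgb = torch.randn(4, 5, 256)       # 4 tracklets, 5 frames each
    plidar = torch.randn(4, 5, 256)
    print(model(rgb, plidar).shape)    # torch.Size([4, 1])
```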
Related papers
- Depth-discriminative Metric Learning for Monocular 3D Object Detection [14.554132525651868]
We introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes.
Our method consistently improves the performance of various baselines by 23.51% and 5.78% on average.
arXiv Detail & Related papers (2024-01-02T07:34:09Z)
- IDMS: Instance Depth for Multi-scale Monocular 3D Object Detection [1.7710335706046505]
A multi-scale perception module based on dilated convolution is designed to enhance the model's ability to handle targets of different scales.
Experiments on the KITTI test and evaluation sets show that, compared with the baseline method, the proposed method improves AP40 in the car category by 5.27%.
arXiv Detail & Related papers (2022-12-03T04:02:31Z)
- Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision [13.593246617391266]
We propose a method to boost the RGB image-based 3D detector by jointly training the detection network with a depth prediction loss analogous to the depth estimation task.
Our novel object-centric depth prediction loss focuses on depth around foreground objects, which is important for 3D object detection.
Our depth regression model is further trained to predict the uncertainty of depth to represent the 3D confidence of objects; a minimal sketch of such an uncertainty-aware depth loss appears after this list.
arXiv Detail & Related papers (2022-10-29T11:32:28Z)
- MDS-Net: A Multi-scale Depth Stratification Based Monocular 3D Object Detection Algorithm [4.958840734249869]
This paper proposes a one-stage monocular 3D object detection algorithm based on multi-scale depth stratification.
Experiments on the KITTI benchmark show that MDS-Net outperforms existing monocular 3D detection methods in both 3D detection and BEV detection tasks.
arXiv Detail & Related papers (2022-01-12T07:11:18Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data, by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth [64.29043589521308]
We propose a rendering module to augment the training data by synthesizing images with virtual depths.
The rendering module takes as input the RGB image and its corresponding sparse depth image, and outputs a variety of photo-realistic synthetic images.
Besides, we introduce an auxiliary module to improve the detection model by jointly optimizing it through a depth estimation task.
arXiv Detail & Related papers (2021-07-28T11:00:47Z)
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection [86.25022248968908]
We learn context- and depth-aware feature representations to solve the problem of monocular 3D object detection.
We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
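For the object-centric auxiliary depth supervision entry above, the following is a minimal sketch of an uncertainty-aware, foreground-masked depth loss, assuming PyTorch. The Laplacian negative log-likelihood form, the function name object_centric_depth_loss, and the box-based foreground mask are common illustrative choices, not necessarily the exact loss used in that paper.

```python
# Minimal sketch of an object-centric, uncertainty-aware depth loss.
# Illustrative formulation only.
import torch


def object_centric_depth_loss(pred_depth: torch.Tensor,
                              pred_log_sigma: torch.Tensor,
                              gt_depth: torch.Tensor,
                              fg_mask: torch.Tensor) -> torch.Tensor:
    """Laplacian NLL on depth, averaged only over foreground (object) pixels.

    pred_depth, pred_log_sigma, gt_depth: (B, H, W) tensors.
    fg_mask: (B, H, W) boolean mask, True inside 2D object boxes.
    """
    # |d - d_hat| / sigma + log sigma, with sigma = exp(pred_log_sigma).
    nll = torch.abs(gt_depth - pred_depth) * torch.exp(-pred_log_sigma) + pred_log_sigma
    # Restrict supervision to pixels around foreground objects.
    fg = fg_mask.float()
    return (nll * fg).sum() / fg.sum().clamp(min=1.0)


# The predicted sigma can also serve as a per-object 3D confidence,
# e.g. confidence = exp(-mean log-sigma inside the object's box).
```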