Depth-discriminative Metric Learning for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2401.01075v1
- Date: Tue, 2 Jan 2024 07:34:09 GMT
- Title: Depth-discriminative Metric Learning for Monocular 3D Object Detection
- Authors: Wonhyeok Choi, Mingyu Shin, Sunghoon Im
- Abstract summary: We introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes.
Our method consistently improves the performance of various baselines by 23.51% and 5.78% on average on the KITTI and Waymo datasets, respectively.
- Score: 14.554132525651868
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Monocular 3D object detection poses a significant challenge due to the lack
of depth information in RGB images. Many existing methods strive to enhance the
object depth estimation performance by allocating additional parameters for
object depth estimation, utilizing extra modules or data. In contrast, we
introduce a novel metric learning scheme that encourages the model to extract
depth-discriminative features regardless of the visual attributes without
increasing inference time and model size. Our method employs the
distance-preserving function to organize the feature space manifold in relation
to ground-truth object depth. The proposed (K, B, eps)-quasi-isometric loss
leverages predetermined pairwise distance restriction as guidance for adjusting
the distance among object descriptors without disrupting the non-linearity of
the natural feature manifold. Moreover, we introduce an auxiliary head for
object-wise depth estimation, which enhances depth quality while maintaining
the inference time. The broad applicability of our method is demonstrated
through experiments that show improvements in overall performance when
integrated into various baselines. The results show that our method
consistently improves the performance of various baselines by 23.51% and 5.78%
on average across KITTI and Waymo, respectively.
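The (K, B, eps)-quasi-isometric loss described above constrains pairwise distances between object descriptors to track pairwise ground-truth depth differences within a fixed band. Below is a minimal NumPy sketch of what such a pairwise band constraint might look like; the hinge formulation, the function name, and the default values of K, B, and eps are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def quasi_isometric_loss(features, depths, K=2.0, B=0.1, eps=0.0):
    """Hypothetical sketch of a (K, B, eps)-quasi-isometric penalty.

    A map is (K, B)-quasi-isometric when, for every pair (i, j),
        d_depth/K - B <= d_feat <= K * d_depth + B.
    This sketch applies a hinge penalty whenever the feature-space
    distance leaves that band by more than a tolerance eps.
    """
    n = len(depths)
    loss, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d_feat = np.linalg.norm(features[i] - features[j])
            d_depth = abs(depths[i] - depths[j])
            lower = d_depth / K - B
            upper = K * d_depth + B
            # Penalize only violations beyond the tolerance eps.
            loss += max(0.0, d_feat - upper - eps)
            loss += max(0.0, lower - d_feat - eps)
            count += 1
    return loss / max(count, 1)
```

For example, features one unit apart for objects one meter apart in depth fall inside the band and incur zero loss, while features ten units apart for the same depth gap are penalized. A practical implementation would use vectorized pairwise distances and an autograd framework rather than explicit loops.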
Related papers
- IDMS: Instance Depth for Multi-scale Monocular 3D Object Detection [1.7710335706046505]
A multi-scale perception module based on dilated convolution is designed to enhance the model's ability to process targets at different scales.
Experiments on the KITTI test and validation sets show that, compared with the baseline method, the proposed method improves AP40 in the car category by 5.27%.
arXiv Detail & Related papers (2022-12-03T04:02:31Z)
- Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking [47.59619420444781]
Approaches to monocular 3D perception including detection and tracking often yield inferior performance when compared to LiDAR-based techniques.
We propose a multi-level fusion method that combines different representations (RGB and pseudo-LiDAR) and temporal information across multiple frames for objects (tracklets) to enhance per-object depth estimation.
arXiv Detail & Related papers (2022-06-08T03:37:59Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD)
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
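The "principled geometry formula with projective modeling" mentioned above presumably builds on the standard pinhole-camera relation between an object's physical height, its projected image height, and its depth. A hedged sketch of that textbook relation follows; the function name and example numbers are illustrative and not taken from the paper.

```python
def depth_from_height(focal_px, obj_height_m, bbox_height_px):
    """Pinhole-camera depth estimate from object height.

    Under the pinhole model, the projected height satisfies
        h = f * H / Z,
    so depth can be recovered as Z = f * H / h, where f is the focal
    length in pixels, H the object's 3D height in meters, and h the
    2D bounding-box height in pixels.
    """
    return focal_px * obj_height_m / bbox_height_px
```

For instance, a 1.5 m tall car spanning 70 px under a 700 px focal length projects from 15 m away. Real detectors must additionally handle noisy height predictions, which is why such geometric priors are typically combined with learned refinements or uncertainty estimates.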
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth.
At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
arXiv Detail & Related papers (2021-07-29T06:59:07Z)
- Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth [64.29043589521308]
We propose a rendering module to augment the training data by synthesizing images with virtual-depths.
The rendering module takes as input the RGB image and its corresponding sparse depth image, and outputs a variety of photo-realistic synthetic images.
Besides, we introduce an auxiliary module to improve the detection model by jointly optimizing it through a depth estimation task.
arXiv Detail & Related papers (2021-07-28T11:00:47Z)
- Progressive Multi-scale Fusion Network for RGB-D Salient Object Detection [9.099589602551575]
We discuss the advantages of the so-called progressive multi-scale fusion method and propose a mask-guided feature aggregation module.
The proposed framework can effectively combine the two features of different modalities and alleviate the impact of erroneous depth features.
We further introduce a mask-guided refinement module (MGRM) to complement the high-level semantic features and reduce the irrelevant features from multi-scale fusion.
arXiv Detail & Related papers (2021-06-07T20:02:39Z)
- Objects are Different: Flexible Monocular 3D Object Detection [87.82253067302561]
We propose a flexible framework for monocular 3D object detection which explicitly decouples the truncated objects and adaptively combines multiple approaches for object depth estimation.
Experiments demonstrate that our method outperforms the state-of-the-art method by relatively 27% for the moderate level and 30% for the hard level in the test set of KITTI benchmark.
arXiv Detail & Related papers (2021-04-06T07:01:28Z)
- Monocular 3D Object Detection with Sequential Feature Association and Depth Hint Augmentation [12.55603878441083]
FADNet is presented to address the task of monocular 3D object detection.
A dedicated depth hint module is designed to generate row-wise features named depth hints.
The contributions of this work are validated by conducting experiments and ablation study on the KITTI benchmark.
arXiv Detail & Related papers (2020-11-30T07:19:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.