Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
- URL: http://arxiv.org/abs/2503.19947v1
- Date: Tue, 25 Mar 2025 15:19:48 GMT
- Title: Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
- Authors: Paul Koch, Jörg Krüger, Ankit Chowdhury, Oliver Heimann
- Abstract summary: Generalized metric depth understanding is critical for precise vision-guided robotics. We propose Vanishing Depth, a self-supervised training approach that extends pretrained RGB encoders to incorporate and align metric depth into their feature embeddings. We achieve performance improvements and SOTA results across a spectrum of relevant RGBD downstream tasks.
- Score: 0.24999074238880484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalized metric depth understanding is critical for precise vision-guided robotics, which current state-of-the-art (SOTA) vision encoders do not support. To address this, we propose Vanishing Depth, a self-supervised training approach that extends pretrained RGB encoders to incorporate and align metric depth into their feature embeddings. Based on our novel positional depth encoding, we enable feature extraction that is stable across depth densities and invariant to depth distributions. We achieve performance improvements and SOTA results across a spectrum of relevant RGBD downstream tasks - without the necessity of finetuning the encoder. Most notably, we achieve 56.05 mIoU on SUN-RGBD segmentation, 88.3 RMSE on VOID depth completion, and 83.8 Top-1 accuracy on NYUv2 scene classification. In 6D object pose estimation, we outperform our predecessors DinoV2, EVA-02, and Omnivore and achieve SOTA results for non-finetuned encoders in several related RGBD downstream tasks.
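The exact formulation of the positional depth encoding is not given in this listing; one plausible reading, sketched below under that assumption, is a sinusoidal encoding of normalized metric depth analogous to transformer positional encodings, applied per pixel so that the embedding is independent of how densely depth is sampled. The function name, frequency ladder, and `max_depth` value are illustrative, not the paper's:

```python
import numpy as np

def positional_depth_encoding(depth, num_freqs=8, max_depth=10.0):
    """Encode per-pixel metric depth with sinusoids at multiple frequencies.

    depth: (H, W) array of metric depths; 0 marks invalid pixels.
    Returns an (H, W, 2 * num_freqs) embedding. Because each valid
    pixel is encoded independently, the representation does not change
    with the density of the depth samples.
    """
    d = np.clip(depth, 0.0, max_depth) / max_depth  # normalize to [0, 1]
    freqs = 2.0 ** np.arange(num_freqs)             # geometric frequency ladder
    angles = np.pi * d[..., None] * freqs           # (H, W, num_freqs)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

depth = np.random.uniform(0.5, 5.0, size=(4, 4))
emb = positional_depth_encoding(depth)
print(emb.shape)  # (4, 4, 16)
```

Such an embedding could then be summed or concatenated with the RGB encoder's patch features, which is one way the depth signal might be "aligned" into the pretrained feature space.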
Related papers
- Metric-Solver: Sliding Anchored Metric Depth Estimation from a Single Image [51.689871870692194]
Metric-Solver is a novel sliding anchor-based metric depth estimation method.
Our design enables a unified and adaptive depth representation across diverse environments.
arXiv Detail & Related papers (2025-04-16T14:12:25Z) - Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation [108.04354143020886]
We introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. We use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution.
arXiv Detail & Related papers (2024-12-18T16:32:12Z) - DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation. We show that Gaussian splatting can serve as an unsupervised pre-training objective for learning powerful depth models. Our DepthSplat achieves state-of-the-art performance on the ScanNet, RealEstate10K and DL3DV datasets.
arXiv Detail & Related papers (2024-10-17T17:59:58Z) - SDformer: Efficient End-to-End Transformer for Depth Completion [5.864200786548098]
Depth completion aims to predict dense depth maps with sparse depth measurements from a depth sensor.
Currently, Convolutional Neural Network (CNN) based models are the most popular methods applied to depth completion tasks.
To overcome the drawbacks of CNNs, a more effective and powerful method is presented: a sequence-to-sequence model with adaptive self-attention.
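The snippets above describe the depth-completion setting: a network receives an RGB image together with sparse, incomplete depth from a sensor and predicts a dense depth map. A common way to present this input, sketched here as an assumption rather than any specific paper's pipeline, is to stack the image with the sparse depth channel and a validity mask:

```python
import numpy as np

def make_completion_input(rgb, sparse_depth):
    """Stack RGB with a sparse depth channel and a validity mask.

    rgb: (H, W, 3) float image in [0, 1]
    sparse_depth: (H, W) metric depth, 0 where the sensor gave no reading
    Returns an (H, W, 5) tensor a completion network could consume;
    the mask tells the model which depth values are real measurements.
    """
    mask = (sparse_depth > 0).astype(np.float32)  # 1 where depth is valid
    return np.concatenate(
        [rgb, sparse_depth[..., None], mask[..., None]], axis=-1
    )

rgb = np.random.rand(8, 8, 3)
sparse = np.zeros((8, 8))
sparse[::4, ::4] = 2.5  # ~6% of pixels carry a measurement
x = make_completion_input(rgb, sparse)
print(x.shape)  # (8, 8, 5)
```

The explicit mask is what lets a model behave consistently across the "various sparsity levels" that papers like OGNI-DC evaluate on.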
arXiv Detail & Related papers (2024-09-12T15:52:08Z) - Depth Matters: Exploring Deep Interactions of RGB-D for Semantic Segmentation in Traffic Scenes [11.446541235218396]
We propose a novel learnable Depth interaction Pyramid Transformer (DiPFormer) to explore the effectiveness of depth.
DiPFormer achieves state-of-the-art performance on the KITTI (97.57% F-score on KITTI road and 68.74% mIoU on KITTI-360) and Cityscapes datasets.
arXiv Detail & Related papers (2024-09-12T12:39:34Z) - OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations [23.0962036039182]
"Optimization-Guided Neural Iterations" (OGNI) is a novel framework for depth completion.
OGNI-DC exhibits strong generalization, outperforming baselines on unseen datasets and across various sparsity levels.
It has high accuracy, achieving state-of-the-art performance on the NYUv2 and the KITTI benchmarks.
arXiv Detail & Related papers (2024-06-17T16:30:29Z) - A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion [10.519644854849098]
We propose a two-step Transformer-based network for indoor depth completion.
Our proposed network achieves the state-of-the-art performance on the Matterport3D dataset.
In addition, to validate the importance of the depth completion task, we apply our methods to indoor 3D reconstruction.
arXiv Detail & Related papers (2024-06-14T07:42:27Z) - Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
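The multi-task mechanism described above amounts to training one model against several objectives at once. As a minimal sketch, assuming simple L1 losses and illustrative weights (neither is stated in the summary), the combined objective could look like:

```python
import numpy as np

def multitask_loss(pred, target, weights=(1.0, 1.0, 0.5)):
    """Weighted sum of per-task L1 losses for depth, saliency and contour.

    pred / target: dicts with keys 'depth', 'saliency', 'contour',
    each an (H, W) array. The weights are illustrative placeholders,
    not the values used by MMFT.
    """
    w_d, w_s, w_c = weights
    l1 = lambda a, b: float(np.mean(np.abs(a - b)))
    return (w_d * l1(pred['depth'], target['depth'])
            + w_s * l1(pred['saliency'], target['saliency'])
            + w_c * l1(pred['contour'], target['contour']))
```

Sharing one backbone while each head contributes its own loss term is what lets the auxiliary tasks (depth, contour) shape features that also benefit the primary SOD task.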
arXiv Detail & Related papers (2022-03-09T17:20:18Z) - Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monocular depth networks to perform both depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z) - CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z) - Direct Depth Learning Network for Stereo Matching [79.3665881702387]
A novel Direct Depth Learning Network (DDL-Net) is designed for stereo matching.
DDL-Net consists of two stages: the Coarse Depth Estimation stage and the Adaptive-Grained Depth Refinement stage.
We show that DDL-Net achieves an average improvement of 25% on the SceneFlow dataset and 12% on the DrivingStereo dataset.
arXiv Detail & Related papers (2020-12-10T10:33:57Z) - Decoder Modulation for Indoor Depth Completion [2.099922236065961]
Depth completion recovers a dense depth map from sensor measurements.
Current methods are mostly tailored for very sparse depth measurements from LiDARs in outdoor settings.
We propose a new model that takes into account the statistical difference between such regions.
arXiv Detail & Related papers (2020-05-18T11:42:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (or of any information shown) and is not responsible for any consequences of their use.