Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting
- URL: http://arxiv.org/abs/2512.17908v1
- Date: Fri, 19 Dec 2025 18:59:56 GMT
- Title: Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting
- Authors: Ananta R. Bhattarai, Helge Rhodin
- Abstract summary: We introduce Re-Depth Anything, a test-time self-supervision framework. We fuse DA-V2 with the powerful priors of large-scale 2D diffusion models. Our method performs label-free refinement directly on the input image.
- Score: 14.53203375098443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular depth estimation remains challenging as recent foundation models, such as Depth Anything V2 (DA-V2), struggle with real-world images that are far from the training distribution. We introduce Re-Depth Anything, a test-time self-supervision framework that bridges this domain gap by fusing DA-V2 with the powerful priors of large-scale 2D diffusion models. Our method performs label-free refinement directly on the input image by re-lighting predicted depth maps and augmenting the input. This re-synthesis method replaces classical photometric reconstruction by leveraging shape-from-shading (SfS) cues in a new, generative context with Score Distillation Sampling (SDS). To prevent optimization collapse, our framework employs a targeted optimization strategy: rather than optimizing depth directly or fine-tuning the full model, we freeze the encoder and only update intermediate embeddings while also fine-tuning the decoder. Across diverse benchmarks, Re-Depth Anything yields substantial gains in depth accuracy and realism over DA-V2, showcasing new avenues for self-supervision by augmenting geometric reasoning.
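To make the targeted optimization strategy concrete, here is a minimal PyTorch sketch of the setup the abstract describes: the encoder is frozen, the intermediate embeddings and the decoder are the only trainable quantities, and supervision comes from scoring a re-lit rendering of the predicted depth. The `encoder`, `decoder`, `relight`, and `sds_loss` definitions below are illustrative stand-ins, not the DA-V2 architecture or a real SDS implementation.

```python
import torch
import torch.nn as nn

# Stand-ins for the DA-V2 encoder and depth decoder (not the real modules).
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.Conv2d(16, 1, 3, padding=1), nn.Softplus())

def relight(depth, light=(0.3, 0.3, 0.9)):
    # Crude Lambertian shading from depth gradients: a toy shape-from-shading
    # forward model standing in for the paper's re-lighting step.
    dx, dy = torch.zeros_like(depth), torch.zeros_like(depth)
    dx[..., :, 1:] = depth[..., :, 1:] - depth[..., :, :-1]
    dy[..., 1:, :] = depth[..., 1:, :] - depth[..., :-1, :]
    normals = torch.stack([-dx, -dy, torch.ones_like(depth)], dim=-1)
    normals = normals / normals.norm(dim=-1, keepdim=True)
    return (normals * torch.tensor(light)).sum(-1).clamp(min=0.0)

def sds_loss(image):
    # Placeholder objective so the sketch runs. Real SDS would noise `image`,
    # query a pretrained 2D diffusion model, and backpropagate
    # w(t) * (eps_hat - eps) into the image.
    return image.pow(2).mean()

image = torch.rand(1, 3, 64, 64)
for p in encoder.parameters():
    p.requires_grad_(False)                      # freeze the encoder

with torch.no_grad():
    init = encoder(image)
embedding = init.clone().requires_grad_(True)    # optimize embeddings instead

opt = torch.optim.Adam([embedding, *decoder.parameters()], lr=1e-3)
for _ in range(100):                             # test-time refinement loop
    shaded = relight(decoder(embedding))         # re-synthesize from depth
    loss = sds_loss(shaded)
    opt.zero_grad()
    loss.backward()
    opt.step()

refined = decoder(embedding).detach()            # refined depth prediction
```

Holding the encoder's prior fixed while exposing only the embeddings and decoder keeps the optimization constrained, which is how the abstract frames its defense against collapse.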
Related papers
- UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing [77.10640210751981]
UDPNet is a general framework that leverages depth-based priors from the large-scale pretrained depth estimation model DepthAnything V2. Our proposed solution establishes a new benchmark for depth-aware dehazing across various scenarios.
arXiv Detail & Related papers (2026-01-11T13:29:02Z)
- Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications. Recent approaches incorporate semantic or geometric regularization to address this issue, but they suffer significant degradation in underconstrained areas. We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv Detail & Related papers (2025-03-19T02:11:31Z)
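For reference, DP-Recon's optimization relies on Score Distillation Sampling, whose standard gradient (in the generic DreamFusion-style form below) is worth recalling: theta parameterizes the per-object neural representation, x = g(theta) is a rendering from a novel view, and epsilon-hat is the diffusion model's noise prediction. The paper's exact weighting and conditioning may differ.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
      \bigl(\hat\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \tfrac{\partial x}{\partial \theta} \right],
\qquad x_t = \alpha_t x + \sigma_t \epsilon,\quad \epsilon \sim \mathcal{N}(0, I).
```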
- UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler [62.06785782635153]
We propose a new model, UniDepthV2, capable of reconstructing metric 3D scenes from solely single images across domains. UniDepthV2 directly predicts metric 3D points from the input image at inference time without any additional information. Our model exploits a pseudo-spherical output representation, which disentangles the camera and depth representations.
arXiv Detail & Related papers (2025-02-27T14:03:15Z)
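A generic pseudo-spherical parameterization makes the camera/depth disentanglement tangible: per-pixel ray angles carry the camera, a radial distance carries the depth, and Cartesian points follow from a fixed trigonometric map. The convention below is a common one and not necessarily UniDepthV2's exact output format.

```python
import torch

def pseudo_spherical_to_points(azimuth, elevation, radial):
    # (H, W) tensors: per-pixel ray angles in radians and radial distance.
    # Angles depend only on the camera; radial depends only on the geometry.
    x = radial * torch.cos(elevation) * torch.sin(azimuth)
    y = radial * torch.sin(elevation)
    z = radial * torch.cos(elevation) * torch.cos(azimuth)
    return torch.stack([x, y, z], dim=-1)          # (H, W, 3) metric points

points = pseudo_spherical_to_points(
    torch.zeros(4, 4), torch.zeros(4, 4), torch.ones(4, 4))
```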
- A Simple yet Effective Test-Time Adaptation for Zero-Shot Monocular Metric Depth Estimation [46.037640130193566]
We propose a new method to rescale Depth Anything predictions using 3D points provided by sensors or techniques such as low-resolution LiDAR or structure-from-motion with poses given by an IMU. Our experiments highlight improvements relative to zero-shot monocular metric depth estimation methods, competitive results compared to fine-tuned approaches, and better robustness than depth completion approaches.
arXiv Detail & Related papers (2024-12-18T17:50:15Z)
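One common way to realize this kind of rescaling, sketched below under the assumption of a simple affine model, is a closed-form least-squares fit of a scale and shift between the network's relative depth and the sparse metric depths; the paper's actual estimator may be more elaborate (for instance, robust or performed in inverse-depth space).

```python
import numpy as np

def fit_scale_shift(pred, metric, mask):
    # Least-squares (s, t) minimizing ||s * pred + t - metric||^2 over pixels
    # where sparse metric depth (e.g., low-resolution LiDAR) is available.
    p, m = pred[mask], metric[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)     # (N, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, m, rcond=None)
    return s, t

# Toy example: 50 random "sensor" points recover the metric scale.
pred = np.random.rand(480, 640)
metric = 2.5 * pred + 0.7                          # synthetic ground truth
mask = np.zeros(pred.shape, dtype=bool)
mask[np.random.randint(0, 480, 50), np.random.randint(0, 640, 50)] = True
s, t = fit_scale_shift(pred, metric, mask)
metric_depth = s * pred + t                        # rescaled prediction
```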
- Mask-adaptive Gated Convolution and Bi-directional Progressive Fusion Network for Depth Completion [3.5940515868907164]
We propose a new model for depth completion based on an encoder-decoder structure. Our model introduces two key components: the Mask-adaptive Gated Convolution architecture and the Bi-directional Progressive Fusion module. We achieve remarkable performance in completing depth maps and outperform existing approaches in terms of accuracy and reliability.
arXiv Detail & Related papers (2024-01-15T02:58:06Z)
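The entry does not spell out the Mask-adaptive Gated Convolution, but gated convolutions in depth completion commonly follow the free-form-inpainting pattern sketched below, where a learned soft gate modulates each feature map. The mask-adaptive variant presumably conditions this gate on the validity mask; the paper's exact design is not reproduced here.

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Standard gated convolution: features modulated by a learned soft gate."""
    def __init__(self, cin, cout, k=3):
        super().__init__()
        self.feat = nn.Conv2d(cin, cout, k, padding=k // 2)
        self.gate = nn.Conv2d(cin, cout, k, padding=k // 2)

    def forward(self, x):
        return torch.tanh(self.feat(x)) * torch.sigmoid(self.gate(x))

# Depth-completion-style input: sparse depth plus its validity mask.
sparse = torch.rand(1, 1, 64, 64)
valid = (sparse > 0.5).float()                  # toy sparsity pattern
out = GatedConv2d(2, 16)(torch.cat([sparse * valid, valid], dim=1))
```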
- DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior [70.46245698746874]
We present DiffBIR, a general restoration pipeline that can handle different blind image restoration tasks.
DiffBIR decouples the blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content.
In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results.
For the second stage, we propose IRControlNet that leverages the generative ability of latent diffusion models to generate realistic details.
arXiv Detail & Related papers (2023-08-29T07:11:52Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- FG-Depth: Flow-Guided Unsupervised Monocular Depth Estimation [17.572459787107427]
We propose a flow distillation loss to replace the typical photometric loss, and a prior-flow-based mask to remove invalid pixels.
Our approach achieves state-of-the-art results on both the KITTI and NYU-Depth-v2 datasets.
arXiv Detail & Related papers (2023-01-20T04:02:13Z)
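A minimal reading of the flow distillation idea, sketched below under simplifying assumptions (known intrinsics and relative pose, a single image pair), compares the rigid flow induced by the predicted depth with the flow of a pretrained teacher network, masked to valid pixels. Names and the exact formulation are illustrative, not the paper's.

```python
import torch

def rigid_flow(depth, K, R, t):
    # Back-project pixels with predicted depth, move them by the relative
    # pose (R, t), re-project, and subtract the pixel grid to obtain flow.
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], 0).reshape(3, -1)
    pts = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)
    proj = K @ (R @ pts + t.reshape(3, 1))
    uv = proj[:2] / proj[2:].clamp(min=1e-6)
    return (uv - pix[:2]).reshape(2, H, W)

def flow_distillation_loss(depth, K, R, t, teacher_flow, valid_mask):
    # L1 between depth-induced rigid flow and the teacher's flow, restricted
    # to a prior-flow validity mask (a simplified stand-in for the paper's loss).
    return ((rigid_flow(depth, K, R, t) - teacher_flow).abs() * valid_mask).mean()
```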
- DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted, challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
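The recurrence can be pictured as follows: a generic LSTM cell carries hidden state across refinement steps and emits an additive 6-DoF pose update at each one. This is an illustrative skeleton only, not DeepRM's matching-based architecture.

```python
import torch
import torch.nn as nn

class RecurrentPoseRefiner(nn.Module):
    def __init__(self, feat_dim=32, hidden=64, steps=4):
        super().__init__()
        self.cell = nn.LSTMCell(feat_dim + 6, hidden)   # state across steps
        self.head = nn.Linear(hidden, 6)                # 6-DoF pose update
        self.hidden, self.steps = hidden, steps

    def forward(self, feat, pose):
        h = feat.new_zeros(feat.size(0), self.hidden)
        c = feat.new_zeros(feat.size(0), self.hidden)
        for _ in range(self.steps):                     # iterative refinement
            h, c = self.cell(torch.cat([feat, pose], dim=1), (h, c))
            pose = pose + self.head(h)
        return pose

refined = RecurrentPoseRefiner()(torch.rand(2, 32), torch.zeros(2, 6))
```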
- Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z)
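Once the shift and focal length are recovered, turning the affine-invariant depth into scene geometry is standard pinhole unprojection, as in this sketch; the recovery networks themselves are not reproduced, and the principal point is assumed to sit at the image center.

```python
import numpy as np

def unproject(affine_depth, shift, focal, scale=1.0):
    # Undo the unknown shift (scale stays arbitrary for shape), then lift
    # every pixel to 3D with a centered pinhole camera of the given focal.
    d = scale * (affine_depth + shift)
    H, W = affine_depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - W / 2.0) * d / focal
    y = (v - H / 2.0) * d / focal
    return np.stack([x, y, d], axis=-1)            # (H, W, 3) scene points

cloud = unproject(np.random.rand(480, 640), shift=0.4, focal=600.0)
```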
- DeepRelativeFusion: Dense Monocular SLAM using Single-Image Relative Depth Prediction [4.9188958016378495]
We propose a dense monocular SLAM system, named DeepRelativeFusion, that is capable of recovering a globally consistent 3D structure.
We use visual SLAM to reliably recover the camera poses and semi-dense depth maps, and then use relative depth prediction to densify the semi-dense depth maps and refine the pose graph.
Our system quantitatively outperforms state-of-the-art dense SLAM systems in dense reconstruction accuracy by a large margin.
arXiv Detail & Related papers (2020-06-07T05:22:29Z)