DuCos: Duality Constrained Depth Super-Resolution via Foundation Model
- URL: http://arxiv.org/abs/2503.04171v2
- Date: Wed, 20 Aug 2025 08:25:05 GMT
- Title: DuCos: Duality Constrained Depth Super-Resolution via Foundation Model
- Authors: Zhiqiang Yan, Zhengxue Wang, Haoye Dong, Jun Li, Jian Yang, Gim Hee Lee
- Abstract summary: We introduce DuCos, a novel depth super-resolution framework grounded in Lagrangian duality theory. DuCos is the first to significantly improve generalization across diverse scenarios with foundation models as prompts.
- Score: 56.88399488384106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce DuCos, a novel depth super-resolution framework grounded in Lagrangian duality theory, offering a flexible integration of multiple constraints and reconstruction objectives to enhance accuracy and robustness. Our DuCos is the first to significantly improve generalization across diverse scenarios with foundation models as prompts. The prompt design consists of two key components: Correlative Fusion (CF) and Gradient Regulation (GR). CF facilitates precise geometric alignment and effective fusion between prompt and depth features, while GR refines depth predictions by enforcing consistency with sharp-edged depth maps derived from foundation models. Crucially, these prompts are seamlessly embedded into the Lagrangian constraint term, forming a synergistic and principled framework. Extensive experiments demonstrate that DuCos outperforms existing state-of-the-art methods, achieving superior accuracy, robustness, and generalization.
Related papers
- UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing [77.10640210751981]
UDPNet is a general framework that leverages depth-based priors from a large-scale pretrained depth estimation model, DepthAnything V2. Our proposed solution establishes a new benchmark for depth-aware dehazing across various scenarios.
arXiv Detail & Related papers (2026-01-11T13:29:02Z)
- FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM [50.9765003472032]
FoundationSLAM is a learning-based monocular dense SLAM system for accurate and robust tracking and mapping. Our core idea is to bridge flow estimation with reasoning by leveraging the guidance from foundation depth models.
arXiv Detail & Related papers (2025-12-31T17:57:45Z)
- HiCoGen: Hierarchical Compositional Text-to-Image Generation in Diffusion Models via Reinforcement Learning [66.99487505369254]
HiCoGen is built upon a novel Chain of Synthesis paradigm. It decomposes complex prompts into minimal semantic units. It then synthesizes these units iteratively, where the image generated in each step provides crucial visual context for the next. Experiments show our approach significantly outperforms existing methods in both concept coverage and compositional accuracy.
arXiv Detail & Related papers (2025-11-25T06:24:25Z)
- Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision [12.972772139292957]
This paper proposes a novel computational framework that integrates depth-of-field supervision and multi-view consistency supervision. By unifying defocus physics with multi-view geometric constraints, our method achieves superior depth fidelity, demonstrating a 0.8 dB PSNR improvement over the state-of-the-art method.
arXiv Detail & Related papers (2025-11-13T13:51:16Z)
- Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion [33.854696587141355]
We propose a novel depth completion framework that leverages depth foundation models to attain remarkable robustness without large-scale training. Specifically, we leverage a depth foundation model to extract environmental cues, including structural and semantic context, from RGB images to guide the propagation of sparse depth information into missing regions. Our framework performs remarkably well in out-of-distribution (OOD) scenarios and outperforms existing state-of-the-art depth completion methods.
arXiv Detail & Related papers (2025-08-07T02:38:24Z)
- Towards High-Precision Depth Sensing via Monocular-Aided iToF and RGB Integration [11.077863605272668]
We present a novel iToF-RGB fusion framework designed to address the inherent limitations of indirect Time-of-Flight (iToF) depth sensing. The proposed method first reprojects the narrow-FoV iToF depth map onto the wide-FoV RGB coordinate system. A dual-encoder fusion network is then employed to jointly extract complementary features from the reprojected iToF depth and RGB image. By integrating cross-modal structural cues and depth consistency constraints, our approach achieves enhanced depth accuracy, improved edge sharpness, and seamless FoV expansion.
arXiv Detail & Related papers (2025-08-03T13:48:00Z)
- Hyperbolic Deep Learning for Foundation Models: A Survey [16.14776172953206]
Foundation models pre-trained on massive datasets have demonstrated remarkable success in diverse downstream tasks. Recent advances have leveraged hyperbolic neural networks to enhance foundation models. This paper provides a comprehensive review of hyperbolic neural networks and their recent development for foundation models.
arXiv Detail & Related papers (2025-07-23T09:50:17Z)
- Instruction Learning Paradigms: A Dual Perspective on White-box and Black-box LLMs [29.224895952158274]
We introduce a novel framework that seamlessly merges the strengths of both paradigms. We show that our approach consistently outperforms state-of-the-art baselines. This fusion of black-box initialization with advanced semantic refinement yields a scalable and efficient solution.
arXiv Detail & Related papers (2025-06-14T14:27:54Z)
- Perfecting Depth: Uncertainty-Aware Enhancement of Metric Depth [33.61994004497114]
We propose a novel two-stage framework for sensor depth enhancement, called Perfecting Depth. This framework leverages the nature of diffusion models to automatically detect unreliable depth regions while preserving geometric cues. Our framework sets a new baseline for sensor depth enhancement, with potential applications in autonomous driving, robotics, and immersive technologies.
arXiv Detail & Related papers (2025-06-05T04:09:11Z)
- Depth Anything with Any Prior [64.39991799606146]
Prior Depth Anything is a framework that combines incomplete but precise metric information in depth measurement with relative but complete geometric structures in depth prediction. We develop a conditioned monocular depth estimation (MDE) model to refine the inherent noise of depth priors. Our model showcases impressive zero-shot generalization across depth completion, super-resolution, and inpainting over 7 real-world datasets.
arXiv Detail & Related papers (2025-05-15T17:59:50Z)
- A Fusion-Guided Inception Network for Hyperspectral Image Super-Resolution [4.487807378174191]
We propose a single-image super-resolution model called the Fusion-Guided Inception Network (FGIN). Specifically, we first employ a spectral-spatial fusion module to effectively integrate spectral and spatial information. An Inception-like hierarchical feature extraction strategy is used to capture multiscale spatial dependencies. To further enhance reconstruction quality, we incorporate an optimized upsampling module that combines bilinear with depthwise separable convolutions.
arXiv Detail & Related papers (2025-05-06T11:15:59Z)
- Aligning Foundation Model Priors and Diffusion-Based Hand Interactions for Occlusion-Resistant Two-Hand Reconstruction [50.952228546326516]
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures and occlusions.
Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts.
We propose a novel framework that attempts to precisely align hand poses and interactions by integrating foundation model-driven 2D priors with diffusion-based interaction refinement.
arXiv Detail & Related papers (2025-03-22T14:42:27Z)
- Relative Pose Estimation through Affine Corrections of Monocular Depth Priors [69.59216331861437]
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities. We propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints.
arXiv Detail & Related papers (2025-01-09T18:58:30Z)
- CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems [3.3969056208620128]
We propose to push the boundary of inference steps to 1-2 NFEs while still maintaining high reconstruction quality.
Our method achieves new state-of-the-art in diffusion-based inverse problem solving.
arXiv Detail & Related papers (2024-07-17T15:57:50Z)
- Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models [58.46926334842161]
This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps.
We propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores.
Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability.
arXiv Detail & Related papers (2023-12-10T22:07:42Z)
- PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation [47.53810786827547]
Single image depth estimation is a foundational task in computer vision and generative modeling.
We present PatchFusion, a novel tile-based framework with three key components to improve the current state of the art.
Experiments on UnrealStereo4K, MVS-Synth, and Middlebury 2014 demonstrate that our framework can generate high-resolution depth maps with intricate details.
arXiv Detail & Related papers (2023-12-04T19:03:12Z)
- Deep Physics-Guided Unrolling Generalization for Compressed Sensing [8.780025933849751]
Deep physics-engaged learning scheme achieves high-accuracy and interpretable image reconstruction.
We find the intrinsic defect of this emerging paradigm, widely implemented by deep algorithm-unrolled networks.
A novel deep $\textbf{P}$hysics-guided un$\textbf{R}$olled recovery is proposed.
arXiv Detail & Related papers (2023-07-18T03:37:10Z)
- DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion [9.294501649791016]
Two-view structure from motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM (vSLAM).
We formulate the two-view SfM problem as a maximum likelihood estimation (MLE) and solve it with the proposed framework, denoted as DeepMLE.
Our method significantly outperforms the state-of-the-art end-to-end two-view SfM approaches in accuracy and generalization capability.
arXiv Detail & Related papers (2022-10-11T15:07:25Z)
- High-resolution Face Swapping via Latent Semantics Disentanglement [50.23624681222619]
We present a novel high-resolution hallucination face swapping method using the inherent prior knowledge of a pre-trained GAN model.
We explicitly disentangle the latent semantics by utilizing the progressive nature of the generator.
We extend our method to video face swapping by enforcing two-temporal constraints on the latent space and the image space.
arXiv Detail & Related papers (2022-03-30T00:33:08Z)
- Light Field Reconstruction via Deep Adaptive Fusion of Hybrid Lenses [67.01164492518481]
This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses.
We propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input.
Our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.
arXiv Detail & Related papers (2021-02-14T06:44:47Z)
- HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation [14.81943833870932]
We present an improved DepthNet, HR-Depth, with two effective strategies.
Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the fewest parameters at both high and low resolution.
arXiv Detail & Related papers (2020-12-14T09:15:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.