DMS:Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation
- URL: http://arxiv.org/abs/2508.13091v1
- Date: Mon, 18 Aug 2025 17:05:15 GMT
- Title: DMS:Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation
- Authors: Zihua Liu, Yizhou Li, Songyan Zhang, Masatoshi Okutomi,
- Abstract summary: We propose a model-agnostic approach that synthesizes novel views along the epipolar direction, guided by directional prompts.<n>Our proposed DMS is a cost-free, ''plug-and-play'' method that seamlessly enhances self-supervised stereo matching and monocular depth estimation.
- Score: 10.461837853869959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While supervised stereo matching and monocular depth estimation have advanced significantly with learning-based algorithms, self-supervised methods using stereo images as supervision signals have received relatively less focus and require further investigation. A primary challenge arises from ambiguity introduced during photometric reconstruction, particularly due to missing corresponding pixels in ill-posed regions of the target view, such as occlusions and out-of-frame areas. To address this and establish explicit photometric correspondences, we propose DMS, a model-agnostic approach that utilizes geometric priors from diffusion models to synthesize novel views along the epipolar direction, guided by directional prompts. Specifically, we finetune a Stable Diffusion model to simulate perspectives at key positions: left-left view shifted from the left camera, right-right view shifted from the right camera, along with an additional novel view between the left and right cameras. These synthesized views supplement occluded pixels, enabling explicit photometric reconstruction. Our proposed DMS is a cost-free, ''plug-and-play'' method that seamlessly enhances self-supervised stereo matching and monocular depth estimation, and relies solely on unlabeled stereo image pairs for both training and synthesizing. Extensive experiments demonstrate the effectiveness of our approach, with up to 35% outlier reduction and state-of-the-art performance across multiple benchmark datasets.
Related papers
- Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model [62.37493746544967]
Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps.<n>Existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments.<n>We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation.
arXiv Detail & Related papers (2025-03-30T16:24:22Z) - MODEL&CO: Exoplanet detection in angular differential imaging by learning across multiple observations [37.845442465099396]
Most post-processing methods build a model of the nuisances from the target observations themselves.
We propose to build the nuisance model from an archive of multiple observations by leveraging supervised deep learning techniques.
We apply the proposed algorithm to several datasets from the VLT/SPHERE instrument, and demonstrate a superior precision-recall trade-off.
arXiv Detail & Related papers (2024-09-23T09:22:45Z) - SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing [12.506628755166814]
We re-examine the deformable learning method in the Multi-View Stereo task and propose a novel paradigm based on view Space and Depth deformable Learning (SDL-MVS)
Our SDL-MVS aims to learn deformable interactions of features in different view spaces and deformably model the depth ranges and intervals to enable high accurate depth estimation.
Experiments on LuoJia-MVS and WHU datasets show that our SDL-MVS reaches state-of-the-art performance.
arXiv Detail & Related papers (2024-05-27T12:59:46Z) - RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction [3.1820300989695833]
This paper introduces a versatile paradigm for integrating multi-view reflectance and normal maps acquired through photometric stereo.
Our approach employs a pixel-wise joint re- parameterization of reflectance and normal, considering them as a vector of radiances rendered under simulated, varying illumination.
It significantly improves the detailed 3D reconstruction of areas with high curvature or low visibility.
arXiv Detail & Related papers (2023-12-02T19:49:27Z) - Enhanced Stable View Synthesis [86.69338893753886]
We introduce an approach to enhance the novel view synthesis from images taken from a freely moving camera.
The introduced approach focuses on outdoor scenes where recovering accurate geometric scaffold and camera pose is challenging.
arXiv Detail & Related papers (2023-03-30T01:53:14Z) - ChiTransformer:Towards Reliable Stereo from Cues [10.756828396434033]
Current stereo matching techniques are challenged by restricted searching space, occluded regions and sheer size.
We present an optic-chiasm-inspired self-supervised binocular depth estimation method.
ChiTransformer architecture yields substantial improvements over state-of-the-art self-supervised stereo approaches by 11%.
arXiv Detail & Related papers (2022-03-09T07:19:58Z) - Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo problem (MVPS)
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z) - Single-shot Hyperspectral-Depth Imaging with Learned Diffractive Optics [72.9038524082252]
We propose a compact single-shot monocular hyperspectral-depth (HS-D) imaging method.
Our method uses a diffractive optical element (DOE), the point spread function of which changes with respect to both depth and spectrum.
To facilitate learning the DOE, we present a first HS-D dataset by building a benchtop HS-D imager.
arXiv Detail & Related papers (2020-09-01T14:19:35Z) - D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual
Odometry [57.5549733585324]
D3VO is a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation.
We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision.
We model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy.
arXiv Detail & Related papers (2020-03-02T17:47:13Z) - Multi-View Photometric Stereo: A Robust Solution and Benchmark Dataset
for Spatially Varying Isotropic Materials [65.95928593628128]
We present a method to capture both 3D shape and spatially varying reflectance with a multi-view photometric stereo technique.
Our algorithm is suitable for perspective cameras and nearby point light sources.
arXiv Detail & Related papers (2020-01-18T12:26:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.