Related papers: Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision

URL: http://arxiv.org/abs/2511.10316v1
Date: Fri, 14 Nov 2025 01:45:14 GMT
Title: Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision
Authors: Yu Deng, Baozhu Zhao, Junyan Su, Xiaohan Zhang, Qi Liu,
Abstract summary: This paper proposes a novel computational framework that integrates depth-of-field supervision and multi-view consistency supervision. By unifying defocus physics with multi-view geometric constraints, our method achieves superior depth fidelity, demonstrating a 0.8 dB PSNR improvement over the state-of-the-art method.
Score: 12.972772139292957
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Three-dimensional reconstruction in scenes with extreme depth variations remains challenging due to inconsistent supervisory signals between near-field and far-field regions. Existing methods fail to simultaneously address inaccurate depth estimation in distant areas and structural degradation in close-range regions. This paper proposes a novel computational framework that integrates depth-of-field supervision and multi-view consistency supervision to advance 3D Gaussian Splatting. Our approach comprises two core components: (1) Depth-of-field Supervision employs a scale-recovered monocular depth estimator (e.g., Metric3D) to generate depth priors, leverages defocus convolution to synthesize physically accurate defocused images, and enforces geometric consistency through a novel depth-of-field loss, thereby enhancing depth fidelity in both far-field and near-field regions; (2) Multi-View Consistency Supervision employs LoFTR-based semi-dense feature matching to minimize cross-view geometric errors and enforces depth consistency via least squares optimization of reliable matched points. By unifying defocus physics with multi-view geometric constraints, our method achieves superior depth fidelity, demonstrating a 0.8 dB PSNR improvement over the state-of-the-art method on the Waymo Open Dataset. This framework bridges physical imaging principles and learning-based depth regularization, offering a scalable solution for complex depth stratification in urban environments.
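The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the depth-of-field loss or the exact least squares formulation, so the code below assumes the textbook thin-lens circle-of-confusion formula for the defocus side and a closed-form scale-and-shift fit for aligning a monocular depth prior to depths at reliable matched points. The function names `circle_of_confusion` and `fit_scale_shift`, and all parameter names, are hypothetical.

```python
from typing import Sequence, Tuple


def circle_of_confusion(depth: float, focus_dist: float,
                        focal_len: float, aperture: float) -> float:
    """Thin-lens blur-circle diameter for a point at `depth`.

    Assumption: standard thin-lens model, all lengths in the same unit.
    c = A * f * |d - d_f| / (d * (d_f - f))
    """
    if depth <= 0.0 or focus_dist <= focal_len:
        raise ValueError("depth must be positive and focus_dist > focal_len")
    return (aperture * focal_len * abs(depth - focus_dist)
            / (depth * (focus_dist - focal_len)))


def fit_scale_shift(prior: Sequence[float],
                    reference: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares for (s, t) minimizing
    sum_i (s * prior_i + t - reference_i)^2.

    `prior` stands in for monocular depths sampled at matched points and
    `reference` for depths at the same points from another source
    (hypothetical inputs, not the paper's exact quantities).
    """
    n = len(prior)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired depth samples")
    mean_p = sum(prior) / n
    mean_r = sum(reference) / n
    var_p = sum((p - mean_p) ** 2 for p in prior)
    if var_p == 0.0:
        raise ValueError("prior depths are constant; scale is not identifiable")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(prior, reference))
    scale = cov / var_p
    shift = mean_r - scale * mean_p
    return scale, shift


if __name__ == "__main__":
    # Blur grows as a point moves away from the focus plane.
    for d in (5.0, 10.0, 20.0):
        print(d, circle_of_confusion(d, focus_dist=5.0,
                                     focal_len=0.05, aperture=0.025))
    # Recover the scale and shift relating two depth samples.
    print(fit_scale_shift([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))
```

In a pipeline of the kind the abstract describes, a per-pixel blur diameter like this would presumably set the kernel size of the defocus convolution, and the fitted scale and shift would bring the depth prior into a metric frame before it is used as supervision. Both roles are assumptions made for illustration.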

Related papers

GeoSurDepth: Spatial Geometry-Consistent Self-Supervised Depth Estimation for Surround-View Cameras [3.072321170197384]
GeoSurDepth is a framework that leverages geometry consistency as the primary cue for surround-view depth estimation. Our framework highlights the importance of exploiting geometry coherence and consistency for robust self-supervised multi-view depth estimation.
arXiv Detail & Related papers (2026-01-09T15:13:28Z)
PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion [61.6340987158734]
We present the first pinhole-fisheye framework for heterogeneous multi-view depth estimation, PFDepth. PFDepth employs a unified architecture capable of processing arbitrary combinations of pinhole and fisheye cameras with varied intrinsics and extrinsics. We show that PFDepth sets state-of-the-art performance on the KITTI-360 and RealHet datasets over current mainstream depth networks.
arXiv Detail & Related papers (2025-09-30T09:38:59Z)
Towards High-Precision Depth Sensing via Monocular-Aided iToF and RGB Integration [11.077863605272668]
We present a novel iToF-RGB fusion framework designed to address the inherent limitations of indirect Time-of-Flight (iToF) depth sensing. The proposed method first reprojects the narrow-FoV iToF depth map onto the wide-FoV RGB coordinate system. A dual-encoder fusion network is then employed to jointly extract complementary features from the reprojected iToF depth and RGB image. By integrating cross-modal structural cues and depth consistency constraints, our approach achieves enhanced depth accuracy, improved edge sharpness, and seamless FoV expansion.
arXiv Detail & Related papers (2025-08-03T13:48:00Z)
JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting [10.690965024885358]
Reconstructing 3D scenes from sparse viewpoints is a long-standing challenge with wide applications. Recent advances in feed-forward 3D Gaussian sparse-view reconstruction methods provide an efficient solution for real-time novel view synthesis. We propose JointSplat, a unified framework that leverages the complementarity between optical flow and depth.
arXiv Detail & Related papers (2025-06-04T12:04:40Z)
Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian [49.21866794516328]
3D Gaussian splatting has demonstrated impressive performance in real-time novel view synthesis. Previous approaches have incorporated depth supervision into the training of 3D Gaussians to mitigate overfitting. We introduce a novel method to supervise the depth distribution of 3D Gaussians, utilizing depth priors with integrated uncertainty estimates.
arXiv Detail & Related papers (2024-05-30T03:18:30Z)
DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation [17.99904937160487]
DCPI-Depth is a framework that incorporates all these innovative components and couples two bidirectional and collaborative streams. It achieves state-of-the-art performance and generalizability across multiple public datasets, outperforming all existing prior arts.
arXiv Detail & Related papers (2024-05-27T08:55:17Z)
GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest image resolution and the most lightweight image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework. We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas. With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM. We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline. Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences, and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video. Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer. To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.