Related papers: Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

URL: http://arxiv.org/abs/2512.16913v1
Date: Thu, 18 Dec 2025 18:59:29 GMT
Title: Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
Authors: Xin Lin, Meixi Song, Dizhe Zhang, Wenxuan Lu, Haodong Li, Bo Du, Ming-Hsuan Yang, Truong Nguyen, Lu Qi,
Abstract summary: We present a panoramic metric depth foundation model that generalizes across diverse scene distances.<n>We collect a large-scale dataset by combining public datasets, high-quality synthetic data from our UE5 simulator and text-to-image models, and real panoramic images from the web.
Score: 68.95366581365829
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we present a panoramic metric depth foundation model that generalizes across diverse scene distances. We explore a data-in-the-loop paradigm from the view of both data construction and framework design. We collect a large-scale dataset by combining public datasets, high-quality synthetic data from our UE5 simulator and text-to-image models, and real panoramic images from the web. To reduce domain gaps between indoor/outdoor and synthetic/real data, we introduce a three-stage pseudo-label curation pipeline to generate reliable ground truth for unlabeled images. For the model, we adopt DINOv3-Large as the backbone for its strong pre-trained generalization, and introduce a plug-and-play range mask head, sharpness-centric optimization, and geometry-centric optimization to improve robustness to varying distances and enforce geometric consistency across views. Experiments on multiple benchmarks (e.g., Stanford2D3D, Matterport3D, and Deep360) demonstrate strong performance and zero-shot generalization, with particularly robust and stable metric predictions in diverse real-world scenes. The project page can be found at: \href{https://insta360-research-team.github.io/DAP_website/} {https://insta360-research-team.github.io/DAP\_website/}

Related papers

TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction [57.46712611558817]
3D vision foundation models have shown strong generalization in reconstructing key 3D attributes from uncalibrated images through a single feed-forward pass.<n>Recent strategies align consecutive predictions by solving global transformation, yet our analysis reveals their fundamental limitations in assumption validity, local alignment scope, and robustness under noisy geometry.<n>We propose a higher-DOF and long-term alignment framework based on Thin Plate Spline, leveraging globally propagated control points to correct spatially varying inconsistencies.
arXiv Detail & Related papers (2025-12-02T02:22:20Z)
WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting [51.69408870574092]
We present WorldMirror, an all-in-one, feed-forward model for versatile 3D geometric prediction tasks.<n>Our framework flexibly integrates diverse geometric priors, including camera poses, intrinsics, and depth maps.<n>WorldMirror achieves state-of-the-art performance across diverse benchmarks from camera, point map, depth, and surface normal estimation to novel view synthesis.
arXiv Detail & Related papers (2025-10-12T17:59:09Z)
PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency [63.74016242995453]
PacGDC is a label-efficient technique that enhances data diversity with minimal annotation effort for generalizable depth completion.<n>We propose a new data synthesis pipeline that uses multiple depth foundation models as scale manipulators.<n>Experiments show that PacGDC achieves remarkable generalizability across multiple benchmarks.
arXiv Detail & Related papers (2025-07-10T01:56:30Z)
E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models [78.1674905950243]
We present the first comprehensive benchmark for 3D geometric foundation models (GFMs)<n>GFMs directly predict dense 3D representations in a single feed-forward pass, eliminating the need for slow or unavailable precomputed camera parameters.<n>We evaluate 16 state-of-the-art GFMs, revealing their strengths and limitations across tasks and domains.<n>All code, evaluation scripts, and processed data will be publicly released to accelerate research in 3D spatial intelligence.
arXiv Detail & Related papers (2025-06-02T17:53:09Z)
Multi-View Projection for Unsupervised Domain Adaptation in 3D Semantic Segmentation [0.9345376836714131]
We propose a multi-view projection framework for unsupervised domain adaptation (UDA)<n>Our method aligns LiDAR scans into coherent 3D scenes and renders them from multiple virtual camera poses to generate 2D datasets (PC2D)<n>An ensemble of 2D segmentation models is trained on these modalities, and during inference, hundreds of views per scene are processed, with logits back-projected to 3D.<n>We show that our framework enables segmentation of rare classes, leveraging only 2D annotations for those classes while relying on 3D annotations for others in the source domain.
arXiv Detail & Related papers (2025-05-21T14:08:42Z)
Calibrating Panoramic Depth Estimation for Practical Localization and Mapping [20.621442016969976]
The absolute depth values of surrounding environments provide crucial cues for various assistive technologies, such as localization, navigation, and 3D structure estimation. We propose that accurate depth estimated from panoramic images can serve as a powerful and light-weight input for a wide range of downstream tasks requiring 3D information.
arXiv Detail & Related papers (2023-08-27T04:50:05Z)
TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense framework. For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of alignments. TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.