UrbanCraft: Urban View Extrapolation via Hierarchical Sem-Geometric Priors
- URL: http://arxiv.org/abs/2505.23434v1
- Date: Thu, 29 May 2025 13:28:04 GMT
- Title: UrbanCraft: Urban View Extrapolation via Hierarchical Sem-Geometric Priors
- Authors: Tianhang Wang, Fan Lu, Sanqing Qu, Guo Yu, Shihang Du, Ya Wu, Yuan Huang, Guang Chen,
- Abstract summary: Urban scene reconstruction methods mainly focus on the Interpolated View Synthesis setting, which synthesizes views close to the training camera trajectory. Previous methods have addressed extrapolation via image diffusion, but they fail to handle text-ambiguous prompts or large unseen view angles. We design UrbanCraft, which surmounts the Extrapolated View Synthesis problem using hierarchical semantic-geometric representations serving as additional priors.
- Score: 10.706273062956507
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing neural rendering-based urban scene reconstruction methods mainly focus on the Interpolated View Synthesis (IVS) setting, which synthesizes views close to the training camera trajectory. However, IVS cannot guarantee on-par performance for novel views outside the training camera distribution (e.g., looking left, right, or downwards), which limits the generalizability of urban reconstruction applications. Previous methods have addressed this via image diffusion, but they fail to handle text-ambiguous prompts or large unseen view angles due to the coarse-grained control of text-only diffusion. In this paper, we design UrbanCraft, which surmounts the Extrapolated View Synthesis (EVS) problem using hierarchical semantic-geometric representations serving as additional priors. Specifically, we leverage the partially observable scene to reconstruct coarse semantic and geometric primitives, establishing a coarse scene-level prior through an occupancy grid as the base representation. Additionally, we incorporate fine instance-level priors from 3D bounding boxes to enhance object-level details and spatial relationships. Building on this, we propose Hierarchical Semantic-Geometric-Guided Variational Score Distillation (HSG-VSD), which integrates semantic and geometric constraints from the pretrained UrbanCraft2D into the score distillation sampling process, forcing the distilled distribution to be consistent with the observable scene. Qualitative and quantitative comparisons demonstrate the effectiveness of our method on the EVS problem.
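The abstract specifies that HSG-VSD injects semantic and geometric constraints from the pretrained UrbanCraft2D model into score distillation sampling (SDS), but gives no interface details. Below is a minimal, hypothetical PyTorch sketch of a guidance-conditioned SDS step; the ToyGuidedDenoiser, the conditioning-by-concatenation scheme, and all shapes are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of semantic-geometric-guided score distillation (SDS).
# All names and the conditioning scheme are assumptions; the real HSG-VSD uses
# the pretrained UrbanCraft2D model, which is not reproduced here.
import torch
import torch.nn as nn

class ToyGuidedDenoiser(nn.Module):
    """Stand-in for a guided diffusion noise predictor (not UrbanCraft2D)."""
    def __init__(self, channels=3):
        super().__init__()
        # Input: noisy render + semantic rendering + depth/occupancy rendering.
        self.net = nn.Conv2d(channels * 3, channels, kernel_size=3, padding=1)

    def forward(self, noisy, semantic, depth, t):
        # A real model would also embed the timestep t; omitted in this toy.
        return self.net(torch.cat([noisy, semantic, depth], dim=1))

def guided_sds_loss(render, semantic, depth, denoiser, alphas):
    """SDS-style objective whose gradient pushes the rendered view toward the
    semantic- and geometry-consistent image distribution."""
    t = torch.randint(1, len(alphas), (1,)).item()
    a_t = alphas[t]
    noise = torch.randn_like(render)
    noisy = a_t.sqrt() * render + (1.0 - a_t).sqrt() * noise
    eps_pred = denoiser(noisy, semantic, depth, t)
    grad = (eps_pred - noise).detach()  # stop-gradient residual, as in SDS
    return (grad * render).sum()        # d(loss)/d(render) == grad

# Dummy usage: `render` would come from the differentiable urban-scene renderer,
# `semantic`/`depth` from the occupancy-grid and 3D-bounding-box priors.
render = torch.rand(1, 3, 64, 64, requires_grad=True)
semantic = torch.rand(1, 3, 64, 64)
depth = torch.rand(1, 3, 64, 64)
alphas = torch.linspace(0.999, 0.01, 100)
guided_sds_loss(render, semantic, depth, ToyGuidedDenoiser(), alphas).backward()
```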
Related papers
- Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting [95.61137026932062]
Intern-GS is a novel approach that enhances sparse-view Gaussian splatting. We show that Intern-GS achieves state-of-the-art rendering quality across diverse datasets.
arXiv Detail & Related papers (2025-05-27T05:17:49Z)
- HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering [49.64902130083662]
We present Hierarchical vocab-Agnostic Expert Clustering (HAEC), named after the Latin word for 'these'. We apply this highly scalable approach to the first open-vocabulary scene understanding of the SensatUrban city-scale dataset. Our technique can help unlock complex operations on dense urban 3D scenes and open a new path forward in the processing of digital twins.
arXiv Detail & Related papers (2025-04-18T09:48:42Z)
- Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications. Recent approaches incorporate semantic or geometric regularization to compensate for missing observations, but they suffer significant degradation in underconstrained areas. We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv Detail & Related papers (2025-03-19T02:11:31Z)
- See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization [14.239772421978373]
3D Gaussian Splatting (3DGS) has shown remarkable performance in novel view synthesis. However, its rendering quality deteriorates with sparse input views, leading to distorted content and reduced details. We propose a sparse-view 3DGS method that incorporates local depth and semantic prior information. Our method outperforms state-of-the-art novel view synthesis approaches, achieving up to a 0.4 dB improvement in PSNR on the LLFF dataset.
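The summary does not state the form of the depth regularization; one widely used, scale-invariant choice penalizes low Pearson correlation between rendered depth and a monocular depth prior within local patches. A minimal sketch under that assumption (not the paper's actual loss):

```python
# Hypothetical sketch of a local depth-regularization term for sparse-view
# 3DGS: penalize low Pearson correlation between rendered depth and a
# monocular depth prior inside local patches, which is invariant to the
# unknown scale/shift of the prior.
import torch

def patch_depth_loss(rendered_depth, prior_depth, patch=16):
    """Negative Pearson correlation between depth maps, averaged over tiles."""
    # Cut both H x W maps into non-overlapping patch x patch tiles.
    tiles_r = rendered_depth.unfold(0, patch, patch).unfold(1, patch, patch)
    tiles_p = prior_depth.unfold(0, patch, patch).unfold(1, patch, patch)
    tiles_r = tiles_r.reshape(-1, patch * patch)
    tiles_p = tiles_p.reshape(-1, patch * patch)
    r = tiles_r - tiles_r.mean(dim=1, keepdim=True)
    p = tiles_p - tiles_p.mean(dim=1, keepdim=True)
    corr = (r * p).sum(1) / (r.norm(dim=1) * p.norm(dim=1) + 1e-8)
    return (1.0 - corr).mean()  # 0 when perfectly correlated per patch

# Dummy usage: rendered depth would come from the 3DGS rasterizer,
# prior depth from an off-the-shelf monocular estimator.
rendered = torch.rand(64, 64, requires_grad=True)
prior = torch.rand(64, 64)
patch_depth_loss(rendered, prior).backward()
```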
arXiv Detail & Related papers (2025-01-20T14:30:38Z)
- How to Use Diffusion Priors under Sparse Views? [29.738350228085928]
Inline Prior Guided Score Matching is proposed to provide visual supervision over sparse views in 3D reconstruction. We show that our method achieves state-of-the-art reconstruction quality.
arXiv Detail & Related papers (2024-12-03T07:31:54Z)
- Efficient Depth-Guided Urban View Synthesis [52.841803876653465]
We introduce Efficient Depth-Guided Urban View Synthesis (EDUS) for fast feed-forward inference and efficient per-scene fine-tuning.
EDUS exploits noisy predicted geometric priors as guidance to enable generalizable urban view synthesis from sparse input images.
Our results indicate that EDUS achieves state-of-the-art performance in sparse view settings when combined with fast test-time optimization.
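The summary says EDUS uses noisy predicted geometric priors as guidance but not how they are formed; a common way to build such a prior is to unproject predicted depth into a 3D point cloud that a feed-forward model can then aggregate for novel views. A minimal sketch with made-up intrinsics and shapes, purely for illustration:

```python
# Illustrative sketch of turning a (noisy) predicted depth map into a geometric
# prior: unproject source-view pixels into camera-space 3D points. Intrinsics
# and shapes are invented for the example; this is not the EDUS implementation.
import torch

def unproject(depth, K):
    """depth: (H, W); K: (3, 3) intrinsics -> (H*W, 3) camera-space points."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # (H, W, 3)
    rays = pix.reshape(-1, 3) @ torch.linalg.inv(K).T  # back-projected rays
    return rays * depth.reshape(-1, 1)                 # scale rays by depth

K = torch.tensor([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
points = unproject(torch.rand(64, 64) + 1.0, K)
print(points.shape)  # torch.Size([4096, 3])
```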
arXiv Detail & Related papers (2024-07-17T08:16:25Z)
- VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors [32.02774117064752]
We tackle the Extrapolated View Synthesis (EVS) problem by evaluating reconstructions on views such as looking left, right, or downwards.
To the best of our knowledge, we are the first to address the EVS problem in urban scene reconstruction.
arXiv Detail & Related papers (2024-07-03T09:23:13Z)
- SAGS: Structure-Aware 3D Gaussian Splatting [53.6730827668389]
We propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene.
SAGS achieves state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets.
arXiv Detail & Related papers (2024-04-29T23:26:30Z)
- Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning [119.99066522299309]
KYN is a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density.
We show that KYN improves 3D shape recovery compared to predicting density for each 3D point in isolation.
We achieve state-of-the-art results in scene and object reconstruction on KITTI-360, and show improved zero-shot generalization compared to prior work.
arXiv Detail & Related papers (2024-04-04T17:59:59Z)
- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously-infeasible geometric augmentations for unsupervised depth completion and estimation.
This is achieved by reversing, or "undo"-ing, the geometric transformations applied to the coordinates of the output depth, warping the depth map back to the original reference frame.
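As a concrete illustration of the undo step, here is a minimal sketch with a placeholder network and a horizontal-flip augmentation (which is its own inverse); the paper handles a much wider family of transformations, and nothing below reflects its actual implementation:

```python
# Minimal sketch of the "undo" idea: run the depth network on a geometrically
# augmented image, then invert the same transform on the predicted depth so
# the (unsupervised) loss is computed in the original reference frame.
# The network and transform are placeholders, not the paper's implementation.
import torch
import torch.nn as nn

depth_net = nn.Sequential(  # stand-in monocular depth network
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1)
)

def augment(img):
    # Example geometric augmentation: horizontal flip (its own inverse).
    return torch.flip(img, dims=[-1])

def undo(depth):
    # Invert the flip so the depth map is back in the original frame.
    return torch.flip(depth, dims=[-1])

image = torch.rand(1, 3, 32, 32)
depth_aug = depth_net(augment(image))  # predict in the augmented frame
depth = undo(depth_aug)                # warp back; the loss would use `depth`
assert depth.shape == (1, 1, 32, 32)
```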
arXiv Detail & Related papers (2023-10-15T05:15:45Z)
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin in 3D reconstruction quality.
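The summary does not spell out the planar constraint; one common instantiation of a Manhattan-world prior penalizes floor normals that deviate from the up axis and wall normals that are not perpendicular to it. A hedged sketch (the masks, weights, and exact loss form are assumptions):

```python
# Hedged sketch of a Manhattan-world planar constraint: encourage surface
# normals in regions segmented as "floor" to align with the up axis, and
# "wall" normals to be perpendicular to it. Illustrative only; the paper's
# exact formulation may differ.
import torch

def manhattan_loss(normals, floor_mask, wall_mask, up=None):
    """normals: (N, 3) unit vectors; masks: (N,) booleans per sample point."""
    if up is None:
        up = torch.tensor([0.0, 0.0, 1.0])
    cos_up = normals @ up                              # cosine to the up axis
    loss_floor = (1.0 - cos_up[floor_mask]).mean()     # floors: normal == up
    loss_wall = cos_up[wall_mask].abs().mean()         # walls: normal ⟂ up
    return loss_floor + loss_wall

# Dummy usage with random normals and segmentation masks.
normals = torch.nn.functional.normalize(torch.randn(100, 3), dim=1)
floor_mask = torch.rand(100) < 0.3
wall_mask = (~floor_mask) & (torch.rand(100) < 0.5)
print(manhattan_loss(normals, floor_mask, wall_mask))
```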
arXiv Detail & Related papers (2022-05-05T17:59:55Z)