From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images
- URL: http://arxiv.org/abs/2512.07527v2
- Date: Tue, 09 Dec 2025 06:52:41 GMT
- Title: From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images
- Authors: Fei Yu, Yu Liu, Luyang Tang, Mingchao Sun, Zengye Ge, Rui Bu, Yuchao Jin, Haisen Zhao, He Sun, Yangyan Li, Mu Xu, Wenzheng Chen, Baoquan Chen,
- Abstract summary: City-scale 3D reconstruction from satellite imagery presents the challenge of extreme viewpoint extrapolation. This requires inferring nearly $90^\circ$ viewpoint gaps from image sources. We propose two design choices tailored for city structures and satellite inputs.
- Score: 31.421617684580834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: City-scale 3D reconstruction from satellite imagery presents the challenge of extreme viewpoint extrapolation, where our goal is to synthesize ground-level novel views from sparse orbital images with minimal parallax. This requires inferring nearly $90^\circ$ viewpoint gaps from image sources with severely foreshortened facades and flawed textures, causing state-of-the-art reconstruction engines such as NeRF and 3DGS to fail. To address this problem, we propose two design choices tailored for city structures and satellite inputs. First, we model city geometry as a 2.5D height map, implemented as a Z-monotonic signed distance field (SDF) that matches urban building layouts from top-down viewpoints. This stabilizes geometry optimization under sparse, off-nadir satellite views and yields a watertight mesh with crisp roofs and clean, vertically extruded facades. Second, we paint the mesh appearance from satellite images via differentiable rendering techniques. While the satellite inputs may contain long-range, blurry captures, we further train a generative texture restoration network to enhance the appearance, recovering high-frequency, plausible texture details from degraded inputs. Our method's scalability and robustness are demonstrated through extensive experiments on large-scale urban reconstruction. For example, in our teaser figure, we reconstruct a $4\,\mathrm{km}^2$ real-world region from only a few satellite images, achieving state-of-the-art performance in synthesizing photorealistic ground views. The resulting models are not only visually compelling but also serve as high-fidelity, application-ready assets for downstream tasks like urban planning and simulation. Project page can be found at https://pku-vcl-geometry.github.io/Orbit2Ground/.
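The abstract's first design choice, modeling city geometry as a 2.5D height map realized as a Z-monotonic signed distance field, can be illustrated with a minimal sketch. The function name, grid lookup, and the simple `z - h(x, y)` distance are illustrative assumptions, not the paper's implementation; a Z-monotonic field only needs its value to increase strictly with `z`, which this construction satisfies by design:

```python
import numpy as np

def zmono_sdf(points, height_map, cell_size=1.0):
    """Evaluate a Z-monotonic pseudo-SDF for a 2.5D city height map.

    points:     (N, 3) array of query points (x, y, z).
    height_map: (H, W) array of per-cell surface heights (roofs, ground).
    Returns positive values above the surface and negative values below;
    the field is strictly increasing in z, so every vertical ray crosses
    zero exactly once, yielding vertically extruded facades.
    """
    H, W = height_map.shape
    # Map x/y to the nearest height-map cell, clamped to the grid.
    ix = np.clip((points[:, 0] / cell_size).astype(int), 0, W - 1)
    iy = np.clip((points[:, 1] / cell_size).astype(int), 0, H - 1)
    # Signed vertical distance to the height surface under each point.
    return points[:, 2] - height_map[iy, ix]
```

Because the field depends on (x, y) only through the height map, optimizing it under sparse off-nadir views reduces to optimizing a 2D grid, which is far better conditioned than a free 3D SDF.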
Related papers
- Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery [13.938311471105303]
We propose Skyfall-GS, the first city-block scale 3D scene creation framework without costly 3D annotations. We tailor a curriculum-driven iterative refinement strategy to progressively enhance geometric and photorealistic textures.
arXiv Detail & Related papers (2025-10-17T17:59:51Z) - Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion [18.943643720564996]
Sat2City is a novel framework that synergizes the representational capacity of sparse voxel grids with latent diffusion models. We introduce a dataset of synthesized large-scale 3D cities paired with satellite-view height maps. Our framework generates detailed 3D structures from a single satellite image, achieving superior fidelity compared to existing city generation models.
arXiv Detail & Related papers (2025-07-06T14:30:08Z) - CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians [48.22146326647521]
CityGo is a hybrid framework that combines textured proxy geometry with residual and surrounding 3D Gaussians for rendering of urban scenes from aerial perspectives. We show that our representation significantly reduces training time, achieving on average 1.4x speedup, while delivering comparable visual fidelity to pure 3D Gaussian Splatting approaches.
arXiv Detail & Related papers (2025-05-27T11:24:08Z) - AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis [57.249817395828174]
We propose a scalable framework combining pseudo-synthetic renderings from 3D city-wide meshes with real, ground-level crowd-sourced images. The pseudo-synthetic data simulates a wide range of aerial viewpoints, while the real, crowd-sourced images help improve visual fidelity for ground-level images. Using this hybrid dataset, we fine-tune several state-of-the-art algorithms and achieve significant improvements on real-world, zero-shot aerial-ground tasks.
arXiv Detail & Related papers (2025-04-17T17:57:05Z) - SuperCarver: Texture-Consistent 3D Geometry Super-Resolution for High-Fidelity Surface Detail Generation [70.76810765911499]
We introduce SuperCarver, a 3D geometry super-resolution pipeline for supplementing texture-consistent surface details onto a given coarse mesh. Experiments demonstrate that our SuperCarver is capable of generating realistic and expressive surface details depicted by the actual texture appearance.
arXiv Detail & Related papers (2025-03-12T14:38:45Z) - Skyeyes: Ground Roaming using Aerial View Images [9.159470619808127]
We introduce Skyeyes, a novel framework that can generate sequences of ground view images using only aerial view inputs.
More specifically, we combine a 3D representation with a view consistent generation model, which ensures coherence between generated images.
The images maintain improved spatial-temporal coherence and realism, enhancing scene comprehension and visualization from aerial perspectives.
arXiv Detail & Related papers (2024-09-25T07:21:43Z) - Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques.
Specifically, our approach generates texture colors at the point level for a given geometry using a 3D diffusion model first, which is then transformed into a scene representation in a feed-forward manner.
Experiments in two city-scale datasets show that our model demonstrates proficiency in generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z) - ImpliCity: City Modeling from Satellite Images with Deep Implicit Occupancy Fields [20.00737387884824]
ImpliCity is a neural representation of the 3D scene as an implicit, continuous occupancy field, driven by learned embeddings of the point cloud and a stereo pair of ortho-photos.
With an image resolution of 0.5 m, ImpliCity reaches a median height error of $\approx 0.7$ m and outperforms competing methods.
arXiv Detail & Related papers (2022-01-24T21:40:16Z) - Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery [80.6282101835164]
We present a new approach for synthesizing a novel street-view panorama given an overhead satellite image.
Our method generates a Google's omnidirectional street-view type panorama, as if it is captured from the same geographical location as the center of the satellite patch.
arXiv Detail & Related papers (2021-03-02T10:27:05Z) - OSTeC: One-Shot Texture Completion [86.23018402732748]
We propose an unsupervised approach for one-shot 3D facial texture completion.
The proposed approach rotates an input image in 3D and fills in the unseen regions by reconstructing the rotated image with a 2D face generator.
We frontalize the target image by projecting the completed texture into the generator.
arXiv Detail & Related papers (2020-12-30T23:53:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.