WonderZoom: Multi-Scale 3D World Generation
- URL: http://arxiv.org/abs/2512.09164v1
- Date: Tue, 09 Dec 2025 22:21:07 GMT
- Title: WonderZoom: Multi-Scale 3D World Generation
- Authors: Jin Cao, Hong-Xing Yu, Jiajun Wu
- Abstract summary: WonderZoom generates 3D scenes with contents across multiple spatial scales from a single image. Our approach enables users to "zoom into" a 3D region and auto-regressively synthesize previously non-existent fine details.
- Score: 24.211362383859406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present WonderZoom, a novel approach to generating 3D scenes with contents across multiple spatial scales from a single image. Existing 3D world generation models remain limited to single-scale synthesis and cannot produce coherent scene contents at varying granularities. The fundamental challenge is the lack of a scale-aware 3D representation capable of generating and rendering content with largely different spatial sizes. WonderZoom addresses this through two key innovations: (1) scale-adaptive Gaussian surfels for generating and rendering multi-scale 3D scenes in real time, and (2) a progressive detail synthesizer that iteratively generates finer-scale 3D contents. Our approach enables users to "zoom into" a 3D region and auto-regressively synthesize previously non-existent fine details, from landscapes to microscopic features. Experiments demonstrate that WonderZoom significantly outperforms state-of-the-art video and 3D models in both quality and alignment, enabling multi-scale 3D world creation from a single image. We show video results and an interactive viewer of generated multi-scale 3D worlds at https://wonderzoom.github.io/
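To make the two innovations concrete, below is a minimal, hypothetical sketch of how a scale-annotated surfel representation and an autoregressive zoom-in loop could fit together. The data structures, the `synthesize_details` callable, and the visibility thresholds are illustrative assumptions inferred only from the abstract, not the authors' actual implementation.

```python
# Hypothetical sketch of a scale-adaptive surfel store and a zoom-in
# refinement step; names and signatures are illustrative assumptions.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class GaussianSurfel:
    """A single scale-annotated Gaussian surfel (illustrative)."""
    position: np.ndarray   # (3,) world-space center
    normal: np.ndarray     # (3,) surfel orientation
    radius: float          # spatial extent in world units
    color: np.ndarray      # (3,) RGB
    scale_level: int       # 0 = landscape, larger = finer detail


@dataclass
class MultiScaleScene:
    """Container for surfels across all scale levels (illustrative)."""
    surfels: list[GaussianSurfel] = field(default_factory=list)

    def visible_at(self, zoom: float) -> list[GaussianSurfel]:
        # Keep only surfels whose projected size is neither far below
        # nor far above the useful on-screen range at this zoom level
        # (a stand-in for scale-adaptive level selection).
        return [s for s in self.surfels
                if 0.1 <= s.radius * zoom <= 100.0]


def zoom_and_refine(scene: MultiScaleScene,
                    region_center: np.ndarray,
                    zoom_factor: float,
                    synthesize_details) -> MultiScaleScene:
    """One autoregressive refinement step: select the coarse surfels in
    the zoomed region and let a generative model (`synthesize_details`,
    assumed here as a black-box callable) hallucinate finer-scale
    surfels conditioned on them."""
    coarse = [s for s in scene.visible_at(zoom_factor)
              if np.linalg.norm(s.position - region_center) < 1.0 / zoom_factor]
    fine = synthesize_details(coarse, region_center, zoom_factor)
    scene.surfels.extend(fine)
    return scene
```

Under these assumptions, repeatedly calling `zoom_and_refine` with increasing `zoom_factor` would realize the abstract's landscape-to-microscopic progression: each step conditions on the coarser surfels already present and appends newly synthesized finer-scale content to the same scene.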
Related papers
- Terra: Explorable Native 3D World Model with Point Latents [74.90179419859415]
We present Terra, a native 3D world model that represents and generates explorable environments in an intrinsic 3D latent space. Specifically, we propose a novel point-to-Gaussian variational autoencoder (P2G-VAE) that encodes 3D inputs into a latent point representation. We then introduce a sparse point flow matching network (SPFlow) for generating the latent point representation, which simultaneously denoises the positions and features of the point latents.
arXiv Detail & Related papers (2025-10-16T17:59:56Z) - Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation [66.95956271144982]
We present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames.
arXiv Detail & Related papers (2025-06-04T17:59:04Z) - Constructing a 3D Scene from a Single Image [31.11317559252235]
SceneFuse-3D is a training-free framework designed to synthesize coherent 3D scenes from a single top-down view. We decompose the input image into overlapping regions and generate each using a pretrained 3D object generator. This modular design allows us to overcome resolution bottlenecks and preserve spatial structure without requiring 3D supervision or fine-tuning.
arXiv Detail & Related papers (2025-05-21T17:10:47Z) - Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images [5.754780404074765]
We propose representing a complete 3D space for dynamic scene video by modeling explicit representations, specifically 4D Gaussians, from a single image. As far as we know, this is the first attempt that considers animation while representing a complete 3D space from a single landscape image.
arXiv Detail & Related papers (2025-04-04T06:51:39Z) - SynCity: Training-Free Generation of 3D Worlds [107.69875149880679]
We propose SynCity, a training- and optimization-free approach to generating 3D worlds from textual descriptions. We show how 3D and 2D generators can be combined to generate ever-expanding scenes.
arXiv Detail & Related papers (2025-03-20T17:59:40Z) - Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors [11.156009461711639]
Generative Gaussian Splatting (GGS) is a novel approach that integrates a 3D representation with a pre-trained latent video diffusion model. We evaluate our approach on two common benchmark datasets for scene synthesis, RealEstate10K and ScanNet+.
arXiv Detail & Related papers (2025-03-17T15:24:04Z) - Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [112.2625368640425]
High-resolution Image-to-3D model (Hi3D) is a new video diffusion-based paradigm that redefines a single image to multi-view images as 3D-aware sequential image generation.
Hi3D first empowers the pre-trained video diffusion model with 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z) - LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field [13.815932949774858]
Cinemagraph is a form of visual media that combines elements of still photography and subtle motion to create a captivating experience.
We propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling.
Experiment results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation.
arXiv Detail & Related papers (2024-04-13T11:07:53Z) - Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z) - GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images [72.15855070133425]
We introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures.
GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes and human characters to buildings.
arXiv Detail & Related papers (2022-09-22T17:16:19Z)