Self-Evolving 3D Scene Generation from a Single Image
- URL: http://arxiv.org/abs/2512.08905v1
- Date: Tue, 09 Dec 2025 18:44:21 GMT
- Title: Self-Evolving 3D Scene Generation from a Single Image
- Authors: Kaizhi Zheng, Yue Fan, Jing Gu, Zishuo Xu, Xuehai He, Xin Eric Wang,
- Abstract summary: EvoScene is a training-free framework that progressively reconstructs complete 3D scenes from single images.<n>EvoScene alternates between 2D and 3D domains, gradually improving both structure and appearance.
- Score: 44.87957263540352
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating high-quality, textured 3D scenes from a single image remains a fundamental challenge in vision and graphics. Recent image-to-3D generators recover reasonable geometry from single views, but their object-centric training limits generalization to complex, large-scale scenes with faithful structure and texture. We present EvoScene, a self-evolving, training-free framework that progressively reconstructs complete 3D scenes from single images. The key idea is combining the complementary strengths of existing models: geometric reasoning from 3D generation models and visual knowledge from video generation models. Through three iterative stages--Spatial Prior Initialization, Visual-guided 3D Scene Mesh Generation, and Spatial-guided Novel View Generation--EvoScene alternates between 2D and 3D domains, gradually improving both structure and appearance. Experiments on diverse scenes demonstrate that EvoScene achieves superior geometric stability, view-consistent textures, and unseen-region completion compared to strong baselines, producing ready-to-use 3D meshes for practical applications.
Related papers
- Photo3D: Advancing Photorealistic 3D Generation through Structure-Aligned Detail Enhancement [12.855027334688382]
Photo3D is a framework for advancing 3D generation driven by the GPT-4o-Image model image data.<n>We present a realistic detail enhancement scheme that leverages perceptual feature adaptation and semantic structure matching to enforce appearance consistency.<n>Our scheme is general to different 3D-native generators, and we present dedicated training strategies to facilitate the optimization of geometry-texture coupled and decoupled 3D-native generation paradigms.
arXiv Detail & Related papers (2025-12-09T12:33:48Z) - WorldGrow: Generating Infinite 3D World [75.81531067447203]
We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance.<n>We propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis.<n>Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity.
arXiv Detail & Related papers (2025-10-24T17:39:52Z) - Constructing a 3D Scene from a Single Image [31.11317559252235]
SceneFuse-3D is a training-free framework designed to synthesize coherent 3D scenes from a single top-down view.<n>We decompose the input image into overlapping regions and generate each using a pretrained 3D object generator.<n>This modular design allows us to overcome resolution bottlenecks and preserve spatial structure without requiring 3D supervision or fine-tuning.
arXiv Detail & Related papers (2025-05-21T17:10:47Z) - Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [112.2625368640425]
High-resolution Image-to-3D model (Hi3D) is a new video diffusion based paradigm that redefines a single image to multi-view images as 3D-aware sequential image generation.
Hi3D first empowers the pre-trained video diffusion model with 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z) - ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling [96.87575334960258]
ID-to-3D is a method to generate identity- and text-guided 3D human heads with disentangled expressions.
Results achieve an unprecedented level of identity-consistent and high-quality texture and geometry generation.
arXiv Detail & Related papers (2024-05-26T13:36:45Z) - Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting [75.7154104065613]
We introduce a novel depth completion model, trained via teacher distillation and self-training to learn the 3D fusion process.
We also introduce a new benchmarking scheme for scene generation methods that is based on ground truth geometry.
arXiv Detail & Related papers (2024-04-30T17:59:40Z) - 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior to 2D diffusion model and the global 3D information of the current scene.
Our approach supports wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z) - Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture [47.44029968307207]
We propose a novel framework for simultaneous high-fidelity recovery of object shapes and textures from single-view images.
Our approach utilizes the proposed Single-view neural implicit Shape and Radiance field (SSR) representations to leverage both explicit 3D shape supervision and volume rendering.
A distinctive feature of our framework is its ability to generate fine-grained textured meshes while seamlessly integrating rendering capabilities into the single-view 3D reconstruction model.
arXiv Detail & Related papers (2023-11-01T11:46:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.