SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
- URL: http://arxiv.org/abs/2508.15769v1
- Date: Thu, 21 Aug 2025 17:59:16 GMT
- Title: SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
- Authors: Yanxu Meng, Haoning Wu, Ya Zhang, Weidi Xie
- Abstract summary: 3D content generation has attracted significant research interest due to its applications in VR/AR and embodied AI. We present SceneGen, a novel framework that takes a scene image and corresponding object masks as input, simultaneously producing multiple 3D assets. We believe this paradigm offers a novel solution for high-quality 3D content generation, potentially advancing its practical applications in downstream tasks.
- Score: 44.087747627571716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D content generation has recently attracted significant research interest due to its applications in VR/AR and embodied AI. In this work, we address the challenging task of synthesizing multiple 3D assets within a single scene image. Concretely, our contributions are fourfold: (i) we present SceneGen, a novel framework that takes a scene image and corresponding object masks as input, simultaneously producing multiple 3D assets with geometry and texture. Notably, SceneGen operates with no need for optimization or asset retrieval; (ii) we introduce a novel feature aggregation module that integrates local and global scene information from visual and geometric encoders within the feature extraction module. Coupled with a position head, this enables the generation of 3D assets and their relative spatial positions in a single feedforward pass; (iii) we demonstrate SceneGen's direct extensibility to multi-image input scenarios. Despite being trained solely on single-image inputs, our architectural design enables improved generation performance with multi-image inputs; and (iv) extensive quantitative and qualitative evaluations confirm the efficiency and robust generation abilities of our approach. We believe this paradigm offers a novel solution for high-quality 3D content generation, potentially advancing its practical applications in downstream tasks. The code and model will be publicly available at: https://mengmouxu.github.io/SceneGen.
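The feature aggregation and position head described in contribution (ii) can be sketched in simplified form. This is an illustrative reconstruction, not the authors' code: the token shapes, the single cross-attention step, and the 4-dimensional (x, y, z, scale) output are assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_features(object_tokens, scene_tokens):
    """Fuse local (per-object) features with global scene context:
    each object token attends over the combined visual/geometric
    scene tokens, followed by a residual connection."""
    d = object_tokens.shape[-1]
    scores = object_tokens @ scene_tokens.T / np.sqrt(d)  # (N_obj, N_scene)
    attn = softmax(scores, axis=-1)
    return object_tokens + attn @ scene_tokens            # (N_obj, d)

def position_head(asset_tokens, W, b):
    """Hypothetical linear head regressing each asset's relative
    placement, e.g. (x, y, z, scale), in the same forward pass."""
    return asset_tokens @ W + b                           # (N_obj, 4)
```

In this reading, the asset decoder and the position head consume the same fused tokens, which is what makes a single feedforward pass, with no per-scene optimization or retrieval, sufficient.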
Related papers
- GEN3D: Generating Domain-Free 3D Scenes from a Single Image [13.128331445276764]
GEN3D is a novel method for generating high-quality, wide-scope, and generic 3D scenes from a single image. Experiments on diverse datasets demonstrate the strong generalization capability and superior performance of our method.
arXiv Detail & Related papers (2025-11-18T09:40:43Z)
- Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation [27.13700598039439]
This paper presents a novel vision-guided 3D layout generation system. We first construct a high-quality asset library containing 2,037 scene assets and 147 3D scene layouts. We then employ an image generation model to expand prompt representations into images, fine-tuning it to align with our asset library. We optimize the scene layout using scene graphs and overall visual semantics to ensure logical coherence and alignment with the images.
arXiv Detail & Related papers (2025-10-17T11:48:08Z)
- Towards Geometric and Textural Consistency 3D Scene Generation via Single Image-guided Model Generation and Layout Optimization [14.673302810271219]
We propose a novel three-stage framework for 3D scene generation with explicit geometric representations and high-quality textural details. Our approach not only outperforms state-of-the-art methods in terms of geometric accuracy and texture fidelity of individual generated 3D models, but also has significant advantages in scene layout synthesis.
arXiv Detail & Related papers (2025-07-20T06:59:42Z)
- Agentic 3D Scene Generation with Spatially Contextualized VLMs [67.31920821192323]
We introduce a new paradigm that enables vision-language models to generate, understand, and edit complex 3D environments. We develop an agentic 3D scene generation pipeline in which the VLM iteratively reads from and updates the spatial context. Results show that our framework can handle diverse and challenging inputs, achieving a level of generalization not observed in prior work.
arXiv Detail & Related papers (2025-05-26T15:28:17Z)
- Constructing a 3D Scene from a Single Image [31.11317559252235]
SceneFuse-3D is a training-free framework designed to synthesize coherent 3D scenes from a single top-down view. We decompose the input image into overlapping regions and generate each using a pretrained 3D object generator. This modular design allows us to overcome resolution bottlenecks and preserve spatial structure without requiring 3D supervision or fine-tuning.
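The overlapping-region decomposition this summary describes can be sketched as a simple tiling helper. A minimal sketch: the tile size, overlap width, and border handling are assumptions for illustration, not details from the paper, and it assumes the image is at least one tile wide in each dimension.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of overlapping tiles covering [0, length).
    Assumes length >= tile."""
    stride = max(tile - overlap, 1)
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:  # ensure the far border is covered
        starts.append(length - tile)
    return starts

def overlapping_regions(h, w, tile=256, overlap=64):
    """Split an h x w image into (top, bottom, left, right) boxes that
    overlap by `overlap` pixels, so neighboring regions share context."""
    return [(y, y + tile, x, x + tile)
            for y in tile_starts(h, tile, overlap)
            for x in tile_starts(w, tile, overlap)]
```

Each region would then be passed to the pretrained per-object generator independently, with the shared overlap providing the spatial consistency across region boundaries.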
arXiv Detail & Related papers (2025-05-21T17:10:47Z)
- HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation. We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z) - Interactive Scene Authoring with Specialized Generative Primitives [25.378818867764323]
Specialized Generative Primitives is a generative framework that allows non-expert users to author high-quality 3D scenes. Each primitive is an efficient generative model that captures the distribution of a single exemplar from the real world. We showcase interactive sessions where various primitives are extracted from real-world scenes and controlled to create 3D assets and scenes in a few minutes.
arXiv Detail & Related papers (2024-12-20T04:39:50Z) - StdGEN: Semantic-Decomposed 3D Character Generation from Single Images [28.302030751098354]
StdGEN is an innovative pipeline for generating semantically high-quality 3D characters from single images. It generates intricately detailed 3D characters with separated semantic components, such as the body, clothes, and hair, in three minutes. StdGEN offers ready-to-use semantic-decomposed 3D characters and enables flexible customization for a wide range of applications.
arXiv Detail & Related papers (2024-11-08T17:54:18Z) - InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models [66.83681825842135]
InstantMesh is a feed-forward framework for instant 3D mesh generation from a single image.
It features state-of-the-art generation quality and significant training scalability.
We release all the code, weights, and demo of InstantMesh with the intention that it can make substantial contributions to the community of 3D generative AI.
arXiv Detail & Related papers (2024-04-10T17:48:37Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z) - Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.