ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
- URL: http://arxiv.org/abs/2507.19058v1
- Date: Fri, 25 Jul 2025 08:21:12 GMT
- Title: ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
- Authors: Chong Xia, Shengjun Zhang, Fangfu Liu, Chang Liu, Khodchaphun Hirunyaratsameewong, Yueqi Duan
- Abstract summary: ScenePainter is a new framework for semantically consistent 3D scene generation. Our framework overcomes the semantic drift issue and generates more consistent and immersive 3D view sequences.
- Score: 13.983092770961514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Perpetual 3D scene generation aims to produce long-range and coherent 3D view sequences, which is applicable to long-term video synthesis and 3D scene reconstruction. Existing methods follow a "navigate-and-imagine" fashion and rely on outpainting for successive view expansion. However, the generated view sequences suffer from a semantic drift issue arising from the accumulated deviation of the outpainting module. To tackle this challenge, we propose ScenePainter, a new framework for semantically consistent 3D scene generation, which aligns the outpainter's scene-specific prior with the comprehension of the current scene. To be specific, we introduce a hierarchical graph structure dubbed SceneConceptGraph to construct relations among multi-level scene concepts, which directs the outpainter toward consistent novel views and can be dynamically refined to enhance diversity. Extensive experiments demonstrate that our framework overcomes the semantic drift issue and generates more consistent and immersive 3D view sequences. Project Page: https://xiac20.github.io/ScenePainter/.
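The abstract describes SceneConceptGraph only at a high level: a hierarchical graph whose nodes are multi-level scene concepts and whose edges encode inter-concept relations, used to condition the outpainter. The paper's actual implementation is not shown here; the following is a minimal, hypothetical sketch of what such a structure could look like, with all class and method names (`ConceptNode`, `SceneConceptGraph`, `prompt_terms`) being illustrative assumptions rather than the authors' API.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    """One scene concept, e.g. a scene, an object, or an object part."""
    name: str
    level: int                                        # 0 = scene, 1 = object, 2 = part, ...
    children: list = field(default_factory=list)      # finer-grained sub-concepts
    relations: dict = field(default_factory=dict)     # name -> relation to another concept

class SceneConceptGraph:
    """A toy hierarchical concept graph in the spirit of the paper's description."""

    def __init__(self, scene_name: str):
        self.root = ConceptNode(scene_name, level=0)

    def add_concept(self, parent: ConceptNode, name: str) -> ConceptNode:
        # New concepts sit one hierarchy level below their parent.
        node = ConceptNode(name, level=parent.level + 1)
        parent.children.append(node)
        return node

    def relate(self, a: ConceptNode, b: ConceptNode, relation: str) -> None:
        # Record a directed semantic relation, e.g. "beside" or "supports".
        a.relations[b.name] = relation

    def prompt_terms(self) -> list:
        # Flatten the hierarchy into concept names that could condition
        # an outpainting model, keeping novel views semantically consistent.
        terms, stack = [], [self.root]
        while stack:
            node = stack.pop()
            terms.append(node.name)
            stack.extend(node.children)
        return terms

# Usage: build a tiny graph and flatten it for conditioning.
g = SceneConceptGraph("forest")
tree = g.add_concept(g.root, "tree")
rock = g.add_concept(g.root, "rock")
g.relate(tree, rock, "beside")
print(sorted(g.prompt_terms()))  # → ['forest', 'rock', 'tree']
```

Dynamically refining the graph, as the abstract mentions, would then amount to adding or re-relating nodes between outpainting steps; how the real system scores and updates relations is not specified here.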
Related papers
- Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning [63.94919846010485]
3D Gaussian inpainting (3DGI) is challenging in effectively leveraging complementary visual and semantic cues from multiple input views. We propose a method that measures the visibility uncertainties of 3D points across different input views and uses them to guide 3DGI. We build a novel 3DGI framework, VISTA, by integrating VISibility-uncerTainty-guided 3DGI with scene conceptuAl learning.
arXiv Detail & Related papers (2025-04-23T06:21:11Z)
- Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration [18.23983135970619]
We propose a novel layered 3D scene reconstruction framework from a panoramic image, named Scene4U. Specifically, Scene4U integrates an open-vocabulary segmentation model with a large language model to decompose a real panorama into multiple layers. We then employ a layered repair module based on a diffusion model to restore occluded regions using visual cues and depth information, generating a hierarchical representation of the scene. Scene4U outperforms state-of-the-art methods, improving by 24.24% in LPIPS and 24.40% in BRISQUE, while also achieving the fastest training speed.
arXiv Detail & Related papers (2025-04-01T03:17:24Z)
- Toward Scene Graph and Layout Guided Complex 3D Scene Generation [31.396230860775415]
We present a novel framework of Scene Graph and Layout Guided 3D Scene Generation (GraLa3D). Given a text prompt describing a complex 3D scene, GraLa3D utilizes an LLM to model the scene using a scene graph representation with layout bounding box information. GraLa3D uniquely constructs the scene graph with single-object nodes and composite super-nodes.
arXiv Detail & Related papers (2024-12-29T14:21:03Z)
- SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting [53.32467009064287]
We propose a text-driven 3D-consistent scene generation model: SceneDreamer360.
Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation.
Our experiments demonstrate that SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt.
arXiv Detail & Related papers (2024-08-25T02:56:26Z)
- LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation [105.52153675890408]
3D immersive scene generation is a challenging yet critical task in computer vision and graphics. LayerPano3D is a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt.
arXiv Detail & Related papers (2024-08-23T17:50:23Z)
- FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting [15.648080938815879]
We propose FastScene, a framework for fast, high-quality 3D scene generation.
FastScene can generate a 3D scene within a mere 15 minutes, which is at least one hour faster than state-of-the-art methods.
arXiv Detail & Related papers (2024-05-09T13:44:16Z)
- RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting [63.567363455092234]
RefFusion is a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view.
Our framework achieves state-of-the-art results for object removal while maintaining high controllability.
arXiv Detail & Related papers (2024-04-16T17:50:02Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- SceneScape: Text-Driven Consistent Scene Generation [14.348512536556413]
We introduce a novel framework that generates such videos in an online fashion by combining a pre-trained text-to-image model with a pre-trained monocular depth prediction model.
To tackle the pivotal challenge of achieving 3D consistency, we deploy online test-time training to encourage the predicted depth map of the current frame to be geometrically consistent with the synthesized scene.
In contrast to previous works, which are applicable only to limited domains, our method generates diverse scenes, such as walkthroughs in spaceships, caves, or ice castles.
arXiv Detail & Related papers (2023-02-02T14:47:19Z)
- Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs [85.54212143154986]
Controllable scene synthesis consists of generating 3D information that satisfies underlying specifications.
Scene graphs are representations of a scene composed of objects (nodes) and inter-object relationships (edges).
We propose the first work that directly generates shapes from a scene graph in an end-to-end manner.
arXiv Detail & Related papers (2021-08-19T17:59:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.