Related papers: PSGS: Text-driven Panorama Sliding Scene Generation via Gaussian Splatting

URL: http://arxiv.org/abs/2602.00463v1
Date: Sat, 31 Jan 2026 02:34:46 GMT
Title: PSGS: Text-driven Panorama Sliding Scene Generation via Gaussian Splatting
Authors: Xin Zhang, Shen Chen, Jiale Zhou, Lei Li,
Abstract summary: We propose PSGS, a framework for high-fidelity panoramic scene generation. First, a novel two-layer optimization architecture generates semantically coherent panoramas. Second, our panorama sliding mechanism initializes globally consistent 3D Gaussian Splatting point clouds.
Score: 18.048020748522312
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating realistic 3D scenes from text is crucial for immersive applications like VR, AR, and gaming. While text-driven approaches promise efficiency, existing methods suffer from limited 3D-text data and inconsistent multi-view stitching, resulting in overly simplistic scenes. To address this, we propose PSGS, a two-stage framework for high-fidelity panoramic scene generation. First, a novel two-layer optimization architecture generates semantically coherent panoramas: a layout reasoning layer parses text into structured spatial relationships, while a self-optimization layer refines visual details via iterative MLLM feedback. Second, our panorama sliding mechanism initializes globally consistent 3D Gaussian Splatting point clouds by strategically sampling overlapping perspectives. By incorporating depth and semantic coherence losses during training, we greatly improve the quality and detail fidelity of rendered scenes. Our experiments demonstrate that PSGS outperforms existing methods in panorama generation and produces more appealing 3D scenes, offering a robust solution for scalable immersive content creation.

Related papers

ZeroScene: A Zero-Shot Framework for 3D Scene Generation from a Single Image and Controllable Texture Editing [36.098009720325436]
We propose a novel system to accomplish both single image-to-3D scene reconstruction and texture editing in a zero-shot manner. ZeroScene extracts object-level 2D segmentation and depth information from input images to infer spatial relationships within the scene. It then jointly optimizes 3D and 2D projection losses of the point cloud to update object poses for precise scene alignment.
arXiv Detail & Related papers (2025-09-28T03:21:12Z)
TiP4GEN: Text to Immersive Panorama 4D Scene Generation [82.8444414014506]
TiP4GEN is a text-to-dynamic panorama scene generation framework. It enables fine-grained content control and synthesizes motion-rich, geometry-consistent panoramic 4D scenes. TiP4GEN integrates panorama video generation and dynamic scene reconstruction to create 360-degree immersive virtual environments.
arXiv Detail & Related papers (2025-08-17T16:02:24Z)
LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation [5.424048651554831]
We introduce a framework that leverages 3D Gaussian Splatting (3DGS) to facilitate high-quality, physically consistent compositional scene generation guided by text. Specifically, given a text prompt, we convert it into a directed scene graph and adaptively adjust the density and layout of the initial compositional 3D Gaussians. By extracting directed dependencies from the scene graph, we tailor physical and layout energy to ensure both realism and flexibility.
arXiv Detail & Related papers (2025-02-04T02:51:37Z)
BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation [54.12899218104669]
3D scenes have highly complex structures, and generated output must be dense, coherent, and contain all necessary structures. Current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators. We propose BloomScene, a lightweight structured 3D Gaussian splatting method for crossmodal scene generation.
arXiv Detail & Related papers (2025-01-15T11:33:34Z)
SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting [53.32467009064287]
We propose a text-driven 3D-consistent scene generation model: SceneDreamer360. Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation. Our experiments demonstrate that SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt.
arXiv Detail & Related papers (2024-08-25T02:56:26Z)
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation [105.52153675890408]
3D immersive scene generation is a challenging yet critical task in computer vision and graphics. LayerPano3D is a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt.
arXiv Detail & Related papers (2024-08-23T17:50:23Z)
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions [31.342899807980654]
3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry. We introduce HoloDreamer, a framework that first generates a high-definition panorama as a holistic initialization of the full 3D scene. We then leverage 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes.
arXiv Detail & Related papers (2024-07-21T14:52:51Z)
DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling [23.06464506261766]
We present DreamScape, a method for generating 3D scenes from text. We use a 3D Gaussian Guide that encodes semantic primitives, spatial transformations, and relationships from text using LLMs. DreamScape achieves state-of-the-art performance, enabling high-fidelity, controllable 3D scene generation.
arXiv Detail & Related papers (2024-04-14T12:13:07Z)
Text2Immersion: Generative Immersive Scene with 3D Gaussians [14.014016090679627]
Text2Immersion is an elegant method for producing high-quality 3D immersive scenes from text prompts. Our system surpasses other methods in rendering quality and diversity, further progressing towards text-driven 3D scene generation.
arXiv Detail & Related papers (2023-12-14T18:58:47Z)
SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline operating on a sparse grid-based neural scene representation to complete unobserved scene parts. We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area. Photorealistic image sequences can finally be obtained via consistency-relevant differentiable rendering.
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization [66.25948693095604]
We propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view panorama image. Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement.
arXiv Detail & Related papers (2021-08-24T13:55:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.