Causal Reasoning Elicits Controllable 3D Scene Generation
- URL: http://arxiv.org/abs/2509.15249v1
- Date: Thu, 18 Sep 2025 01:03:21 GMT
- Title: Causal Reasoning Elicits Controllable 3D Scene Generation
- Authors: Shen Chen, Ruiyu Zhao, Jiale Zhou, Zongkai Wu, Jenq-Neng Hwang, Lei Li
- Abstract summary: CausalStruct is a novel framework that embeds causal reasoning into 3D scene generation. We construct causal graphs where nodes represent objects and attributes, while edges encode causal dependencies and physical constraints. Our method uses text or images to guide object placement and layout in 3D scenes, with 3D Gaussian Splatting and Score Distillation Sampling improving shape accuracy and rendering stability.
- Score: 35.22855710229319
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing 3D scene generation methods often struggle to model the complex logical dependencies and physical constraints between objects, limiting their ability to adapt to dynamic and realistic environments. We propose CausalStruct, a novel framework that embeds causal reasoning into 3D scene generation. Utilizing large language models (LLMs), we construct causal graphs where nodes represent objects and attributes, while edges encode causal dependencies and physical constraints. CausalStruct iteratively refines the scene layout by enforcing causal order to determine the placement order of objects and applies causal intervention to adjust the spatial configuration according to physics-driven constraints, ensuring consistency with textual descriptions and real-world dynamics. The refined scene causal graph informs subsequent optimization steps, employing a Proportional-Integral-Derivative (PID) controller to iteratively tune object scales and positions. Our method uses text or images to guide object placement and layout in 3D scenes, with 3D Gaussian Splatting and Score Distillation Sampling improving shape accuracy and rendering stability. Extensive experiments show that CausalStruct generates 3D scenes with enhanced logical coherence, realistic spatial interactions, and robust adaptability.
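The abstract's two central mechanisms, placing objects in causal order and PID-tuning their scales and positions, can be sketched in toy form. The function names, gains, targets, and the desk/monitor/lamp example below are illustrative assumptions, not the paper's implementation.

```python
from graphlib import TopologicalSorter

def causal_placement_order(edges):
    """Derive an object placement order from causal edges (cause -> effect)."""
    ts = TopologicalSorter()
    for cause, effect in edges:
        ts.add(effect, cause)  # an effect is placed after its cause
    return list(ts.static_order())

def pid_tune(value, target, kp=0.5, ki=0.01, kd=0.1, steps=200):
    """Iteratively drive a scalar (e.g. an object's scale) toward a target."""
    integral, prev_err = 0.0, 0.0
    for _ in range(steps):
        err = target - value
        integral += err
        derivative = err - prev_err
        value += kp * err + ki * integral + kd * derivative
        prev_err = err
    return value

# A desk causally supports a monitor and a lamp, so it is placed first;
# the monitor's scale is then tuned toward a constraint-derived target.
order = causal_placement_order([("desk", "monitor"), ("desk", "lamp")])
scale = pid_tune(value=1.5, target=1.0)
```

Topological order over the causal graph guarantees that supporting objects exist before the objects that depend on them, which is what makes the subsequent per-object PID adjustment well-posed.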
Related papers
- PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement [89.35154754765502]
PhyScensis is an agent-based framework powered by a physics engine to produce physically plausible scene configurations. Our framework preserves strong controllability over fine-grained textual descriptions and numerical parameters. Experimental results show that our method outperforms prior approaches in scene complexity, visual quality, and physical accuracy.
arXiv Detail & Related papers (2026-02-16T17:55:25Z) - SceneLinker: Compositional 3D Scene Generation via Semantic Scene Graph from RGB Sequences [12.771171646896468]
We introduce SceneLinker, a framework that generates compositional 3D scenes via semantic scene graphs from RGB sequences. Our work enables users to generate consistent 3D spaces from their physical environments via scene graphs, allowing them to create spatial Mixed Reality (MR) content.
arXiv Detail & Related papers (2026-02-03T01:22:07Z) - RoamScene3D: Immersive Text-to-3D Scene Generation via Adaptive Object-aware Roaming [79.81527946524098]
RoamScene3D is a novel framework that bridges the gap between semantic guidance and spatial generation. We employ a vision-language model (VLM) to construct a scene graph that encodes object relations. To mitigate the limitations of static 2D priors, we introduce a Motion-Injected Inpainting model that is fine-tuned on a synthetic panoramic dataset.
arXiv Detail & Related papers (2026-01-27T10:10:55Z) - Error-Driven Scene Editing for 3D Grounding in Large Language Models [71.41120775319088]
Despite recent progress in 3D-LLMs, they remain limited in accurately grounding language to visual and spatial elements in 3D environments. This limitation stems in part from training data that focuses on language reasoning rather than spatial understanding, due to scarce 3D resources. We propose 3D scene editing as a key mechanism to generate precise visual counterfactuals that mitigate these biases.
arXiv Detail & Related papers (2025-11-18T03:13:29Z) - Text-to-Scene with Large Reasoning Models [35.61634772862795]
Reason-3D is a text-to-scene model powered by large reasoning models (LRMs). Reason-3D integrates object retrieval using captions covering physical, functional, and contextual attributes. It significantly outperforms previous methods in human-rated visual fidelity, adherence to constraints, and asset retrieval quality.
arXiv Detail & Related papers (2025-09-30T11:08:11Z) - RoomCraft: Controllable and Complete 3D Indoor Scene Generation [51.19602078504066]
RoomCraft is a multi-stage pipeline that converts real images, sketches, or text descriptions into coherent 3D indoor scenes. Our approach combines a scene generation pipeline with a constraint-driven optimization framework. RoomCraft significantly outperforms existing methods in generating realistic, semantically coherent, and visually appealing room layouts.
arXiv Detail & Related papers (2025-06-27T15:03:17Z) - HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation. We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z) - CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image [44.8172828045897]
Current methods often struggle with domain-specific limitations or low-quality object generation. We propose CAST, a novel method for 3D scene reconstruction and recovery.
arXiv Detail & Related papers (2025-02-18T14:29:52Z) - LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation [5.424048651554831]
We introduce a framework that leverages 3D Gaussian Splatting (3DGS) to facilitate high-quality, physically consistent compositional scene generation guided by text. Specifically, given a text prompt, we convert it into a directed scene graph and adaptively adjust the density and layout of the initial compositional 3D Gaussians. By extracting directed dependencies from the scene graph, we tailor physical and layout energies to ensure both realism and flexibility.
arXiv Detail & Related papers (2025-02-04T02:51:37Z) - DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling [23.06464506261766]
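Several of these works, LAYOUTDREAMER included, score layouts with physical and layout energy terms derived from a directed scene graph. A toy sketch of that idea, with boxes, relation names, and weights that are illustrative assumptions rather than any paper's actual formulation: unrelated boxes are penalized for overlapping (physical term), while a directed "supports" edge penalizes a child that is off-center relative to its base (layout term).

```python
# Toy layout energy over axis-aligned 2D boxes given as (x, y, w, h).
def overlap(a, b):
    """Overlap area of two axis-aligned boxes."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(dx, 0.0) * max(dy, 0.0)

def layout_energy(boxes, supports):
    """Physical term: unrelated boxes should not interpenetrate.
    Layout term: a supported object should sit centered on its base."""
    related = {frozenset(pair) for pair in supports}
    names = list(boxes)
    e = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if frozenset((a, b)) not in related:
                e += overlap(boxes[a], boxes[b])
    for child, base in supports:
        child_cx = boxes[child][0] + boxes[child][2] / 2
        base_cx = boxes[base][0] + boxes[base][2] / 2
        e += (child_cx - base_cx) ** 2
    return e

# A lamp centered on a desk has zero energy; shifting it raises the energy.
centered = layout_energy({"desk": (0, 0, 2, 1), "lamp": (0.75, 1, 0.5, 0.5)},
                         [("lamp", "desk")])
shifted = layout_energy({"desk": (0, 0, 2, 1), "lamp": (1.25, 1, 0.5, 0.5)},
                        [("lamp", "desk")])
```

Minimizing such an energy over object positions is what turns a symbolic scene graph into concrete, physically plausible coordinates.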
We present DreamScape, a method for generating 3D scenes from text. We use a 3D Gaussian Guide that encodes semantic primitives, spatial transformations, and relationships from text using LLMs. DreamScape achieves state-of-the-art performance, enabling high-fidelity, controllable 3D scene generation.
arXiv Detail & Related papers (2024-04-14T12:13:07Z) - CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information above and is not responsible for any consequences of its use.