PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation
- URL: http://arxiv.org/abs/2502.00708v1
- Date: Sun, 02 Feb 2025 07:47:03 GMT
- Title: PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation
- Authors: Qixuan Li, Chao Wang, Zongjin He, Yan Peng
- Abstract summary: We propose a novel framework for compositional scene generation, PhiP-G.
PhiP-G seamlessly integrates generation techniques with layout guidance based on a world model.
Experiments demonstrate that PhiP-G significantly enhances the generation quality and physical rationality of the compositional scenes.
- Score: 5.554872561486615
- Abstract: Text-to-3D asset generation has achieved significant optimization under the supervision of 2D diffusion priors. However, when dealing with compositional scenes, existing methods encounter several challenges: 1) failure to ensure that composite scene layouts comply with physical laws; 2) difficulty in accurately capturing the assets and relationships described in complex scene descriptions; 3) limited autonomous asset generation capabilities among layout approaches leveraging large language models (LLMs). To avoid these compromises, we propose a novel framework for compositional scene generation, PhiP-G, which seamlessly integrates generation techniques with layout guidance based on a world model. Leveraging LLM-based agents, PhiP-G analyzes the complex scene description to generate a scene graph, and integrates a multimodal 2D generation agent with a 3D Gaussian generation method for targeted asset creation. For the layout stage, PhiP-G employs a physical pool with adhesion capabilities and a visual supervision agent, forming a world model for layout prediction and planning. Extensive experiments demonstrate that PhiP-G significantly enhances the generation quality and physical rationality of the compositional scenes. Notably, PhiP-G attains state-of-the-art (SOTA) performance in CLIP scores, achieves parity with the leading methods in generation quality as measured by T$^3$Bench, and improves efficiency by 24x.
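Read as a pipeline, the abstract suggests roughly the following flow: scene graph from an LLM agent, per-node asset generation, then a layout loop supervised by a physics pool and a visual critic. The sketch below is a hypothetical skeleton assembled from the abstract alone; every function and class name is an illustrative placeholder, not part of the PhiP-G codebase.

```python
# Hypothetical skeleton of the pipeline described in the abstract.
# All agents are stubbed; none of these names come from the PhiP-G code.
from dataclasses import dataclass, field


@dataclass
class SceneNode:
    name: str
    relations: list = field(default_factory=list)   # e.g. ("on", "table")
    asset: object = None                             # 3D Gaussian asset placeholder
    position: tuple = (0.0, 0.0, 0.0)


def parse_scene_graph(prompt: str) -> list[SceneNode]:
    """LLM-based agent: complex description -> scene graph (stub)."""
    return [SceneNode("table"), SceneNode("vase", relations=[("on", "table")])]


def generate_asset(node: SceneNode) -> object:
    """Multimodal 2D agent + 3D Gaussian generator for one asset (stub)."""
    return f"gaussian_asset<{node.name}>"


def physics_step(nodes: list[SceneNode]) -> None:
    """'Physical pool with adhesion': snap related assets into contact (stub)."""
    for n in nodes:
        for rel, _support in n.relations:
            if rel == "on":
                n.position = (0.0, 0.0, 1.0)   # toy: lift onto the support


def visual_critic(nodes: list[SceneNode]) -> bool:
    """Visual supervision agent: accept or reject the current layout (stub)."""
    return all(n.asset is not None for n in nodes)


def phip_g(prompt: str, max_iters: int = 5) -> list[SceneNode]:
    nodes = parse_scene_graph(prompt)
    for n in nodes:
        n.asset = generate_asset(n)
    for _ in range(max_iters):            # world-model loop: predict, check, refine
        physics_step(nodes)
        if visual_critic(nodes):
            break
    return nodes


if __name__ == "__main__":
    for node in phip_g("a vase on a wooden table"):
        print(node.name, node.position, node.asset)
```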
Related papers
- LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation [5.424048651554831]
We introduce a framework that leverages 3D Gaussian Splatting (3DGS) to facilitate high-quality, physically consistent compositional scene generation guided by text.
Specifically, given a text prompt, we convert it into a directed scene graph and adaptively adjust the density and layout of the initial compositional 3D Gaussians.
By extracting directed dependencies from the scene graph, we tailor physical and layout energies to ensure both realism and flexibility; a toy energy sketch follows below.
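A minimal toy of what combined "physical and layout" energies could look like, assuming (purely for illustration) a quadratic alignment term for a directed "on top of" edge and a quadratic support term; these are not LAYOUTDREAMER's actual energy terms.

```python
# Toy energy-based layout refinement for one directed relation "vase on table".
# The energy terms are illustrative stand-ins, not the paper's formulation.
import numpy as np

h_table, h_vase = 1.0, 0.4           # object heights (toy values)
p_table = np.array([0.0, 0.0, 0.5])  # table centre, fixed
p_vase = np.array([0.8, -0.3, 2.0])  # vase centre, to be optimized


def energy(p):
    e_layout = np.sum((p[:2] - p_table[:2]) ** 2)        # align above the support
    target_z = p_table[2] + h_table / 2 + h_vase / 2     # resting height
    e_phys = (p[2] - target_z) ** 2                      # neither floating nor sinking
    return e_layout + e_phys


def grad(p):
    g = np.zeros(3)
    g[:2] = 2 * (p[:2] - p_table[:2])
    g[2] = 2 * (p[2] - (p_table[2] + h_table / 2 + h_vase / 2))
    return g


for _ in range(200):                  # plain gradient descent on the total energy
    p_vase -= 0.1 * grad(p_vase)

print("final vase position:", p_vase, "energy:", energy(p_vase))
```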
arXiv Detail & Related papers (2025-02-04T02:51:37Z)
- Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting [47.014044892025346]
Architect is a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting.
Our pipeline is further extended to a hierarchical and iterative inpainting process that continuously generates placements of large furniture and small objects to enrich the scene.
arXiv Detail & Related papers (2024-11-14T22:15:48Z)
- CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians [97.15119679296954]
CompGS is a novel generative framework that employs 3D Gaussian Splatting (GS) for efficient, compositional text-to-3D content generation.
CompGS can be easily extended to controllable 3D editing, facilitating scene generation.
arXiv Detail & Related papers (2024-10-28T04:35:14Z)
- SAGS: Structure-Aware 3D Gaussian Splatting [53.6730827668389]
We propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene.
SAGS achieves state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets.
arXiv Detail & Related papers (2024-04-29T23:26:30Z)
- SceneX: Procedural Controllable Large-scale Scene Generation [52.4743878200172]
We introduce SceneX, which can automatically produce high-quality procedural models according to designers' textual descriptions.
The proposed method comprises two components, PCGHub and PCGPlanner.
The latter aims to generate executable actions for Blender to produce controllable and precise 3D assets guided by the user's instructions; a minimal Blender sketch follows below.
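As a rough picture of "executable actions for Blender", the snippet below dispatches a hypothetical planner output to standard bpy primitives. The action schema is invented for illustration, the script is meant to run inside Blender's Python environment, and it is not a reconstruction of PCGPlanner.

```python
# Run inside Blender's Python console. The planned_actions list stands in for
# a hypothetical planner output; the schema is invented for illustration.
import bpy

planned_actions = [
    {"op": "add_plane", "size": 20.0, "location": (0.0, 0.0, 0.0)},    # ground
    {"op": "add_cube", "size": 2.0, "location": (0.0, 0.0, 1.0)},      # building block
    {"op": "add_sphere", "radius": 0.5, "location": (3.0, 1.0, 0.5)},  # small prop
]

DISPATCH = {
    "add_plane": lambda a: bpy.ops.mesh.primitive_plane_add(size=a["size"], location=a["location"]),
    "add_cube": lambda a: bpy.ops.mesh.primitive_cube_add(size=a["size"], location=a["location"]),
    "add_sphere": lambda a: bpy.ops.mesh.primitive_uv_sphere_add(radius=a["radius"], location=a["location"]),
}

for action in planned_actions:
    DISPATCH[action["op"]](action)   # execute each planned action in the scene
```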
arXiv Detail & Related papers (2024-03-23T03:23:29Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes such as pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z)
- CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting [57.14748263512924]
CG3D is a method for compositionally generating scalable 3D assets.
Gaussian radiance fields, parameterized to allow for compositions of objects, possess the capability to enable semantically and physically consistent scenes.
arXiv Detail & Related papers (2023-11-29T18:55:38Z)
- Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction [39.89856628467095]
We introduce the Structural MPI (S-MPI), where the plane structure approximates 3D scenes concisely.
Despite the intuitive appeal of S-MPI and the demand for it, applying it introduces significant challenges, e.g., high-fidelity approximation of both RGBA layers and plane poses (the standard MPI compositing step is sketched below).
Our method outperforms both previous state-of-the-art MPI-based view synthesis methods and planar reconstruction methods.
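For context on the "RGBA layers", standard multiplane-image rendering composites the planes back to front with the over operator. The sketch below shows that generic step only, not S-MPI's structure-aware plane estimation.

```python
# Generic back-to-front alpha ("over") compositing of MPI RGBA layers.
# This is the standard MPI rendering step, not S-MPI's plane-pose estimation.
import numpy as np


def composite_mpi(layers: np.ndarray) -> np.ndarray:
    """layers: (D, H, W, 4) RGBA planes ordered from nearest to farthest."""
    out = np.zeros(layers.shape[1:3] + (3,))
    for rgba in layers[::-1]:                    # start from the farthest plane
        rgb, alpha = rgba[..., :3], rgba[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)  # over operator
    return out


if __name__ == "__main__":
    mpi = np.random.rand(8, 4, 4, 4)             # 8 tiny random planes
    print(composite_mpi(mpi).shape)              # (4, 4, 3)
```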
arXiv Detail & Related papers (2023-03-10T14:18:40Z)
- PEGG-Net: Pixel-Wise Efficient Grasp Generation in Complex Scenes [7.907697609965681]
In this work, we study the existing planar grasp estimation algorithms and analyze the related challenges in complex scenes.
We design a Pixel-wise Efficient Grasp Generation Network (PEGG-Net) to tackle the problem of grasping in complex scenes.
PEGG-Net achieves improved state-of-the-art performance on the Cornell dataset (98.9%) and second-best performance on the Jacquard dataset (93.8%); a sketch of pixel-wise grasp decoding follows below.
arXiv Detail & Related papers (2022-03-30T13:44:19Z)
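Pixel-wise grasp generation networks in this line of work typically predict per-pixel quality, angle, and width maps and read the grasp off the highest-quality pixel. The decoding sketch below follows that common convention, with random arrays standing in for network outputs; it is not PEGG-Net's exact head design.

```python
# Decode a planar grasp from pixel-wise network outputs.
# Random arrays stand in for predicted maps; the head layout is a common
# convention in pixel-wise grasping, not necessarily PEGG-Net's exact design.
import numpy as np

H, W = 224, 224
quality = np.random.rand(H, W)            # per-pixel grasp quality
cos2t = np.random.uniform(-1, 1, (H, W))  # angle encoded as cos(2θ), sin(2θ)
sin2t = np.random.uniform(-1, 1, (H, W))
width = np.random.rand(H, W) * 150.0      # gripper opening in pixels

y, x = np.unravel_index(np.argmax(quality), quality.shape)
theta = 0.5 * np.arctan2(sin2t[y, x], cos2t[y, x])

print(f"grasp at ({x}, {y}), angle {np.degrees(theta):.1f} deg, "
      f"width {width[y, x]:.1f}px, quality {quality[y, x]:.2f}")
```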
This list is automatically generated from the titles and abstracts of the papers on this site.