GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

URL: http://arxiv.org/abs/2312.00093v2
Date: Mon, 10 Jun 2024 19:08:03 GMT
Title: GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
Authors: Gege Gao, Weiyang Liu, Anpei Chen, Andreas Geiger, Bernhard Schölkopf
Abstract summary: We propose a novel framework to generate compositional 3D scenes from scene graphs. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
Score: 74.98581417902201
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have been made to distill knowledge from these text-to-image pretrained models for optimizing a text-guided 3D model. Most of the existing methods generate a holistic 3D model from a plain text input. This can be problematic when the text describes a complex scene with multiple objects, because the vectorized text embeddings are inherently unable to capture a complex description with multiple entities and relationships. Holistic 3D modeling of the entire scene further prevents accurate grounding of text entities and concepts. To address this limitation, we propose GraphDreamer, a novel framework to generate compositional 3D scenes from scene graphs, where objects are represented as nodes and their interactions as edges. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model and is able to fully disentangle different objects without image-level supervision. To facilitate modeling of object-wise relationships, we use signed distance fields as representation and impose a constraint to avoid inter-penetration of objects. To avoid manual scene graph creation, we design a text prompt for ChatGPT to generate scene graphs based on text inputs. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer in generating high-fidelity compositional 3D scenes with disentangled object entities.
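The abstract relies on two mechanisms: a scene graph in which objects are nodes and their interactions are edges, and a constraint that keeps per-object signed distance fields (SDFs) from inter-penetrating. The Python sketch below illustrates both under stated assumptions. The `SceneGraph` schema, the `edge_prompts` helper and `penetration_penalty` are hypothetical names. Analytic spheres stand in for the learned neural SDFs. The overlap-depth penalty is one simple choice, not necessarily the loss GraphDreamer actually uses.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable, Sequence

Point = tuple[float, float, float]
SDF = Callable[[Point], float]

@dataclass
class SceneGraph:
    """Hypothetical schema: objects are nodes, pairwise relations are directed edges."""
    nodes: list[str]  # one text description per object
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (subject, object, relation)

    def edge_prompts(self) -> list[str]:
        # Hypothetical helper: one short text prompt per edge.
        return [f"{self.nodes[s]} {rel} {self.nodes[o]}" for s, o, rel in self.edges]

def sphere_sdf(center: Point, radius: float) -> SDF:
    """Analytic stand-in for a learned per-object SDF: negative inside, positive outside."""
    return lambda p: math.dist(p, center) - radius

def penetration_penalty(sdfs: Sequence[SDF], points: Sequence[Point]) -> float:
    """Mean over sample points of the summed overlap depth of every object pair."""
    total = 0.0
    for p in points:
        # max(-sdf, 0): how deep p lies inside each object (0 if outside).
        depths = [max(-f(p), 0.0) for f in sdfs]
        # A point inside both objects contributes the shallower of the two depths.
        for a, b in combinations(depths, 2):
            total += min(a, b)
    return total / len(points)

# Usage: two unit spheres that overlap on 0.5 < x < 1.0, and a pair that does not overlap.
graph = SceneGraph(nodes=["a cat", "a stack of books"], edges=[(0, 1, "sitting on")])
print(graph.edge_prompts())  # ['a cat sitting on a stack of books']

cat = sphere_sdf((0.0, 0.0, 0.0), 1.0)
books = sphere_sdf((1.5, 0.0, 0.0), 1.0)
far = sphere_sdf((3.0, 0.0, 0.0), 1.0)
points = [(x / 10, 0.0, 0.0) for x in range(-10, 26)]  # samples along the x-axis
print(penetration_penalty([cat, books], points) > 0)  # True
print(penetration_penalty([cat, far], points))        # 0.0
```

Taking the shallower of the two depths makes the penalty vanish exactly when no sample point lies inside two objects at once, and it increases as the objects sink further into each other.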

Related papers

GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis [14.137982018879049]
Methods that synthesize indoor 3D scenes from text prompts have wide-ranging applications in film production, interior design, video games, virtual reality, and synthetic data generation for training embodied agents. Existing approaches typically either train generative models from scratch or leverage vision-language models (VLMs). We introduce GeoSceneGraph, a method that synthesizes 3D scenes from text prompts by leveraging the graph structure and geometric symmetries of 3D scenes.
arXiv (2025-11-18T20:06:49Z)
DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting [24.719972380079405]
DecompDreamer is a training routine designed to generate high-quality 3D compositions. It decomposes scenes into structured components and their relationships. It effectively generates intricate 3D compositions with superior object disentanglement.
arXiv (2025-03-15T03:37:25Z)
Toward Scene Graph and Layout Guided Complex 3D Scene Generation [31.396230860775415]
We present a novel framework of Scene Graph and Layout Guided 3D Scene Generation (GraLa3D). Given a text prompt describing a complex 3D scene, GraLa3D utilizes an LLM to model the scene using a scene graph representation with layout bounding box information. GraLa3D uniquely constructs the scene graph with single-object nodes and composite super-nodes.
arXiv (2024-12-29T14:21:03Z)
Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming [44.32980579195508]
We introduce Generate Any Scene, a framework that enumerates scene graphs representing a vast array of visual scenes. Generate Any Scene translates each scene graph into a caption, enabling scalable evaluation of text-to-vision models. We conduct extensive evaluations across text-to-image, text-to-video, and text-to-3D models, presenting key findings on model performance.
arXiv (2024-12-11T09:17:39Z)
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [39.03289977892935]
RealmDreamer is a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles.
arXiv (2024-04-10T17:57:41Z)
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data. We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv (2024-03-04T07:57:05Z)
SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv (2023-12-13T18:59:30Z)
TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes [67.5351491691866]
We present a novel framework, dubbed TeMO, to parse multi-object 3D scenes and edit their styles. Our method can synthesize high-quality stylized content and outperform the existing methods over a wide range of multi-object 3D meshes.
arXiv (2023-12-07T12:10:05Z)
ATT3D: Amortized Text-to-3D Object Synthesis [78.96673650638365]
We amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately. Our framework, Amortized text-to-3D (ATT3D), enables knowledge sharing between prompts to generalize to unseen setups, and smooth interpolation between texts for novel assets and simple animations.
arXiv (2023-06-06T17:59:10Z)
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models [21.622420436349245]
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input. We leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses. In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model.
arXiv (2023-03-21T16:21:02Z)
Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs [85.54212143154986]
Controllable scene synthesis consists of generating 3D information that satisfies underlying specifications. Scene graphs are representations of a scene composed of objects (nodes) and inter-object relationships (edges). We propose the first work that directly generates shapes from a scene graph in an end-to-end manner.
arXiv (2021-08-19T17:59:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.