SSEditor: Controllable Mask-to-Scene Generation with Diffusion Model
- URL: http://arxiv.org/abs/2411.12290v1
- Date: Tue, 19 Nov 2024 07:19:05 GMT
- Title: SSEditor: Controllable Mask-to-Scene Generation with Diffusion Model
- Authors: Haowen Zheng, Yanyan Liang
- Abstract summary: SSEditor is a controllable Semantic Scene Editor that can generate specified target categories without multiple-step resampling.
In the second stage, we introduce a geometric-semantic fusion module that enhances the model's ability to learn geometric and semantic information.
Experiments on the unseen Occ-3D Waymo dataset show that SSEditor is capable of generating novel urban scenes.
- Score: 8.664947484778486
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in 3D diffusion-based semantic scene generation have gained attention. However, existing methods rely on unconditional generation and require multiple resampling steps when editing scenes, which significantly limits their controllability and flexibility. To this end, we propose SSEditor, a controllable Semantic Scene Editor that can generate specified target categories without multiple-step resampling. SSEditor employs a two-stage diffusion-based framework: (1) a 3D scene autoencoder is trained to obtain latent triplane features, and (2) a mask-conditional diffusion model is trained for customizable 3D semantic scene generation. In the second stage, we introduce a geometric-semantic fusion module that enhances the model's ability to learn geometric and semantic information. This ensures that objects are generated with correct positions, sizes, and categories. Extensive experiments on SemanticKITTI and CarlaSC demonstrate that SSEditor outperforms previous approaches in terms of controllability and flexibility in target generation, as well as the quality of semantic scene generation and reconstruction. More importantly, experiments on the unseen Occ-3D Waymo dataset show that SSEditor is capable of generating novel urban scenes, enabling the rapid construction of 3D scenes.
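To make the two-stage description above concrete, the following is a minimal sketch, in PyTorch, of the kind of components the abstract names: a scene autoencoder that compresses a voxelized semantic scene into triplane latents, and a geometric-semantic fusion step that conditions a latent plane on a user-provided occupancy mask and target-category mask. All class names, layer choices, mask encodings, and tensor shapes are illustrative assumptions rather than the authors' implementation, and the diffusion denoiser and sampling loop are omitted entirely.

```python
# Hedged structural sketch of a mask-conditional triplane pipeline.
# Everything below is an assumption for illustration, NOT the SSEditor code.
import torch
import torch.nn as nn


class TriplaneAutoencoder(nn.Module):
    """Stage 1 (assumed): encode a voxelized semantic scene into three
    axis-aligned feature planes and decode them back to a semantic volume."""

    def __init__(self, num_classes: int = 20, latent_ch: int = 16):
        super().__init__()
        self.encoder = nn.Conv3d(num_classes, latent_ch, kernel_size=3, padding=1)
        self.decoder = nn.Conv3d(latent_ch, num_classes, kernel_size=3, padding=1)

    def encode(self, scene_onehot: torch.Tensor):
        # scene_onehot: (B, num_classes, X, Y, Z)
        feat = self.encoder(scene_onehot)
        # Average-project the feature volume onto the XY, XZ, and YZ planes.
        return feat.mean(dim=4), feat.mean(dim=3), feat.mean(dim=2)

    def decode(self, planes):
        xy, xz, yz = planes  # (B, C, X, Y), (B, C, X, Z), (B, C, Y, Z)
        # Broadcast the three planes back into a volume (lossy, for illustration).
        vol = (xy.unsqueeze(-1) + xz.unsqueeze(-2) + yz.unsqueeze(-3)) / 3.0
        return self.decoder(vol)  # (B, num_classes, X, Y, Z)


class GeometricSemanticFusion(nn.Module):
    """Stage 2 conditioning (assumed): mix a latent plane with an occupancy mask
    (geometry: position and size) and a class mask (semantics: target category)
    before handing the result to a diffusion denoiser."""

    def __init__(self, num_classes: int = 20, latent_ch: int = 16):
        super().__init__()
        self.geo_proj = nn.Conv2d(1, latent_ch, kernel_size=1)
        self.sem_proj = nn.Conv2d(num_classes, latent_ch, kernel_size=1)
        self.mix = nn.Conv2d(3 * latent_ch, latent_ch, kernel_size=1)

    def forward(self, latent_plane, occupancy_mask, class_mask):
        g = self.geo_proj(occupancy_mask)   # (B, latent_ch, H, W)
        s = self.sem_proj(class_mask)       # (B, latent_ch, H, W)
        return self.mix(torch.cat([latent_plane, g, s], dim=1))


if __name__ == "__main__":
    B, C, N = 1, 20, 32
    scene = torch.randn(B, C, N, N, N).softmax(dim=1)     # stand-in semantic scene
    ae = TriplaneAutoencoder(num_classes=C)
    planes = ae.encode(scene)
    recon = ae.decode(planes)                              # (1, 20, 32, 32, 32)

    fusion = GeometricSemanticFusion(num_classes=C)
    occ = torch.rand(B, 1, N, N)                           # hypothetical user-drawn occupancy mask
    cls = torch.zeros(B, C, N, N)
    cls[:, 3] = 1.0                                        # hypothetical target category index 3
    conditioned_xy = fusion(planes[0], occ, cls)           # (1, 16, 32, 32)
    print(recon.shape, conditioned_xy.shape)
```

One presumable benefit of the triplane latent in stage one is that it keeps the stage-two diffusion model 2D-convolution-friendly while still describing a 3D volume, which is consistent with the abstract's emphasis on efficient, single-pass mask-conditional generation.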
Related papers
- Layout2Scene: 3D Semantic Layout Guided Scene Generation via Geometry and Appearance Diffusion Priors [52.63385546943866]
We present a text-to-scene generation method (namely, Layout2Scene) using additional semantic layout as the prompt to inject precise control of 3D object positions.
To fully leverage 2D diffusion priors in geometry and appearance generation, we introduce a semantic-guided geometry diffusion model and a semantic-geometry-guided appearance diffusion model.
Our method can generate more plausible and realistic scenes as compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-05T12:20:13Z) - SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation [44.354071773885735]
SceneFactor is a diffusion-based approach for large-scale 3D scene generation.
It enables controllable generation and effortless editing.
Our approach enables high-fidelity 3D scene synthesis with effective controllable editing.
arXiv Detail & Related papers (2024-12-02T18:47:41Z) - 3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting [31.98493679748211]
We propose 3DSceneEditor, a fully 3D-based paradigm for real-time, precise editing of 3D scenes using Gaussian Splatting.
Unlike conventional methods, 3DSceneEditor operates through a streamlined 3D pipeline, enabling direct manipulation of Gaussians for efficient, high-quality edits.
arXiv Detail & Related papers (2024-12-02T15:03:55Z) - EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing [114.14164860467227]
We propose EditRoom, a framework capable of executing a variety of layout edits through natural language commands.
Specifically, EditRoom leverages Large Language Models (LLMs) for command planning and generates target scenes.
We have developed an automatic pipeline to augment existing 3D scene datasets and introduced EditRoom-DB, a large-scale dataset with 83k editing pairs.
arXiv Detail & Related papers (2024-10-03T17:42:24Z) - 3DEgo: 3D Editing on the Go! [6.072473323242202]
We introduce 3DEgo to address a novel problem of directly synthesizing 3D scenes from monocular videos guided by textual prompts.
Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow.
3DEgo demonstrates remarkable editing precision, speed, and adaptability across a variety of video sources.
arXiv Detail & Related papers (2024-07-14T07:03:50Z) - Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts [76.73043724587679]
We propose a dialogue-based 3D scene editing approach, termed CE3D.
CE3D uses Hash-Atlas to represent 3D scene views, which transfers the editing of 3D scenes onto 2D atlas images.
Results demonstrate that CE3D effectively integrates multiple visual models to achieve diverse editing visual effects.
arXiv Detail & Related papers (2024-07-09T13:24:42Z) - 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting [100.94916668527544]
Existing methods solely focus on either 2D individual object or 3D global scene editing.
We propose 3DitScene, a novel and unified scene editing framework.
It enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects.
arXiv Detail & Related papers (2024-05-28T17:59:01Z) - DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling [23.06464506261766]
We present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions.
Our approach involves a 3D Gaussian Guide for scene representation, consisting of semantic primitives (objects) and their spatial transformations.
A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene.
arXiv Detail & Related papers (2024-04-14T12:13:07Z) - GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z) - CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout [13.364394556439992]
Text-to-3D generation plays a crucial role in creating editable 3D scenes for AR/VR.
Recent advances have shown promise in merging neural radiance fields (NeRFs) with pre-trained diffusion models for text-to-3D object generation.
We propose a novel framework, dubbed CompoNeRF, by integrating an editable 3D scene layout with object-specific and scene-wide guidance mechanisms.
Our framework achieves up to a 54% improvement as measured by the multi-view CLIP score metric.
arXiv Detail & Related papers (2023-03-24T07:37:09Z) - DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with the global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z) - GAUDI: A Neural Architect for Immersive 3D Scene Generation [67.97817314857917]
GAUDI is a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets.
arXiv Detail & Related papers (2022-07-27T19:10:32Z)