Related papers: EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

URL: http://arxiv.org/abs/2410.12836v1
Date: Thu, 03 Oct 2024 17:42:24 GMT
Title: EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
Authors: Kaizhi Zheng, Xiaotong Chen, Xuehai He, Jing Gu, Linjie Li, Zhengyuan Yang, Kevin Lin, Jianfeng Wang, Lijuan Wang, Xin Eric Wang
Abstract summary: We propose EditRoom, a framework capable of executing a variety of layout edits through natural language commands. Specifically, EditRoom leverages Large Language Models (LLMs) for command planning and generates target scenes. We have developed an automatic pipeline to augment existing 3D scene datasets and introduced EditRoom-DB, a large-scale dataset with 83k editing pairs.
Score: 114.14164860467227
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Given the steep learning curve of professional 3D software and the time-consuming process of managing large 3D assets, language-guided 3D scene editing has significant potential in fields such as virtual reality, augmented reality, and gaming. However, recent approaches to language-guided 3D scene editing either require manual interventions or focus only on appearance modifications without supporting comprehensive scene layout changes. In response, we propose EditRoom, a unified framework capable of executing a variety of layout edits through natural language commands, without requiring manual intervention. Specifically, EditRoom leverages Large Language Models (LLMs) for command planning and generates target scenes using a diffusion-based method, enabling six types of edits: rotate, translate, scale, replace, add, and remove. To address the lack of data for language-guided 3D scene editing, we have developed an automatic pipeline to augment existing 3D scene synthesis datasets and introduced EditRoom-DB, a large-scale dataset with 83k editing pairs, for training and evaluation. Our experiments demonstrate that our approach consistently outperforms other baselines across all metrics, indicating higher accuracy and coherence in language-guided scene layout editing.

Related papers

Mastering Regional 3DGS: Locating, Initializing, and Editing with Diverse 2D Priors [67.22744959435708]
3D semantic parsing often underperforms compared to its 2D counterpart, making targeted manipulations within 3D spaces more difficult and limiting the fidelity of edits. We address this problem by leveraging 2D diffusion editing to accurately identify modification regions in each view, followed by inverse rendering for 3D localization. Experiments demonstrate that our method achieves state-of-the-art performance while delivering up to a $4\times$ speedup.
arXiv Detail & Related papers (2025-07-07T19:15:43Z)
ScanEdit: Hierarchically-Guided Functional 3D Scan Editing [48.890592121135015]
ScanEdit is an instruction-driven method for functional editing of complex, real-world 3D scans. Given a 3D scan decomposed into its object instances, we first construct a hierarchical scene graph representation.
arXiv Detail & Related papers (2025-04-21T12:12:43Z)
DragScene: Interactive 3D Scene Editing with Single-view Drag Instructions [9.31257776760014]
3D editing has shown remarkable capability in editing scenes based on various instructions. Existing methods struggle with achieving intuitive, localized editing. We introduce DragScene, a framework that integrates drag-style editing with diverse 3D representations.
arXiv Detail & Related papers (2024-12-18T07:02:01Z)
PrEditor3D: Fast and Precise 3D Shape Editing [100.09112677669376]
We propose a training-free approach to 3D editing that enables the editing of a single shape within a few minutes. The edited 3D mesh aligns well with the prompts, and remains identical for regions that are not intended to be altered.
arXiv Detail & Related papers (2024-12-09T15:44:47Z)
3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting [31.98493679748211]
We propose 3DSceneEditor, a fully 3D-based paradigm for real-time, precise editing of 3D scenes using Gaussian Splatting. Unlike conventional methods, 3DSceneEditor operates through a streamlined 3D pipeline, enabling direct manipulation of Gaussians for efficient, high-quality edits.
arXiv Detail & Related papers (2024-12-02T15:03:55Z)
3DEgo: 3D Editing on the Go! [6.072473323242202]
We introduce 3DEgo to address a novel problem of directly synthesizing 3D scenes from monocular videos guided by textual prompts. Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow. 3DEgo demonstrates remarkable editing precision, speed, and adaptability across a variety of video sources.
arXiv Detail & Related papers (2024-07-14T07:03:50Z)
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts [76.73043724587679]
We propose a dialogue-based 3D scene editing approach, termed CE3D. Hash-Atlas represents 3D scene views, which transfers the editing of 3D scenes onto 2D atlas images. Results demonstrate that CE3D effectively integrates multiple visual models to achieve diverse editing visual effects.
arXiv Detail & Related papers (2024-07-09T13:24:42Z)
LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model [58.24851949945434]
LLplace is a novel 3D indoor scene layout designer based on Llama3, a lightweight, fine-tuned open-source LLM. LLplace circumvents the need for spatial relationship priors and in-context exemplars, enabling efficient and credible room layout generation. Our approach demonstrates that LLplace can effectively generate and edit 3D indoor layouts interactively and outperform existing methods in delivering high-quality 3D design solutions.
arXiv Detail & Related papers (2024-06-06T08:53:01Z)
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting [100.94916668527544]
Existing methods solely focus on either 2D individual object or 3D global scene editing. We propose 3DitScene, a novel and unified scene editing framework. It enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects.
arXiv Detail & Related papers (2024-05-28T17:59:01Z)
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds [73.91114735118298]
Shap-Editor is a novel feed-forward 3D editing framework. We demonstrate that direct 3D editing in this space is possible and efficient by building a feed-forward editor network.
arXiv Detail & Related papers (2023-12-14T18:59:06Z)
Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints [35.073500525250346]
We present Ctrl-Room, which can generate convincing 3D rooms with designer-style layouts and high-fidelity textures from just a text prompt. Ctrl-Room enables versatile interactive editing operations such as resizing or moving individual furniture items.
arXiv Detail & Related papers (2023-10-05T15:29:52Z)
Editing 3D Scenes via Text Prompts without Retraining [80.57814031701744]
DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities. Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images. Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
arXiv Detail & Related papers (2023-09-10T02:31:50Z)
OBJECT 3DIT: Language-guided 3D-aware Image Editing [27.696507467754877]
Existing image editing tools disregard the underlying 3D geometry from which the image is projected. We formulate the new task of language-guided 3D-aware editing, where objects in an image should be edited according to a language instruction in the context of the underlying 3D scene. We release OBJECT: a dataset consisting of 400K editing examples created from procedurally generated 3D scenes. Our models show impressive abilities to understand the 3D composition of entire scenes, factoring in surrounding objects, surfaces, lighting conditions, shadows, and physically-plausible object configurations.
arXiv Detail & Related papers (2023-07-20T17:53:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.