Vinedresser3D: Agentic Text-guided 3D Editing
- URL: http://arxiv.org/abs/2602.19542v1
- Date: Mon, 23 Feb 2026 06:30:36 GMT
- Title: Vinedresser3D: Agentic Text-guided 3D Editing
- Authors: Yankuan Chi, Xiang Li, Zixuan Huang, James M. Rehg,
- Abstract summary: Vinedresser3D is an agentic framework for high-quality text-guided 3D editing.<n>It operates directly in the latent space of a native 3D generative model.<n>Experiments on diverse 3D edits demonstrate that Vinedresser3D outperforms prior baselines in both automatic metrics and human preference studies.
- Score: 26.81659566314386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-guided 3D editing aims to modify existing 3D assets using natural-language instructions. Current methods struggle to jointly understand complex prompts, automatically localize edits in 3D, and preserve unedited content. We introduce Vinedresser3D, an agentic framework for high-quality text-guided 3D editing that operates directly in the latent space of a native 3D generative model. Given a 3D asset and an editing prompt, Vinedresser3D uses a multimodal large language model to infer rich descriptions of the original asset, identify the edit region and edit type (addition, modification, deletion), and generate decomposed structural and appearance-level text guidance. The agent then selects an informative view and applies an image editing model to obtain visual guidance. Finally, an inversion-based rectified-flow inpainting pipeline with an interleaved sampling module performs editing in the 3D latent space, enforcing prompt alignment while maintaining 3D coherence and unedited regions. Experiments on diverse 3D edits demonstrate that Vinedresser3D outperforms prior baselines in both automatic metrics and human preference studies, while enabling precise, coherent, and mask-free 3D editing.
Related papers
- Towards Scalable and Consistent 3D Editing [32.16698854719098]
3D editing has wide applications in immersive content creation, digital entertainment, and AR/VR.<n>Unlike 2D editing, it remains challenging due to the need for cross-view consistency, structural fidelity, and fine-grained controllability.<n>We introduce 3DEditVerse, the largest paired 3D editing benchmark to date, comprising 116,309 high-quality training pairs and 1,500 curated test pairs.<n>On the model side, we propose 3DEditFormer, a 3D-structure-preserving conditional transformer.
arXiv Detail & Related papers (2025-10-03T13:34:55Z) - 3D-LATTE: Latent Space 3D Editing from Textual Instructions [64.77718887666312]
We propose a training-free editing method that operates within the latent space of a native 3D diffusion model.<n>We guide the edit synthesis by blending 3D attention maps from the generation with the source object.
arXiv Detail & Related papers (2025-08-29T22:51:59Z) - Instructive3D: Editing Large Reconstruction Models with Text Instructions [2.9575146209034853]
Instructive3D is a novel LRM based model that integrates generation and fine-grained editing, through user text prompts, of 3D objects into a single model.<n>We find that Instructive3D produces superior 3D objects with the properties specified by the edit prompts.
arXiv Detail & Related papers (2025-01-08T09:28:25Z) - DragScene: Interactive 3D Scene Editing with Single-view Drag Instructions [9.31257776760014]
3D editing has shown remarkable capability in editing scenes based on various instructions.<n>Existing methods struggle with achieving intuitive, localized editing.<n>We introduce DragScene, a framework that integrates drag-style editing with diverse 3D representations.
arXiv Detail & Related papers (2024-12-18T07:02:01Z) - PrEditor3D: Fast and Precise 3D Shape Editing [100.09112677669376]
We propose a training-free approach to 3D editing that enables the editing of a single shape within a few minutes.<n>The edited 3D mesh aligns well with the prompts, and remains identical for regions that are not intended to be altered.
arXiv Detail & Related papers (2024-12-09T15:44:47Z) - EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing [114.14164860467227]
We propose EditRoom, a framework capable of executing a variety of layout edits through natural language commands.<n>Specifically, EditRoom leverages Large Language Models (LLMs) for command planning and generates target scenes.<n>We have developed an automatic pipeline to augment existing 3D scene datasets and introduced EditRoom-DB, a large-scale dataset with 83k editing pairs.
arXiv Detail & Related papers (2024-10-03T17:42:24Z) - Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts [76.73043724587679]
We propose a dialogue-based 3D scene editing approach, termed CE3D.
Hash-Atlas represents 3D scene views, which transfers the editing of 3D scenes onto 2D atlas images.
Results demonstrate that CE3D effectively integrates multiple visual models to achieve diverse editing visual effects.
arXiv Detail & Related papers (2024-07-09T13:24:42Z) - TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts [119.84478647745658]
TIPEditor is a 3D scene editing framework that accepts both text and image prompts and a 3D bounding box to specify the editing region.
Experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region.
arXiv Detail & Related papers (2024-01-26T12:57:05Z) - SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds [73.91114735118298]
Shap-Editor is a novel feed-forward 3D editing framework.
We demonstrate that direct 3D editing in this space is possible and efficient by building a feed-forward editor network.
arXiv Detail & Related papers (2023-12-14T18:59:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.