3D-LATTE: Latent Space 3D Editing from Textual Instructions
- URL: http://arxiv.org/abs/2509.00269v1
- Date: Fri, 29 Aug 2025 22:51:59 GMT
- Title: 3D-LATTE: Latent Space 3D Editing from Textual Instructions
- Authors: Maria Parelli, Michael Oechsle, Michael Niemeyer, Federico Tombari, Andreas Geiger
- Abstract summary: We propose a training-free editing method that operates within the latent space of a native 3D diffusion model. We guide the edit synthesis by blending 3D attention maps from the generation with the source object.
- Score: 64.77718887666312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the recent success of multi-view diffusion models for text/image-based 3D asset generation, instruction-based editing of 3D assets lags surprisingly far behind the quality of generation models. The main reason is that recent approaches using 2D priors suffer from view-inconsistent editing signals. Going beyond 2D prior distillation methods and multi-view editing strategies, we propose a training-free editing method that operates within the latent space of a native 3D diffusion model, allowing us to directly manipulate 3D geometry. We guide the edit synthesis by blending 3D attention maps from the generation with the source object. Coupled with geometry-aware regularization guidance, a spectral modulation strategy in the Fourier domain, and a refinement step for 3D enhancement, our method outperforms previous 3D editing methods, enabling high-fidelity, precise, and robust edits across a wide range of shapes and semantic manipulations.
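The abstract names two concrete mechanisms: blending 3D attention maps from the edit generation with those of the source object, and a spectral modulation step in the Fourier domain. The snippet below is a minimal, hypothetical PyTorch sketch of how such operations could look; the function names, blend weight, and frequency cutoff are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of attention-map blending and Fourier-domain spectral
# modulation, loosely following the abstract. Names and hyperparameters
# (alpha, cutoff, attenuation) are assumptions, not the paper's values.
import torch


def blend_attention(edit_attn: torch.Tensor,
                    source_attn: torch.Tensor,
                    alpha: float = 0.7) -> torch.Tensor:
    """Convex blend of edit and source 3D attention maps so that regions
    outside the edit keep following the source object's structure."""
    return alpha * source_attn + (1.0 - alpha) * edit_attn


def spectral_modulation(latent: torch.Tensor, cutoff: float = 8.0,
                        attenuation: float = 0.5) -> torch.Tensor:
    """Attenuate high-frequency components of a 3D latent grid (..., D, H, W)
    so coarse geometry stays stable while the edit is synthesized."""
    dims = (-3, -2, -1)
    freq = torch.fft.fftshift(torch.fft.fftn(latent, dim=dims), dim=dims)

    d, h, w = latent.shape[-3:]
    zz, yy, xx = torch.meshgrid(
        torch.arange(d, dtype=torch.float32) - d / 2,
        torch.arange(h, dtype=torch.float32) - h / 2,
        torch.arange(w, dtype=torch.float32) - w / 2,
        indexing="ij",
    )
    radius = torch.sqrt(zz ** 2 + yy ** 2 + xx ** 2).to(latent.device)

    # Keep low frequencies untouched, damp everything above the cutoff.
    mask = attenuation + (1.0 - attenuation) * (radius <= cutoff).float()
    freq = freq * mask

    freq = torch.fft.ifftshift(freq, dim=dims)
    return torch.fft.ifftn(freq, dim=dims).real
```

In a denoising loop, a blend like blend_attention would replace selected attention maps of the 3D diffusion model, while something like spectral_modulation would be applied to intermediate latents between steps; the exact layers and schedule are not specified in the abstract.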
Related papers
- Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing [106.07976338405793]
Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. We propose RL3DEdit, a single-pass framework driven by reinforcement learning with novel rewards derived from the 3D foundation model VGGT. Experiments demonstrate that RL3DEdit achieves stable multi-view consistency and outperforms state-of-the-art methods in editing quality with high efficiency.
arXiv Detail & Related papers (2026-03-03T16:31:10Z) - Easy3E: Feed-Forward 3D Asset Editing via Rectified Voxel Flow [29.8200628539749]
We propose an effective and fully feedforward 3D editing framework based on the TRELLIS generative backbone. Our framework addresses two key issues: adapting training-free 2D editing to structured 3D representations, and overcoming the bottleneck of appearance fidelity in compressed 3D features.
arXiv Detail & Related papers (2026-02-25T02:15:14Z) - ShapeUP: Scalable Image-Conditioned 3D Editing [44.63222737714384]
ShapeUP is a scalable, image-conditioned 3D editing framework. It formulates editing as a supervised latent-to-latent translation within a native 3D representation. Our evaluations demonstrate that ShapeUP consistently outperforms current trained and training-free baselines in both identity preservation and edit fidelity.
arXiv Detail & Related papers (2026-02-05T13:59:16Z) - 3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing [58.54083747494426]
3DGS-Drag is a point-based 3D editing framework that provides efficient, intuitive drag manipulation of real 3D scenes. Our approach bridges the gap between deformation-based and 2D-editing-based 3D editing methods.
arXiv Detail & Related papers (2026-01-12T19:57:31Z) - 3D-Fixup: Advancing Photo Editing with 3D Priors [32.83193513442457]
3D-Fixup is a new framework for editing 2D images guided by learned 3D priors. We leverage a training-based approach that harnesses the generative power of diffusion models. We show that 3D-Fixup effectively supports complex, identity-coherent 3D-aware edits.
arXiv Detail & Related papers (2025-05-15T17:59:51Z) - Text-to-3D Generation by 2D Editing [17.17448279533487]
Distilling 3D representations from pretrained 2D diffusion models is essential for 3D creative applications across gaming, film, and interior design. Current SDS-based methods are hindered by inefficient information distillation from diffusion models, which prevents the creation of photorealistic 3D content. We propose 3D Generation by Editing (GE3D), which exploits pretrained diffusion models to distill multi-granularity information through multiple denoising steps.
arXiv Detail & Related papers (2024-12-08T12:53:05Z) - Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning [52.81032340916171]
Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes.
Our method achieves superior controllability and flexibility in the 3D assets generation task.
arXiv Detail & Related papers (2024-05-13T17:56:13Z) - DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process. This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z) - Reference-Based 3D-Aware Image Editing with Triplanes [15.222454412573455]
This study explores and demonstrates the effectiveness of the triplane space for advanced reference-based edits. Our approach integrates encoding, automatic localization, spatial disentanglement of triplane features, and fusion learning to achieve the desired edits. We demonstrate how our approach excels across diverse domains, including human faces, 360-degree heads, animal faces, partially stylized edits like cartoon faces, full-body clothing edits, and edits on class-agnostic samples.
arXiv Detail & Related papers (2024-04-04T17:53:33Z) - Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z) - Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization [21.8454418337306]
We propose Plasticine3D, a novel text-guided controlled 3D editing pipeline that can perform 3D non-rigid editing.
Our work divides the editing process into a geometry editing stage and a texture editing stage to achieve separate control of structure and appearance.
For the purpose of fine-grained control, we propose Embedding-Fusion (EF) to blend the original characteristics with the editing objectives in the embedding space.
arXiv Detail & Related papers (2023-12-15T09:01:54Z) - Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z) - XDGAN: Multi-Modal 3D Shape Generation in 2D Space [60.46777591995821]
We propose a novel method to convert 3D shapes into compact 1-channel geometry images and leverage StyleGAN3 and image-to-image translation networks to generate 3D objects in 2D space.
The generated geometry images are quick to convert to 3D meshes, enabling real-time 3D object synthesis, visualization and interactive editing.
We show both quantitatively and qualitatively that our method is highly effective at various tasks such as 3D shape generation, single view reconstruction and shape manipulation, while being significantly faster and more flexible compared to recent 3D generative models.
arXiv Detail & Related papers (2022-10-06T15:54:01Z)
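The XDGAN entry above describes converting 3D shapes into compact 1-channel geometry images so that 2D generative networks such as StyleGAN3 can operate on them. Below is a minimal, hypothetical NumPy sketch of one way such a geometry image could be built, via spherical projection of surface samples; the projection scheme and resolution are illustrative assumptions rather than the paper's exact recipe.

```python
# Hypothetical sketch: flattening a 3D surface into a 1-channel "geometry image"
# by spherical projection, loosely following the idea described for XDGAN.
# The projection scheme and resolution are illustrative assumptions.
import numpy as np


def to_geometry_image(points: np.ndarray, resolution: int = 64) -> np.ndarray:
    """Project a point cloud (N, 3) onto a theta/phi grid, storing the
    radial distance from the centroid as the single channel."""
    centered = points - points.mean(axis=0)
    r = np.linalg.norm(centered, axis=1)                                  # radius per point
    theta = np.arccos(np.clip(centered[:, 2] / (r + 1e-8), -1.0, 1.0))    # polar angle in [0, pi]
    phi = np.arctan2(centered[:, 1], centered[:, 0]) + np.pi              # azimuth in [0, 2*pi]

    # Rasterize: each point writes its radius into the nearest grid cell.
    u = np.minimum((theta / np.pi * resolution).astype(int), resolution - 1)
    v = np.minimum((phi / (2 * np.pi) * resolution).astype(int), resolution - 1)
    image = np.zeros((resolution, resolution), dtype=np.float32)
    np.maximum.at(image, (u, v), r.astype(np.float32))                    # keep the outermost surface
    return image


# Usage: a random blob stands in for mesh surface samples.
cloud = np.random.randn(10_000, 3)
geom_img = to_geometry_image(cloud)
print(geom_img.shape)  # (64, 64), one channel of radial geometry
```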