Native 3D Editing with Full Attention
- URL: http://arxiv.org/abs/2511.17501v1
- Date: Fri, 21 Nov 2025 18:59:26 GMT
- Title: Native 3D Editing with Full Attention
- Authors: Weiwei Cai, Shuangkang Fang, Weicai Ye, Xin Dong, Yunhan Yang, Xuanyang Zhang, Wei Cheng, Yanpei Cao, Gang Yu, Tao Chen
- Abstract summary: We propose a novel native 3D editing framework that directly manipulates 3D representations in a single, efficient feed-forward pass. This dataset is meticulously curated to ensure that edited objects faithfully adhere to the instructional changes. Our results demonstrate that token concatenation is more parameter-efficient and achieves superior performance.
- Score: 47.908091876301796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-guided 3D editing is a rapidly emerging field with the potential to broaden access to 3D content creation. However, existing methods face critical limitations: optimization-based approaches are prohibitively slow, while feed-forward approaches relying on multi-view 2D editing often suffer from inconsistent geometry and degraded visual quality. To address these issues, we propose a novel native 3D editing framework that directly manipulates 3D representations in a single, efficient feed-forward pass. Specifically, we create a large-scale, multi-modal dataset for instruction-guided 3D editing, covering diverse addition, deletion, and modification tasks. This dataset is meticulously curated to ensure that edited objects faithfully adhere to the instructional changes while preserving the consistency of unedited regions with the source object. Building upon this dataset, we explore two distinct conditioning strategies for our model: a conventional cross-attention mechanism and a novel 3D token concatenation approach. Our results demonstrate that token concatenation is more parameter-efficient and achieves superior performance. Extensive evaluations show that our method outperforms existing 2D-lifting approaches, setting a new benchmark in generation quality, 3D consistency, and instruction fidelity.
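The abstract contrasts two conditioning strategies: injecting source-object tokens through a cross-attention layer versus concatenating them with the noisy target tokens so a single full self-attention layer sees both (which also avoids extra cross-attention parameters). The sketch below is a minimal, hypothetical PyTorch illustration of that distinction; module names, shapes, and layer layout are assumptions for clarity, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): two ways to condition a 3D
# diffusion-transformer block on source-object tokens. Names are hypothetical.
import torch
import torch.nn as nn

class CrossAttnBlock(nn.Module):
    """Conditioning via a dedicated cross-attention layer: noisy target tokens
    attend to the source-object tokens through extra parameters."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # extra params
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
        x = x + self.self_attn(self.n1(x), self.n1(x), self.n1(x))[0]
        x = x + self.cross_attn(self.n2(x), src, src)[0]  # condition on source tokens
        return x + self.mlp(self.n3(x))

class TokenConcatBlock(nn.Module):
    """Conditioning via token concatenation: source tokens are appended to the
    noisy target tokens, and one full self-attention layer processes both,
    so no separate cross-attention parameters are required."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
        n_x = x.shape[1]
        h = torch.cat([x, src], dim=1)                            # [B, N_x + N_src, D]
        h = h + self.attn(self.n1(h), self.n1(h), self.n1(h))[0]  # full attention over both
        h = h + self.mlp(self.n2(h))
        return h[:, :n_x]                                         # keep only target tokens

# Usage: x = noisy 3D latent tokens of the edited object, src = source-object tokens.
x, src = torch.randn(2, 512, 256), torch.randn(2, 512, 256)
print(CrossAttnBlock(256)(x, src).shape, TokenConcatBlock(256)(x, src).shape)
```

The parameter-efficiency claim follows from the structure: the concatenation variant reuses the existing self-attention weights for conditioning, whereas the cross-attention variant adds a separate attention module per block.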
Related papers
- Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing [106.07976338405793]
Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. We propose RL3DEdit, a single-pass framework driven by reinforcement learning with novel rewards derived from the 3D foundation model, VGGT. Experiments demonstrate that RL3DEdit achieves stable multi-view consistency and outperforms state-of-the-art methods in editing quality with high efficiency.
arXiv Detail & Related papers (2026-03-03T16:31:10Z) - Easy3E: Feed-Forward 3D Asset Editing via Rectified Voxel Flow [29.8200628539749]
We propose an effective and fully feedforward 3D editing framework based on the TRELLIS generative backbone. Our framework addresses two key issues: adapting training-free 2D editing to structured 3D representations, and overcoming the bottleneck of appearance fidelity in compressed 3D features.
arXiv Detail & Related papers (2026-02-25T02:15:14Z) - Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine [83.0145525456509]
We present FFSE, a 3D-aware framework designed to enable intuitive, physically-consistent object editing on real-world images. Unlike previous approaches that either operate in image space or require slow and error-prone 3D reconstruction, FFSE models editing as a sequence of learned 3D transformations. To support learning of multi-round 3D-aware object manipulation, we introduce 3DObjectEditor.
arXiv Detail & Related papers (2025-11-17T18:57:39Z) - NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks [29.825760228576744]
Nano3D is a training-free framework for precise and coherent 3D object editing without masks. We construct the first large-scale 3D editing dataset, Nano3D-Edit-100k, which contains over 100,000 high-quality 3D editing pairs.
arXiv Detail & Related papers (2025-10-16T17:51:50Z) - 3D-LATTE: Latent Space 3D Editing from Textual Instructions [64.77718887666312]
We propose a training-free editing method that operates within the latent space of a native 3D diffusion model. We guide the edit synthesis by blending 3D attention maps from the generation with the source object.
arXiv Detail & Related papers (2025-08-29T22:51:59Z) - DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation [57.406031264184584]
DragGaussian is a 3D object drag-editing framework based on 3D Gaussian Splatting.
Our contributions include the introduction of a new task, the development of DragGaussian for interactive point-based 3D editing, and comprehensive validation of its effectiveness through qualitative and quantitative experiments.
arXiv Detail & Related papers (2024-05-09T14:34:05Z) - DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process. This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z) - SplatMesh: Interactive 3D Segmentation and Editing Using Mesh-Based Gaussian Splatting [86.50200613220674]
A key challenge in 3D-based interactive editing is the absence of an efficient representation that balances diverse modifications with high-quality view synthesis under a given memory constraint. We introduce SplatMesh, a novel fine-grained interactive 3D segmentation and editing algorithm that integrates 3D Gaussian Splatting with a precomputed mesh. By segmenting and editing the simplified mesh, we can effectively edit the Gaussian splats as well, as demonstrated by extensive experiments on real and synthetic datasets.
arXiv Detail & Related papers (2023-12-26T02:50:42Z)