Towards Scalable and Consistent 3D Editing
- URL: http://arxiv.org/abs/2510.02994v1
- Date: Fri, 03 Oct 2025 13:34:55 GMT
- Title: Towards Scalable and Consistent 3D Editing
- Authors: Ruihao Xia, Yang Tang, Pan Zhou
- Abstract summary: 3D editing has wide applications in immersive content creation, digital entertainment, and AR/VR. Unlike 2D editing, it remains challenging due to the need for cross-view consistency, structural fidelity, and fine-grained controllability. We introduce 3DEditVerse, the largest paired 3D editing benchmark to date, comprising 116,309 high-quality training pairs and 1,500 curated test pairs. On the model side, we propose 3DEditFormer, a 3D-structure-preserving conditional transformer.
- Score: 32.16698854719098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D editing - the task of locally modifying the geometry or appearance of a 3D asset - has wide applications in immersive content creation, digital entertainment, and AR/VR. However, unlike 2D editing, it remains challenging due to the need for cross-view consistency, structural fidelity, and fine-grained controllability. Existing approaches are often slow, prone to geometric distortions, or dependent on manual and accurate 3D masks that are error-prone and impractical. To address these challenges, we advance both the data and model fronts. On the data side, we introduce 3DEditVerse, the largest paired 3D editing benchmark to date, comprising 116,309 high-quality training pairs and 1,500 curated test pairs. Built through complementary pipelines of pose-driven geometric edits and foundation model-guided appearance edits, 3DEditVerse ensures edit locality, multi-view consistency, and semantic alignment. On the model side, we propose 3DEditFormer, a 3D-structure-preserving conditional transformer. By enhancing image-to-3D generation with dual-guidance attention and time-adaptive gating, 3DEditFormer disentangles editable regions from preserved structure, enabling precise and consistent edits without requiring auxiliary 3D masks. Extensive experiments demonstrate that our framework outperforms state-of-the-art baselines both quantitatively and qualitatively, establishing a new standard for practical and scalable 3D editing. Dataset and code will be released. Project: https://www.lv-lab.org/3DEditFormer/
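The abstract describes 3DEditFormer as combining dual-guidance attention with time-adaptive gating to separate editable regions from preserved structure, but gives no formulas. As a rough illustration only, the sketch below shows one plausible way such a gate could blend two guidance streams over diffusion timesteps: early (high-noise) steps weighted toward the edit signal, late steps toward structure preservation. The sigmoid schedule and all function names here are assumptions for illustration, not the paper's actual design.

```python
import math

def time_adaptive_gate(t, t_max, sharpness=6.0):
    """Toy gating schedule (an assumption, not the paper's form):
    a sigmoid over the normalized timestep, so early high-noise steps
    favor edit guidance and late steps favor structure preservation."""
    x = t / t_max  # 1.0 = start of sampling (pure noise), 0.0 = end
    return 1.0 / (1.0 + math.exp(-sharpness * (x - 0.5)))

def dual_guidance_blend(edit_feat, structure_feat, t, t_max):
    """Blend per-token features from two guidance streams with the
    time-dependent gate g: g * edit + (1 - g) * structure."""
    g = time_adaptive_gate(t, t_max)
    return [g * e + (1.0 - g) * s for e, s in zip(edit_feat, structure_feat)]

# At the first sampling step the gate is near 1 (edit-dominated);
# at the last step it is near 0 (structure-dominated).
print(round(time_adaptive_gate(1000, 1000), 3))  # -> 0.953
print(round(time_adaptive_gate(0, 1000), 3))     # -> 0.047
```

The key property this toy schedule captures is that the edit condition dominates while coarse geometry is still forming, and the source-structure condition dominates once details are being refined; the paper's actual gating may be learned rather than fixed.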
Related papers
- Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing [106.07976338405793]
Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. We propose RL3DEdit, a single-pass framework driven by reinforcement learning with novel rewards derived from the 3D foundation model VGGT. Experiments demonstrate that RL3DEdit achieves stable multi-view consistency and outperforms state-of-the-art methods in editing quality with high efficiency.
arXiv Detail & Related papers (2026-03-03T16:31:10Z) - Easy3E: Feed-Forward 3D Asset Editing via Rectified Voxel Flow [29.8200628539749]
We propose an effective and fully feed-forward 3D editing framework based on the TRELLIS generative backbone. Our framework addresses two key issues: adapting training-free 2D editing to structured 3D representations, and overcoming the bottleneck of appearance fidelity in compressed 3D features.
arXiv Detail & Related papers (2026-02-25T02:15:14Z) - ShapeUP: Scalable Image-Conditioned 3D Editing [44.63222737714384]
ShapeUP is a scalable, image-conditioned 3D editing framework. It formulates editing as a supervised latent-to-latent translation within a native 3D representation. Our evaluations demonstrate that ShapeUP consistently outperforms current trained and training-free baselines in both identity preservation and edit fidelity.
arXiv Detail & Related papers (2026-02-05T13:59:16Z) - 3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing [58.54083747494426]
3DGS-Drag is a point-based 3D editing framework that provides efficient, intuitive drag manipulation of real 3D scenes. Our approach bridges the gap between deformation-based and 2D-editing-based 3D editing methods.
arXiv Detail & Related papers (2026-01-12T19:57:31Z) - NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks [29.825760228576744]
Nano3D is a training-free framework for precise and coherent 3D object editing without masks. We construct the first large-scale 3D editing dataset, Nano3D-Edit-100k, which contains over 100,000 high-quality 3D editing pairs.
arXiv Detail & Related papers (2025-10-16T17:51:50Z) - C3Editor: Achieving Controllable Consistency in 2D Model for 3D Editing [37.439731931558036]
C3Editor is a controllable and consistent 2D-lifting-based 3D editing framework. Our method selectively establishes a view-consistent 2D editing model to achieve superior 3D editing results. Our approach delivers more consistent and controllable 2D and 3D editing results than existing 2D-lifting-based methods.
arXiv Detail & Related papers (2025-10-06T07:07:14Z) - 3D-LATTE: Latent Space 3D Editing from Textual Instructions [64.77718887666312]
We propose a training-free editing method that operates within the latent space of a native 3D diffusion model. We guide the edit synthesis by blending 3D attention maps from the generation with the source object.
arXiv Detail & Related papers (2025-08-29T22:51:59Z) - PrEditor3D: Fast and Precise 3D Shape Editing [100.09112677669376]
We propose a training-free approach to 3D editing that enables the editing of a single shape within a few minutes. The edited 3D mesh aligns well with the prompts and remains unchanged in regions that are not intended to be altered.
arXiv Detail & Related papers (2024-12-09T15:44:47Z) - Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images [72.70883914827687]
Tailor3D is a novel pipeline that creates customized 3D assets from editable dual-side images.
It provides a user-friendly, efficient solution for editing 3D assets, with each editing step taking only seconds to complete.
arXiv Detail & Related papers (2024-07-08T17:59:55Z) - Reference-Based 3D-Aware Image Editing with Triplanes [15.222454412573455]
This study explores and demonstrates the effectiveness of the triplane space for advanced reference-based edits. Our approach integrates encoding, automatic localization, spatial disentanglement of triplane features, and fusion learning to achieve the desired edits. We demonstrate how our approach excels across diverse domains, including human faces, 360-degree heads, animal faces, partially stylized edits like cartoon faces, full-body clothing edits, and edits on class-agnostic samples.
arXiv Detail & Related papers (2024-04-04T17:53:33Z) - View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes. By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z) - Image Sculpting: Precise Object Editing with 3D Geometry Control [33.9777412846583]
Image Sculpting is a new framework for editing 2D images by incorporating tools from 3D geometry and graphics.
It supports precise, quantifiable, and physically-plausible editing options such as pose editing, rotation, translation, 3D composition, carving, and serial addition.
arXiv Detail & Related papers (2024-01-02T18:59:35Z) - Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization [21.8454418337306]
We propose Plasticine3D, a novel text-guided controlled 3D editing pipeline that can perform 3D non-rigid editing.
Our work divides the editing process into a geometry editing stage and a texture editing stage to achieve separate control of structure and appearance.
For the purpose of fine-grained control, we propose Embedding-Fusion (EF) to blend the original characteristics with the editing objectives in the embedding space.
arXiv Detail & Related papers (2023-12-15T09:01:54Z) - SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds [73.91114735118298]
Shap-Editor is a novel feed-forward 3D editing framework.
We demonstrate that direct 3D editing in the latent space is possible and efficient by building a feed-forward editor network.
arXiv Detail & Related papers (2023-12-14T18:59:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.