CoreEditor: Consistent 3D Editing via Correspondence-constrained Diffusion
- URL: http://arxiv.org/abs/2508.11603v1
- Date: Fri, 15 Aug 2025 17:13:11 GMT
- Title: CoreEditor: Consistent 3D Editing via Correspondence-constrained Diffusion
- Authors: Zhe Zhu, Honghua Chen, Peng Li, Mingqiang Wei
- Abstract summary: CoreEditor is a novel framework for consistent text-to-3D editing. We introduce a correspondence-constrained attention mechanism that enforces precise interactions between pixels expected to remain consistent. In experiments, CoreEditor produces high-quality, 3D-consistent edits with sharper details.
- Score: 24.144486805878596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven 3D editing seeks to modify 3D scenes according to textual descriptions, and most existing approaches tackle this by adapting pre-trained 2D image editors to multi-view inputs. However, without explicit control over multi-view information exchange, they often fail to maintain cross-view consistency, leading to insufficient edits and blurry details. We introduce CoreEditor, a novel framework for consistent text-to-3D editing. The key innovation is a correspondence-constrained attention mechanism that enforces precise interactions between pixels expected to remain consistent throughout the diffusion denoising process. Beyond relying solely on geometric alignment, we further incorporate semantic similarity estimated during denoising, enabling more reliable correspondence modeling and robust multi-view editing. In addition, we design a selective editing pipeline that allows users to choose preferred results from multiple candidates, offering greater flexibility and user control. Extensive experiments show that CoreEditor produces high-quality, 3D-consistent edits with sharper details, significantly outperforming prior methods.
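The core mechanism lends itself to a short illustration. Below is a minimal sketch, assuming a PyTorch setting in which multi-view pixel tokens are flattened into a single sequence and a geometric correspondence mask is precomputed; the function name, the mask, and the threshold `tau` are placeholders introduced for illustration, not names from the paper.

```python
# Minimal sketch of correspondence-constrained attention (not the authors'
# code). Assumptions: tokens from all views are flattened into one sequence,
# geo_mask[i, j] is True when tokens i and j are geometrically matched
# (e.g. they project to the same 3D point), and tau is a cosine-similarity
# threshold acting as the semantic check described in the abstract.
import torch
import torch.nn.functional as F

def corr_constrained_attention(q, k, v, geo_mask, tau=0.5):
    """q, k, v: (N, D) tokens; geo_mask: (N, N) bool; returns (N, D)."""
    # Semantic agreement between tokens, estimated from the current features.
    sim = F.cosine_similarity(k.unsqueeze(1), k.unsqueeze(0), dim=-1)  # (N, N)
    # A pair may interact only if it is both geometrically matched and
    # semantically similar; the diagonal is always allowed so every token
    # can at least attend to itself.
    allow = (geo_mask & (sim > tau)) | torch.eye(q.shape[0], dtype=torch.bool)

    scores = (q @ k.T) / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~allow, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

Restricting attention this way means a pixel exchanges information only with pixels that both map to the same scene point and currently look semantically alike, which is the consistency constraint the abstract describes.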
Related papers
- Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing [106.07976338405793]
Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. We propose RL3DEdit, a single-pass framework driven by reinforcement learning with novel rewards derived from the 3D foundation model, VGGT. Experiments demonstrate that RL3DEdit achieves stable multi-view consistency and outperforms state-of-the-art methods in editing quality with high efficiency.
arXiv Detail & Related papers (2026-03-03T16:31:10Z) - Edit3r: Instant 3D Scene Editing from Sparse Unposed Images [40.421700685587346]
We present Edit3r, a framework that reconstructs and edits 3D scenes in a single pass from unposed, view-inconsistent, instruction-edited images. We show that Edit3r achieves superior semantic alignment and enhanced 3D consistency compared to recent baselines.
arXiv Detail & Related papers (2025-12-31T18:59:53Z) - 3D-Consistent Multi-View Editing by Diffusion Guidance [17.847266433739147]
Methods that edit images independently often produce geometrically and photometrically inconsistent results across different views. We propose a training-free diffusion framework that enforces multi-view consistency during the image editing process. We show that our approach significantly improves 3D consistency compared to existing multi-view editing methods.
arXiv Detail & Related papers (2025-11-27T08:48:36Z) - Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine [83.0145525456509]
We present FFSE, a 3D-aware framework designed to enable intuitive, physically-consistent object editing on real-world images. Unlike previous approaches that either operate in image space or require slow and error-prone 3D reconstruction, FFSE models editing as a sequence of learned 3D transformations. To support learning of multi-round 3D-aware object manipulation, we introduce 3DObjectEditor.
arXiv Detail & Related papers (2025-11-17T18:57:39Z) - C3Editor: Achieving Controllable Consistency in 2D Model for 3D Editing [37.439731931558036]
C3Editor is a controllable and consistent 2D-lifting-based 3D editing framework. Our method selectively establishes a view-consistent 2D editing model to achieve superior 3D editing results. Our approach delivers more consistent and controllable 2D and 3D editing results than existing 2D-lifting-based methods.
arXiv Detail & Related papers (2025-10-06T07:07:14Z) - DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process. This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z) - View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes. By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z) - Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization [21.8454418337306]
We propose Plasticine3D, a novel text-guided controlled 3D editing pipeline that can perform 3D non-rigid editing.
Our work divides the editing process into a geometry editing stage and a texture editing stage to achieve separate control of structure and appearance.
For the purpose of fine-grained control, we propose Embedding-Fusion (EF) to blend the original characteristics with the editing objectives in the embedding space (see the illustrative sketch after this list).
arXiv Detail & Related papers (2023-12-15T09:01:54Z) - SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds [73.91114735118298]
Shap-Editor is a novel feed-forward 3D editing framework.
We demonstrate that direct 3D editing in this space is possible and efficient by building a feed-forward editor network.
arXiv Detail & Related papers (2023-12-14T18:59:06Z) - Editing 3D Scenes via Text Prompts without Retraining [80.57814031701744]
DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities.
Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images.
Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
arXiv Detail & Related papers (2023-09-10T02:31:50Z)
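As a minimal illustration of the embedding-space blending idea mentioned for Plasticine3D above, the sketch below assumes a simple convex combination of source and edit prompt embeddings; the function name and the weight `alpha` are hypothetical and not the paper's actual Embedding-Fusion formulation.

```python
# Purely illustrative blend of source and edit prompt embeddings; the linear
# rule and `alpha` are assumptions, not Plasticine3D's Embedding-Fusion.
import torch

def blend_embeddings(src_emb: torch.Tensor, edit_emb: torch.Tensor,
                     alpha: float = 0.6) -> torch.Tensor:
    # alpha = 0 keeps the original characteristics, alpha = 1 fully applies
    # the editing objective; intermediate values trade off the two.
    return (1.0 - alpha) * src_emb + alpha * edit_emb
```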