CoCoEdit: Content-Consistent Image Editing via Region Regularized Reinforcement Learning
- URL: http://arxiv.org/abs/2602.14068v1
- Date: Sun, 15 Feb 2026 09:36:54 GMT
- Title: CoCoEdit: Content-Consistent Image Editing via Region Regularized Reinforcement Learning
- Authors: Yuhui Wu, Chenxi Xie, Ruibin Li, Liyi Chen, Qiaosi Yi, Lei Zhang
- Abstract summary: We present a post-training framework for Content-Consistent Editing (CoCoEdit). We first augment existing editing datasets with refined instructions and masks, from which 40K diverse, high-quality samples are curated as the training set. We then introduce a pixel-level similarity reward to complement MLLM-based rewards, enabling models to ensure both editing quality and content consistency during the editing process.
- Score: 15.375069717719157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image editing has achieved impressive results with the development of large-scale generative models. However, existing models mainly focus on the editing effects in intended objects and regions, often causing unwanted changes in unintended regions. We present a post-training framework for Content-Consistent Editing (CoCoEdit) via region-regularized reinforcement learning. We first augment existing editing datasets with refined instructions and masks, from which 40K diverse, high-quality samples are curated as the training set. We then introduce a pixel-level similarity reward to complement MLLM-based rewards, enabling models to ensure both editing quality and content consistency during the editing process. To overcome the spatially agnostic nature of the rewards, we propose a region-based regularizer that preserves non-edited regions for high-reward samples while encouraging editing effects for low-reward samples. For evaluation, we annotate editing masks for GEdit-Bench and ImgEdit-Bench, introducing pixel-level similarity metrics that measure both content consistency and editing quality. Applying CoCoEdit to Qwen-Image-Edit and FLUX-Kontext, we achieve editing scores competitive with state-of-the-art models while delivering significantly better content consistency, as measured by PSNR/SSIM metrics and human subjective ratings.
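The abstract does not spell out the pixel-level metric; a common mask-restricted formulation computes PSNR only over the non-edited region. A minimal sketch, assuming the annotated editing mask marks pixels the instruction intends to change (function name and conventions are mine, not the paper's):

```python
import numpy as np

def masked_psnr(src, edited, edit_mask, max_val=255.0):
    """PSNR computed only over the region the instruction should NOT touch.

    src, edited : (H, W, C) arrays with values in [0, max_val].
    edit_mask   : (H, W) boolean array, True where edits are intended.
    """
    keep = ~edit_mask                            # non-edited region
    diff = src.astype(np.float64) - edited.astype(np.float64)
    mse = np.mean(diff[keep] ** 2)               # mask broadcasts over channels
    if mse == 0:
        return float("inf")                      # pixel-perfect consistency
    return 10.0 * np.log10(max_val ** 2 / mse)
```

The same masked comparison could plausibly double as the pixel-level similarity reward during RL post-training; a masked SSIM analog can be obtained by computing skimage's `structural_similarity` with `full=True` and averaging the returned SSIM map over the kept region.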
Related papers
- ProEdit: Inversion-based Editing From Prompts Done Right [63.554692704101]
Inversion-based visual editing provides an effective and training-free way to edit an image or a video based on user instructions. Existing methods typically inject source image information during the sampling process to maintain editing consistency. We propose ProEdit to address this issue in both the attention and the latent aspects.
arXiv Detail & Related papers (2025-12-26T18:59:14Z)
- SpotEdit: Selective Region Editing in Diffusion Transformers [66.44912649206553]
SpotEdit is a training-free diffusion editing framework that selectively updates only the modified regions. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
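The summary does not give SpotEdit's update rule; selectively updating only modified regions is reminiscent of the masked latent blending used in diffusion inpainting. A hypothetical sketch under that assumption (names and shapes are mine):

```python
import torch

def selective_region_update(z_src, z_edit, region_mask):
    """Blend denoised latents so only the edited region changes.

    z_src, z_edit : (B, C, h, w) latents at the same diffusion step.
    region_mask   : (1, 1, h, w) tensor in {0, 1}; 1 = editable.
    """
    return region_mask * z_edit + (1.0 - region_mask) * z_src
```

Keeping source latents outside the mask is what would preserve fidelity in unmodified areas; the compute savings would come from restricting the transformer's work to the masked tokens, which this sketch does not show.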
arXiv Detail & Related papers (2025-12-26T14:59:41Z)
- EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing [84.7089707244905]
Masked Generative Transformers (MGTs) exhibit a localized decoding paradigm that endows them with the inherent capacity to preserve non-relevant regions during editing. We introduce the first MGT-based image editing framework, termed EditMGT. We demonstrate that EditMGT's cross-attention maps provide informative signals for localizing edit-relevant regions. We also introduce region-hold sampling, which restricts token flipping within low-attention areas to suppress spurious edits.
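The decoding details are not in this summary; one plausible reading of region-hold sampling is that per-step token re-sampling is gated by the cross-attention localization signal. A hypothetical sketch (the threshold `tau` and all names are assumptions):

```python
import torch

def region_hold_flip(logits, tokens, attn, tau=0.3):
    """Re-sample only edit-relevant tokens; hold the rest.

    logits : (N, V) per-token logits from the MGT decoder.
    tokens : (N,) current token ids.
    attn   : (N,) cross-attention mass for the edit tokens, in [0, 1].
    tau    : assumed attention threshold below which tokens are held.
    """
    proposal = torch.distributions.Categorical(logits=logits).sample()
    flip = attn > tau                 # high attention = edit-relevant
    return torch.where(flip, proposal, tokens)
```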
arXiv Detail & Related papers (2025-12-12T16:51:19Z)
- MoEdit: On Learning Quantity Perception for Multi-object Image Editing [30.569177864762167]
MoEdit is an auxiliary-free multi-object image editing framework. We present the Feature Compensation (FeCom) module, which ensures the distinction and separability of each object attribute. We also present the Quantity Attention (QTTN) module, which perceives and preserves quantity consistency through effective control during editing.
arXiv Detail & Related papers (2025-03-13T07:13:54Z)
- BrushEdit: All-In-One Image Inpainting and Editing [76.93556996538398]
BrushEdit is a novel inpainting-based, instruction-guided image editing paradigm. We devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model. Our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics.
arXiv Detail & Related papers (2024-12-13T17:58:06Z)
- MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance [28.212908146852197]
We develop MAG-Edit, a training-free, inference-stage optimization method, which enables localized image editing in complex scenarios.
In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints.
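The two constraints are not defined in this summary; one plausible instance is a ratio constraint that concentrates cross-attention inside the user mask. A sketch of a single optimization step, where `attn_fn` is a stand-in for extracting the edited token's cross-attention map from the diffusion model:

```python
import torch

def mag_edit_step(z_t, attn_fn, mask, lr=0.05):
    """One inference-time gradient step on the noise latent that pushes
    cross-attention mass for the edit token into the user mask.

    z_t     : current noise latent.
    attn_fn : callable mapping a latent to an (H, W) cross-attention map
              for the edited concept (hypothetical stand-in).
    mask    : (H, W) binary mask of the target region.
    """
    z = z_t.detach().requires_grad_(True)
    attn = attn_fn(z)
    in_mask_ratio = (attn * mask).sum() / (attn.sum() + 1e-8)
    loss = -in_mask_ratio            # maximize attention inside the mask
    loss.backward()
    with torch.no_grad():
        z = z - lr * z.grad
    return z.detach()
```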
arXiv Detail & Related papers (2023-12-18T17:55:44Z)
- Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
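The search metric itself is not described here; structurally, per-pair step selection reduces to an argmax over candidate inversion steps. A trivial sketch with a hypothetical `score_fn`:

```python
def best_inversion_step(candidate_steps, score_fn):
    """Pick the inversion step whose edit scores best for one
    (source, target) editing pair; score_fn is a hypothetical
    callable rating edit quality at a given step."""
    return max(candidate_steps, key=score_fn)
```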
arXiv Detail & Related papers (2023-10-18T17:59:02Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high-quality, high-precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond the EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)