EditThinker: Unlocking Iterative Reasoning for Any Image Editor
- URL: http://arxiv.org/abs/2512.05965v1
- Date: Fri, 05 Dec 2025 18:58:09 GMT
- Title: EditThinker: Unlocking Iterative Reasoning for Any Image Editor
- Authors: Hongyu Li, Manyuan Zhang, Dian Zheng, Ziyu Guo, Yimeng Jia, Kaituo Feng, Hao Yu, Yexin Liu, Yan Feng, Peng Pei, Xunliang Cai, Linjiang Huang, Hongsheng Li, Si Liu
- Abstract summary: We propose a deliberative editing framework that enables image editors to 'think' while they edit. We train a single MLLM, EditThinker, to act as the reasoning engine of this framework. We employ reinforcement learning to align EditThinker's thinking with its editing, thereby generating more targeted instruction improvements.
- Score: 72.28251670314451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction-based image editing has emerged as a prominent research area which, benefiting from image generation foundation models, has achieved high aesthetic quality, making instruction-following capability the primary challenge. Existing approaches improve instruction adherence via supervised or reinforcement learning, yet single-turn success rates remain limited due to inherent stochasticity and a lack of deliberation. In this work, we propose a deliberative editing framework that lets editors 'think' while they edit, simulating the human cognitive loop by iteratively executing a Think-while-Edit cycle: critiquing results and refining instructions, then repeating the generation until the result is satisfactory. Specifically, we train a single MLLM, EditThinker, to act as the reasoning engine of this framework; it jointly produces the critique score, reasoning process, and refined instructions. We employ reinforcement learning to align EditThinker's thinking with its editing, thereby generating more targeted instruction improvements. Extensive experiments on four benchmarks demonstrate that our approach improves the instruction-following capability of any image editing model by a large margin. We will release our data construction framework, datasets, and models to benefit the community.
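To make the Think-while-Edit cycle concrete, the sketch below shows one way such a loop could be wired around an off-the-shelf editor. The `editor`/`thinker` interfaces, the score threshold, and the round limit are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Minimal sketch of the Think-while-Edit cycle described in the abstract.
# The editor/thinker objects, method names, threshold, and round limit are
# hypothetical placeholders, not the released EditThinker API.

def deliberative_edit(editor, thinker, image, instruction,
                      score_threshold=0.9, max_rounds=4):
    """Iteratively edit, critique, and refine until the result is satisfactory."""
    current_instruction = instruction
    best_result, best_score = None, float("-inf")

    for _ in range(max_rounds):
        # Edit: any off-the-shelf image editor executes the current instruction.
        result = editor.edit_image(image, current_instruction)

        # Think: the MLLM jointly produces a critique score, a reasoning trace,
        # and a refined instruction for the next round.
        score, reasoning, refined_instruction = thinker.think(
            image, result, instruction
        )

        # Keep the best edit seen so far.
        if score > best_score:
            best_result, best_score = result, score

        # Stop once the critique deems the edit satisfactory.
        if score >= score_threshold:
            break

        # Repeat: regenerate with the refined instruction.
        current_instruction = refined_instruction

    return best_result
```

In this reading, a single reasoning model closes the loop around an arbitrary editor, which is consistent with the abstract's claim that the framework can improve any image editing model.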
Related papers
- Instruction-based Image Editing with Planning, Reasoning, and Generation [52.0364486403062]
Prior work utilizes a chain of large language models, object segmentation models, and editing models for this task. We aim to bridge understanding and generation via a new multi-modality model that provides intelligent abilities to instruction-based image editing models. Our method has competitive editing abilities on complex real-world images.
arXiv Detail & Related papers (2026-02-26T04:56:02Z) - RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward [64.78078130943489]
We introduce RetouchIQ, a framework that performs instruction-based executable image editing through MLLM agents guided by a reward model. We show that RetouchIQ substantially improves both semantic consistency and perceptual quality over previous MLLM-based and diffusion-based editing systems.
arXiv Detail & Related papers (2026-02-19T17:11:59Z) - ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing [33.888289858260706]
Reinforcement learning (RL) has been investigated for improving the quality of image editing. However, RL faces three key challenges: (1) limited reasoning exploration confined to denoising, (2) biased reward fusion, and (3) unstable VLM-based instruction rewards. We propose ThinkRL-Edit, a reasoning-centric RL framework that decouples visual reasoning from image synthesis.
arXiv Detail & Related papers (2026-01-06T23:43:00Z) - ReasonEdit: Towards Reasoning-Enhanced Image Editing Models [60.902953259781675]
A common architectural design couples a multimodal large language model (MLLM) encoder with a diffusion decoder. We show that unlocking the reasoning capabilities of the MLLM can push the boundaries of editing models. Our proposed framework enables image editing in a thinking-editing-reflection loop.
arXiv Detail & Related papers (2025-11-27T17:02:48Z) - Training-Free Reward-Guided Image Editing via Trajectory Optimal Control [55.64204232819136]
We introduce a novel framework for training-free, reward-guided image editing. We demonstrate that our approach significantly outperforms existing inversion-based training-free baselines.
arXiv Detail & Related papers (2025-09-30T06:34:37Z) - EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling [71.8265422228785]
Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been hindered by the lack of a high-fidelity, efficient reward signal. We present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model.
arXiv Detail & Related papers (2025-09-28T14:28:24Z) - SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing [25.8179737362091]
Existing datasets are typically constructed using various automated methods, leading to noisy supervision signals. Recent efforts attempt to improve editing models by generating higher-quality edited images, pre-training on recognition tasks, or introducing vision-language models (VLMs), but they fail to resolve this fundamental issue. In this paper, we offer a novel solution by constructing more effective editing instructions for given image pairs.
arXiv Detail & Related papers (2025-05-05T05:19:40Z) - SPIE: Semantic and Structural Post-Training of Image Editing Diffusion Models with AI feedback [28.807572302899004]
SPIE is a novel approach for semantic and structural post-training of instruction-based image editing diffusion models. We introduce an online reinforcement learning framework that aligns the diffusion model with human preferences without relying on extensive human annotations. Experimental results demonstrate that SPIE can perform intricate edits in complex scenes after just 10 training steps.
arXiv Detail & Related papers (2025-04-17T10:46:39Z) - UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint [87.20985852686785]
We propose an unsupervised instruction-based image editing approach that removes the need for ground-truth edited images during training. Our approach addresses these challenges by introducing a novel editing mechanism called the Edit Reversibility Constraint (ERC), which applies forward and reverse edits in one training step. This allows us to bypass the need for ground-truth edited images and unlock training, for the first time, on datasets comprising either real image-caption pairs or image-caption-instruction triplets.
arXiv Detail & Related papers (2024-12-19T18:59:58Z)