EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
- URL: http://arxiv.org/abs/2509.23909v2
- Date: Tue, 30 Sep 2025 15:34:18 GMT
- Title: EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
- Authors: Xin Luo, Jiahao Wang, Chenyuan Wu, Shitao Xiao, Xiyan Jiang, Defu Lian, Jiajun Zhang, Dong Liu, Zheng Liu
- Abstract summary: Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been hindered by the lack of a high-fidelity, efficient reward signal. We present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model.
- Score: 71.8265422228785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction-guided image editing has achieved remarkable progress, yet current models still face challenges with complex instructions and often require multiple samples to produce a desired result. Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been severely hindered by the lack of a high-fidelity, efficient reward signal. In this work, we present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model. We first introduce EditReward-Bench, a comprehensive benchmark to systematically evaluate reward models on editing quality. Building on this benchmark, we develop EditScore, a series of reward models (7B-72B) for evaluating the quality of instruction-guided image editing. Through meticulous data curation and filtering, EditScore effectively matches the performance of leading proprietary VLMs. Furthermore, coupled with an effective self-ensemble strategy tailored to the generative nature of EditScore, our largest variant even surpasses GPT-5 on the benchmark. We then demonstrate that a high-fidelity reward model is the key to unlocking online RL for image editing. Our experiments show that, while even the largest open-source VLMs fail to provide an effective learning signal, EditScore enables efficient and robust policy optimization. Applying our framework to a strong base model, OmniGen2, yields a final model with a substantial and consistent performance uplift. Overall, this work provides the first systematic path from benchmarking to reward modeling to RL training in image editing, showing that a high-fidelity, domain-specialized reward model is key to unlocking the full potential of RL in this domain.
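The self-ensemble strategy mentioned in the abstract exploits the generative nature of the reward model: a generative scorer samples its output, so repeated queries return different scores, and averaging them reduces the variance of the reward signal. A minimal sketch of this idea (the `score_fn` callable and its signature are assumptions for illustration, not the EditScore API):

```python
import statistics
from typing import Callable, List

def self_ensemble_score(
    score_fn: Callable[[str, bytes, bytes], float],
    instruction: str,
    source_image: bytes,
    edited_image: bytes,
    k: int = 8,
) -> float:
    """Average k stochastic samples from a generative reward model.

    Each call to score_fn may return a different value because the
    scorer decodes with sampling enabled; the mean over k draws is a
    lower-variance estimate of edit quality.
    """
    samples: List[float] = [
        score_fn(instruction, source_image, edited_image) for _ in range(k)
    ]
    return statistics.mean(samples)
```

In an online RL loop, this averaged score would stand in for the per-sample reward; larger k trades compute for a steadier learning signal.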
Related papers
- RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward [64.78078130943489]
We introduce RetouchIQ, a framework that performs instruction-based executable image editing through MLLM agents guided by a reward model. We show that RetouchIQ substantially improves both semantic consistency and perceptual quality over previous MLLM-based and diffusion-based editing systems.
arXiv Detail & Related papers (2026-02-19T17:11:59Z) - FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution [87.57784204422218]
Reinforcement Learning with Human Feedback has proven effective in the image generation field, guided by reward models that align with human preferences. We propose a Fine-grained Perceptual Reward Model (FinPercep-RM) based on an Encoder-Decoder architecture. While providing a global quality score, it also generates a Perceptual Degradation Map that spatially localizes and quantifies local defects.
arXiv Detail & Related papers (2025-12-27T16:55:21Z) - EditThinker: Unlocking Iterative Reasoning for Any Image Editor [72.28251670314451]
We propose a deliberative editing framework that enables image editors to 'think' while they edit. We train a single MLLM, EditThinker, to act as the reasoning engine of this framework. We employ reinforcement learning to align EditThinker's thinking with its editing, thereby generating more targeted instruction improvements.
arXiv Detail & Related papers (2025-12-05T18:58:09Z) - Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback [41.41713036839503]
We introduce Edit-R1, a novel post-training framework for instruction-based image editing based on policy optimization. We employ a Multimodal Large Language Model (MLLM) as a unified, training-free reward model, leveraging its output logits to provide fine-grained feedback. Our framework is model-agnostic, delivering substantial performance gains when applied to diverse base models.
arXiv Detail & Related papers (2025-10-19T15:38:06Z) - Training-Free Reward-Guided Image Editing via Trajectory Optimal Control [55.64204232819136]
We introduce a novel framework for training-free, reward-guided image editing. We demonstrate that our approach significantly outperforms existing inversion-based training-free baselines.
arXiv Detail & Related papers (2025-09-30T06:34:37Z) - The Promise of RL for Autoregressive Image Editing [26.91488709748245]
We explore three strategies to enhance performance on a wide range of image editing tasks. We adopt an autoregressive multimodal model that processes textual and visual tokens in a unified manner. We find RL combined with a large multimodal LLM verifier to be the most effective of these strategies.
arXiv Detail & Related papers (2025-08-01T23:47:29Z) - Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models [1.9389881806157316]
In this work, we propose a novel framework that enhances image inversion using consistency models. Our method introduces a cycle-consistency optimization strategy that significantly improves reconstruction accuracy. We achieve state-of-the-art performance across various image editing tasks and datasets.
arXiv Detail & Related papers (2025-06-23T20:34:43Z) - Step1X-Edit: A Practical Framework for General Image Editing [64.07202539610576]
We release a state-of-the-art image editing model called Step1X-Edit. It provides performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash. For evaluation, we develop GEdit-Bench, a novel benchmark rooted in real-world user instructions.
arXiv Detail & Related papers (2025-04-24T17:25:12Z) - EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks. The model takes both images and instructions as inputs, and predicts the edited image's tokens in a vanilla next-token paradigm. We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance against various state-of-the-art task-specific methods.
arXiv Detail & Related papers (2025-01-08T18:59:35Z) - HAF-RM: A Hybrid Alignment Framework for Reward Model Training [51.59246299566669]
We propose a hybrid alignment framework, HaF-RM, for reward model training. It offers a principled and effective approach to enhancing the performance and alignment of reward models.
arXiv Detail & Related papers (2024-07-04T23:26:56Z)
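Several entries above (e.g. Uniworld-V2's Edit-R1) use an MLLM's output logits as a training-free reward rather than a learned scalar head. A minimal sketch of the underlying idea, assuming a hypothetical judge that emits logits for a 'yes' (edit succeeded) and a 'no' token; the papers' actual scoring schemes may differ:

```python
import math

def logit_preference_reward(yes_logit: float, no_logit: float) -> float:
    """Map a judge model's 'yes'/'no' token logits to a reward in (0, 1)
    via a two-way softmax; equivalent to sigmoid(yes_logit - no_logit).
    """
    m = max(yes_logit, no_logit)  # subtract the max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)
```

Because the reward is a smooth function of the logits rather than a hard yes/no decision, it provides the fine-grained feedback that policy optimization needs.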
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the listed information and is not responsible for any consequences arising from its use.