EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
- URL: http://arxiv.org/abs/2509.26346v1
- Date: Tue, 30 Sep 2025 14:51:04 GMT
- Title: EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
- Authors: Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, Wenhu Chen,
- Abstract summary: EditReward demonstrates superior alignment with human preferences in instruction-guided image editing tasks. EditReward achieves state-of-the-art human correlation on established benchmarks such as GenAI-Bench, AURORA-Bench, ImagenHub, and our new benchmark. EditReward and its training dataset will be released to help the community build more high-quality image editing training datasets.
- Score: 43.239693852521185
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, we have witnessed great progress in image editing with natural language instructions. Several closed-source models like GPT-Image-1, Seedream, and Google-Nano-Banana have shown highly promising progress. However, open-source models still lag behind. The main bottleneck is the lack of a reliable reward model to scale up high-quality synthetic training data. To address this critical bottleneck, we built EditReward, trained on our new large-scale human preference dataset of over 200K preference pairs, meticulously annotated by trained experts following a rigorous protocol. EditReward demonstrates superior alignment with human preferences in instruction-guided image editing tasks. Experiments show that EditReward achieves state-of-the-art human correlation on established benchmarks such as GenAI-Bench, AURORA-Bench, ImagenHub, and our new benchmark, outperforming a wide range of VLM-as-judge models. Furthermore, we use EditReward to select a high-quality subset of the existing noisy ShareGPT-4o-Image dataset. Training Step1X-Edit on this selected subset yields a significant improvement over training on the full set, demonstrating EditReward's ability to serve as a reward model for scaling up high-quality image editing training data. Its strong human alignment also suggests potential for advanced applications such as reinforcement learning-based post-training and test-time scaling of image editing models. EditReward and its training dataset will be released to help the community build more high-quality image editing training datasets.
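The reward-guided data filtering described in the abstract (scoring each edit and keeping only the best samples) can be sketched as follows. The `StubRewardModel` and its `score` method are hypothetical stand-ins, not the released EditReward API; a real model would score the (instruction, source, edited) triple with a learned preference model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EditSample:
    instruction: str
    source_image: str   # path or identifier of the input image
    edited_image: str   # path or identifier of the edited result

class StubRewardModel:
    """Hypothetical stand-in for a learned reward model such as EditReward.

    A real model would embed the instruction and both images and return a
    scalar preference score; here a trivial length heuristic keeps the
    sketch self-contained and runnable.
    """
    def score(self, sample: EditSample) -> float:
        return float(len(sample.instruction) % 10)

def filter_top_fraction(samples: List[EditSample],
                        model: StubRewardModel,
                        keep_fraction: float = 0.5) -> List[EditSample]:
    """Keep the highest-scoring fraction of a noisy editing dataset."""
    ranked = sorted(samples, key=model.score, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

if __name__ == "__main__":
    data = [EditSample(f"edit instruction {'x' * i}", f"src_{i}.png", f"out_{i}.png")
            for i in range(8)]
    kept = filter_top_fraction(data, StubRewardModel(), keep_fraction=0.25)
    print(len(kept))  # 2
```

The abstract's Step1X-Edit experiment corresponds to training on the `kept` subset instead of the full `data` list.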
Related papers
- MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency [21.27005111847166]
Current text-to-image generative models are trained on large uncurated datasets. We propose to condition the model on multiple reward models during training to let the model learn user preferences directly.
arXiv Detail & Related papers (2025-10-29T18:59:17Z)
- Learning an Image Editing Model without Image Editing Pairs [83.03646586929638]
Recent image editing models have achieved impressive results while following natural language editing instructions. They rely on supervised fine-tuning with large datasets of input-target pairs. Current workarounds use synthetic training pairs that leverage the zero-shot capabilities of existing models. We present a new training paradigm that eliminates the need for paired data entirely.
arXiv Detail & Related papers (2025-10-16T17:59:57Z)
- Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation [61.31036260686349]
We propose a novel prompt optimization framework designed to rephrase a simple user prompt into a sophisticated prompt for a text-to-image model. Specifically, we employ large vision-language models (LVLMs) as the solver to rewrite the user prompt and, concurrently, employ LVLMs as a reward model to score the aesthetics and alignment of the images generated by the optimized prompt. Instead of laborious human feedback, we exploit the prior knowledge of the LVLM to provide rewards, i.e., AI feedback.
arXiv Detail & Related papers (2025-05-22T15:05:07Z)
- DreamOmni: Unified Image Generation and Editing [76.46811926046225]
We introduce DreamOmni, a unified model for image generation and editing. For training, DreamOmni jointly learns T2I generation and downstream editing tasks. This joint training significantly boosts editing performance.
arXiv Detail & Related papers (2024-12-22T17:17:28Z)
- Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z)
- MOFI: Learning Image Representations from Noisy Entity Annotated Images [47.6984817573981]
We present MOFI, a new vision foundation model designed to learn image representations from noisy entity annotated images.
We introduce a new approach to automatically assign entity labels to images from noisy image-text pairs.
Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image.
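The MOFI labeling step described above (NER candidates from alt-text, then a CLIP-style model picks the entity that best matches the image) can be sketched as below. Both `toy_ner` and `toy_similarity` are hypothetical stand-ins; a real pipeline would use an actual NER model and CLIP image-text similarity.

```python
from typing import Callable, List

def select_entity_label(alt_text: str,
                        extract_entities: Callable[[str], List[str]],
                        similarity_to_image: Callable[[str], float]) -> str:
    """Pick the candidate entity judged most similar to the paired image."""
    candidates = extract_entities(alt_text)
    if not candidates:
        return ""
    return max(candidates, key=similarity_to_image)

# --- toy stand-ins (a real pipeline would use an NER model and CLIP) ---
def toy_ner(text: str) -> List[str]:
    # Treat capitalized words as entity candidates.
    return [w.strip(".,") for w in text.split() if w[:1].isupper()]

def toy_similarity(entity: str) -> float:
    # Pretend the image embedding matches bridge-related tokens best.
    scores = {"Golden": 0.9, "Bridge": 0.8, "Photo": 0.1}
    return scores.get(entity, 0.0)

if __name__ == "__main__":
    label = select_entity_label("Photo of the Golden Gate Bridge at dusk",
                                toy_ner, toy_similarity)
    print(label)  # Golden
```

The key design point is the division of labor: the NER model proposes candidates cheaply from text alone, while the (more expensive) image-text model only ranks that short list.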
arXiv Detail & Related papers (2023-06-13T17:51:18Z)
- InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.