Towards Scalable Human-aligned Benchmark for Text-guided Image Editing
- URL: http://arxiv.org/abs/2505.00502v1
- Date: Thu, 01 May 2025 13:06:05 GMT
- Title: Towards Scalable Human-aligned Benchmark for Text-guided Image Editing
- Authors: Suho Ryu, Kihyun Kim, Eugene Baek, Dongsoo Shin, Joonseok Lee
- Abstract summary: We introduce a novel Human-Aligned benchmark for Text-guided Image Editing (HATIE). HATIE provides a fully-automated and omnidirectional evaluation pipeline. We empirically verify that the evaluation of HATIE is indeed human-aligned in various aspects.
- Score: 9.899869794429579
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A variety of text-guided image editing models have been proposed recently. However, there is no widely accepted standard evaluation method, mainly due to the subjective nature of the task, leaving researchers to rely on manual user studies. To address this, we introduce a novel Human-Aligned benchmark for Text-guided Image Editing (HATIE). Providing a large-scale benchmark set covering a wide range of editing tasks, it allows reliable evaluation that is not limited to specific easy-to-evaluate cases. HATIE also provides a fully-automated and omnidirectional evaluation pipeline. In particular, we combine multiple scores measuring various aspects of editing so as to align with human perception. We empirically verify that the evaluation of HATIE is indeed human-aligned in various aspects, and we report benchmark results on several state-of-the-art models to offer deeper insights into their performance.
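As a rough illustration of what combining multiple per-aspect scores into a single human-aligned score can look like, here is a minimal sketch; the aspect names, example values, and least-squares fitting are assumptions for illustration only, not HATIE's actual formulation.

```python
import numpy as np

# Hypothetical per-aspect scores for three edited images
# (columns: target fidelity, background preservation, image quality -- illustrative only).
aspect_scores = np.array([
    [0.82, 0.91, 0.75],
    [0.40, 0.88, 0.66],
    [0.95, 0.35, 0.80],
])
# Hypothetical averaged human ratings for the same three edits.
human_ratings = np.array([0.85, 0.55, 0.60])

# Fit weights so the weighted combination tracks the human ratings,
# then keep them non-negative and normalize them to sum to one.
weights, *_ = np.linalg.lstsq(aspect_scores, human_ratings, rcond=None)
weights = np.clip(weights, 0.0, None)
weights /= weights.sum()

def composite_score(scores: np.ndarray) -> np.ndarray:
    """Weighted combination of per-aspect scores."""
    return scores @ weights

print(composite_score(aspect_scores))
```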
Related papers
- Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance [8.216807467478281]
Evaluating text-to-image synthesis is challenging due to misalignment between established metrics and human preferences. We propose cFreD, a metric that accounts for both visual fidelity and text-prompt alignment. Our findings validate cFreD as a robust, future-proof metric for the systematic evaluation of text-to-image models. (A brief recap of the underlying Fréchet distance follows this entry.)
arXiv Detail & Related papers (2025-03-27T17:35:14Z)
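For context, the (unconditional) Fréchet distance that FID-style metrics build on compares Gaussian approximations of the real and generated feature distributions, with means \mu_r, \mu_g and covariances \Sigma_r, \Sigma_g; cFreD's conditional variant, which additionally conditions on the text prompt, is defined in the paper itself.

\[
d_{\mathrm{F}}^2\big((\mu_r,\Sigma_r),(\mu_g,\Sigma_g)\big)
= \lVert \mu_r - \mu_g \rVert_2^2
+ \operatorname{Tr}\!\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)
\]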
- Gamma: Toward Generic Image Assessment with Mixture of Assessment Experts [23.48816491333345]
Gamma, a Generic imAge assessMent model, can effectively assess images from diverse scenes through mixed-dataset training. Our Gamma model is trained and evaluated on 12 datasets spanning 6 image assessment scenarios.
arXiv Detail & Related papers (2025-03-09T16:07:58Z)
- Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias [52.590072198551944]
The aim of image personalization is to create images based on a user-provided subject. Current methods face challenges in ensuring fidelity to the text prompt. We introduce a novel training pipeline that incorporates an attractor to filter out distractions in training images.
arXiv Detail & Related papers (2025-03-09T14:14:02Z)
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
- PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM [17.89238060470998]
Evaluating diffusion-based image-editing models is a crucial task in the field of Generative AI.
Our benchmark, PixLens, provides a comprehensive evaluation of both edit quality and latent representation disentanglement.
arXiv Detail & Related papers (2024-10-08T06:05:15Z)
- I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing [67.05794909694649]
We propose I2EBench, a comprehensive benchmark to evaluate the quality of edited images produced by IIE models.
I2EBench consists of 2,000+ images for editing, along with 4,000+ corresponding original and diverse instructions.
We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models.
arXiv Detail & Related papers (2024-08-26T11:08:44Z)
- EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods [52.43439659492655]
We introduce EditVal, a standardized benchmark for quantitatively evaluating text-guided image editing methods.
EditVal consists of a curated dataset of images, a set of editable attributes for each image drawn from 13 possible edit types, and an automated evaluation pipeline.
We use EditVal to benchmark 8 cutting-edge diffusion-based editing methods including SINE, Imagic and Instruct-Pix2Pix.
arXiv Detail & Related papers (2023-10-03T20:46:10Z)
- HIVE: Harnessing Human Feedback for Instructional Visual Editing [127.29436858998064]
We present a novel framework to harness human feedback for instructional visual editing (HIVE).
Specifically, we collect human feedback on the edited images and learn a reward function to capture the underlying user preferences.
We then introduce scalable diffusion model fine-tuning methods that can incorporate human preferences based on the estimated reward (a generic reward-learning sketch follows this entry).
arXiv Detail & Related papers (2023-03-16T19:47:41Z)
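For context, a common way to learn a reward from pairwise human feedback is a Bradley-Terry style objective that scores a preferred edit above a rejected one. The sketch below is a generic illustration of that idea (the RewardModel network and feature dimensions are placeholders), not HIVE's specific formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Placeholder reward network mapping an image feature vector to a scalar score."""
    def __init__(self, feature_dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feature_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.head(features).squeeze(-1)

def preference_loss(reward_model: RewardModel,
                    preferred: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the reward of the human-preferred edit
    above the reward of the rejected one."""
    return -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()

# Example with random features standing in for edited-image embeddings.
rm = RewardModel()
loss = preference_loss(rm, torch.randn(8, 512), torch.randn(8, 512))
loss.backward()
```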
- TeTIm-Eval: a novel curated evaluation data set for comparing text-to-image models [1.1252184947601962]
Evaluating and comparing text-to-image models is a challenging problem.
In this paper, a novel evaluation approach is tested, based on: (i) a curated data set divided into ten categories; (ii) a quantitative metric, the CLIP-score (a computation sketch follows this entry); and (iii) a human evaluation task to distinguish, for a given text, the real from the generated images.
Early experimental results show that the accuracy of human judgement is fully consistent with the CLIP-score.
arXiv Detail & Related papers (2022-12-15T13:52:03Z)
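A minimal sketch of a CLIP-score style image-text similarity, assuming the Hugging Face transformers CLIP implementation and the openai/clip-vit-base-patch32 checkpoint; the paper's exact scoring protocol may differ.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, text: str) -> float:
    """Cosine similarity between the CLIP image and text embeddings."""
    inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum(dim=-1))

# Example (hypothetical file name):
# score = clip_score(Image.open("generated.png"), "a red bicycle leaning on a wall")
```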
- EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval, an instruction-based benchmark and evaluation suite for the automatic evaluation of editing capabilities.
We evaluate several pre-trained models; the results show that InstructGPT and PEER perform best, but most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimization for prompts with the highest performance does not necessarily entail the strongest robustness to different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z)