I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
- URL: http://arxiv.org/abs/2408.14180v2
- Date: Fri, 27 Sep 2024 13:12:04 GMT
- Title: I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
- Authors: Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun, Rongrong Ji
- Abstract summary: We propose I2EBench, a comprehensive benchmark to evaluate the quality of edited images produced by Instruction-based Image Editing (IIE) models.
I2EBench consists of 2,000+ images for editing, along with 4,000+ corresponding original and diverse instructions.
We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models.
- Score: 67.05794909694649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Significant progress has been made in the field of Instruction-based Image Editing (IIE). However, evaluating these models poses a significant challenge. A crucial requirement in this field is the establishment of a comprehensive evaluation benchmark for accurately assessing editing results and providing valuable insights for its further development. In response to this need, we propose I2EBench, a comprehensive benchmark designed to automatically evaluate the quality of edited images produced by IIE models from multiple dimensions. I2EBench consists of 2,000+ images for editing, along with 4,000+ corresponding original and diverse instructions. It offers three distinctive characteristics: 1) Comprehensive Evaluation Dimensions: I2EBench comprises 16 evaluation dimensions that cover both high-level and low-level aspects, providing a comprehensive assessment of each IIE model. 2) Human Perception Alignment: To ensure the alignment of our benchmark with human perception, we conducted an extensive user study for each evaluation dimension. 3) Valuable Research Insights: By analyzing the advantages and disadvantages of existing IIE models across the 16 dimensions, we offer valuable research insights to guide future development in the field. We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models. The code, dataset, and generated images from all IIE models are provided on GitHub: https://github.com/cocoshe/I2EBench.
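As a rough illustration of how such an evaluation script might work, the sketch below averages per-sample scores within each evaluation dimension. The file layout, field names, and the stand-in `score_edit` metric are assumptions made for illustration, not the actual interface of the I2EBench repository.

```python
# Minimal sketch of a per-dimension evaluation loop in the spirit of
# I2EBench. File layout, field names, and score_edit are illustrative
# assumptions, not the real API of github.com/cocoshe/I2EBench.
import json
from pathlib import Path
from statistics import mean

def score_edit(original: Path, edited: Path, instruction: str, dimension: str) -> float:
    # Stand-in metric: a real run would dispatch to the dimension-specific
    # metric (high-level semantic scores or low-level image-quality scores).
    return 0.0

def evaluate(bench_root: Path, edited_dir: Path) -> dict:
    per_dim = {}
    for rec in json.loads((bench_root / "instructions.json").read_text()):
        s = score_edit(
            bench_root / "images" / rec["image"],  # original input image
            edited_dir / rec["image"],             # the model's edited output
            rec["instruction"],
            rec["dimension"],
        )
        per_dim.setdefault(rec["dimension"], []).append(s)
    # One averaged score per evaluation dimension (16 in the benchmark).
    return {dim: mean(vals) for dim, vals in per_dim.items()}
```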
Related papers
- A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends [67.43992456058541]
Image restoration (IR) refers to the process of improving the visual quality of images by removing degradation such as noise, blur, and weather effects.
Traditional IR methods typically target specific types of degradation, which limits their effectiveness in real-world scenarios with complex distortions.
The all-in-one image restoration (AiOIR) paradigm has emerged, offering a unified framework that adeptly addresses multiple degradation types.
arXiv Detail & Related papers (2024-10-19T11:11:09Z) - PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM [17.89238060470998]
Evaluating diffusion-based image-editing models is a crucial task in the field of Generative AI.
Our benchmark, PixLens, provides a comprehensive evaluation of both edit quality and latent representation disentanglement.
arXiv Detail & Related papers (2024-10-08T06:05:15Z) - SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation [6.21476985578569]
Out-of-Distribution (OOD) detection in computer vision is a crucial research area.
SOOD-ImageNet is a novel dataset comprising around 1.6M images across 56 classes.
It is designed for common computer vision tasks such as image classification and semantic segmentation under OOD conditions.
arXiv Detail & Related papers (2024-09-02T09:37:39Z) - Learning Action and Reasoning-Centric Image Editing from Videos and Simulations [45.637947364341436]
The AURORA dataset is a collection of high-quality training data, human-annotated and curated from videos and simulation engines.
We evaluate an AURORA-finetuned model on a new expert-curated benchmark covering 8 diverse editing tasks.
Our model significantly outperforms previous editing models as judged by human raters.
arXiv Detail & Related papers (2024-07-03T19:36:33Z) - PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models [50.33699462106502]
Text-to-image (T2I) models frequently fail to produce images consistent with physical commonsense.
Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge.
We introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties.
arXiv Detail & Related papers (2024-06-17T17:49:01Z) - A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning [51.7818820745221]
Underwater image enhancement (UIE) presents a significant challenge within computer vision research.
Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent.
arXiv Detail & Related papers (2024-05-30T04:46:40Z) - AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images [70.42666704072964]
We establish a large-scale AI generated omnidirectional image IQA database named AIGCOIQA2024.
A subjective IQA experiment is conducted to assess human visual preferences from three perspectives.
We conduct a benchmark experiment to evaluate the performance of state-of-the-art IQA models on our database.
arXiv Detail & Related papers (2024-04-01T10:08:23Z) - AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI [1.1035305628305816]
This paper introduces AIGCBench, a pioneering comprehensive benchmark designed to evaluate a variety of video generation tasks.
It includes a varied and open-domain image-text dataset for evaluating different state-of-the-art algorithms under equivalent conditions.
We employ a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models.
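A toy sketch of that prompt-enrichment idea follows; the fragment lists and the LLM stub are invented for illustration and do not reflect AIGCBench's actual combiner or its GPT-4 usage.

```python
# Toy illustration of combining text fragments and then enriching the
# drafts with an LLM, as the AIGCBench pipeline is described above; all
# fragments and the enrich step are hypothetical stand-ins.
import itertools

SUBJECTS = ["a red fox", "an old sailboat"]
SCENES = ["in a snowy forest", "at sunset on a calm sea"]
MOTIONS = ["running toward the camera", "drifting with the current"]

def combine() -> list:
    # Naive "text combiner": cross subject/scene/motion fragments.
    return [" ".join(p) for p in itertools.product(SUBJECTS, SCENES, MOTIONS)]

def enrich(draft: str) -> str:
    # Where GPT-4 would be prompted to expand the draft into a rich,
    # detailed prompt; stubbed here because the API call is external.
    return f"Highly detailed, cinematic shot of {draft}."

prompts = [enrich(d) for d in combine()]
print(prompts[0])
```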
arXiv Detail & Related papers (2024-01-03T10:08:40Z) - Holistic Evaluation of Text-To-Image Models [153.47415461488097]
We introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM).
We identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency.
Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths.
arXiv Detail & Related papers (2023-11-07T19:00:56Z) - AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence [42.85549933048976]
We first generate over 2000 images based on 6 state-of-the-art text-to-image generation models using 100 prompts.
Based on these images, a subjective experiment is conducted to assess the human visual preferences for each image from three perspectives.
We conduct a benchmark experiment to evaluate the performance of several state-of-the-art IQA metrics on our constructed database.
arXiv Detail & Related papers (2023-07-01T03:30:31Z)