ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing
- URL: http://arxiv.org/abs/2503.14482v1
- Date: Tue, 18 Mar 2025 17:53:29 GMT
- Title: ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing
- Authors: Yulin Pan, Xiangteng He, Chaojie Mao, Zhen Han, Zeyinzi Jiang, Jingfeng Zhang, Yu Liu
- Abstract summary: ICE-Bench is a comprehensive benchmark designed to rigorously assess image generation models. The evaluation framework assesses image generation capabilities across 6 dimensions. We conduct a thorough analysis of existing generation models, revealing both the challenging nature of our benchmark and the gap between current model capabilities and real-world generation requirements.
- Score: 23.512687688393346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image generation has witnessed significant advancements in the past few years. However, evaluating the performance of image generation models remains a formidable challenge. In this paper, we propose ICE-Bench, a unified and comprehensive benchmark designed to rigorously assess image generation models. Its comprehensiveness can be summarized by the following key features: (1) Coarse-to-Fine Tasks: We systematically deconstruct image generation into four task categories: No-ref/Ref Image Creating/Editing, based on the presence or absence of source images and reference images, and further decompose these into 31 fine-grained tasks covering a broad spectrum of image generation requirements. (2) Multi-dimensional Metrics: The evaluation framework assesses image generation capabilities across 6 dimensions: aesthetic quality, imaging quality, prompt following, source consistency, reference consistency, and controllability. Eleven metrics are introduced to support the multi-dimensional evaluation, notably VLLM-QA, a new metric that assesses the success of image editing by leveraging large models. (3) Hybrid Data: The data is drawn from both real scenes and virtual generation, which improves data diversity and alleviates bias in model evaluation. Through ICE-Bench, we conduct a thorough analysis of existing generation models, revealing both the challenging nature of our benchmark and the gap between current model capabilities and real-world generation requirements. To foster further advancements in the field, we will open-source ICE-Bench, including its dataset, evaluation code, and models, thereby providing a valuable resource for the research community.
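To make the task taxonomy concrete, here is a minimal Python sketch, assuming only what the abstract states: the four coarse categories follow from the presence or absence of source and reference images, and VLLM-QA judges editing success with a large model. The 2x2 mapping, the `ask_vlm` callable, and the yes/no prompt below are illustrative assumptions, not the paper's actual implementation.

```python
from enum import Enum
from typing import Callable

class TaskCategory(Enum):
    """ICE-Bench's four coarse task categories, as named in the abstract."""
    NO_REF_CREATE = "No-ref Image Creating"  # text prompt only
    REF_CREATE = "Ref Image Creating"        # text prompt + reference image(s)
    NO_REF_EDIT = "No-ref Image Editing"     # source image + edit instruction
    REF_EDIT = "Ref Image Editing"           # source + reference + instruction

def categorize(has_source: bool, has_reference: bool) -> TaskCategory:
    # The 2x2 split follows from whether a source image is present
    # (editing vs. creating) and whether a reference image is present
    # (Ref vs. No-ref); the benchmark further divides these four
    # categories into 31 fine-grained tasks.
    if has_source:
        return TaskCategory.REF_EDIT if has_reference else TaskCategory.NO_REF_EDIT
    return TaskCategory.REF_CREATE if has_reference else TaskCategory.NO_REF_CREATE

def vllm_qa_success(instruction: str, edited_image_path: str,
                    ask_vlm: Callable[[str, str], str]) -> bool:
    """Hypothetical VLLM-QA-style check: pose the edit instruction to a
    vision-language model as a yes/no question about the edited image.
    `ask_vlm(image_path, question) -> str` is a placeholder for any VLM
    client; the paper's real prompting protocol is not specified here."""
    question = (f"Does this image fulfill the editing instruction "
                f"'{instruction}'? Answer yes or no.")
    return ask_vlm(edited_image_path, question).strip().lower().startswith("yes")
```

For example, `categorize(has_source=True, has_reference=False)` returns the No-ref Image Editing category, i.e. a plain instruction-based edit of an existing image.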
Related papers
- Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [77.86514804787622]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks. We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation. We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z)
- IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models [52.73820275861131]
Text-to-image (T2I) models have made significant progress, showcasing impressive abilities in prompt following and image generation. Recent models such as FLUX.1 and Ideogram2.0 have demonstrated exceptional performance across various complex tasks. This study provides valuable insights into the current state and future trajectory of T2I models as they evolve towards general-purpose usability.
arXiv Detail & Related papers (2025-01-23T18:58:33Z)
- Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models [54.052963634384945]
We introduce the Image Regeneration task to assess text-to-image models.
We use GPT-4V to bridge the gap between the reference image and the text input for the T2I model.
We also present the ImageRepainter framework to enhance the quality of generated images.
arXiv Detail & Related papers (2024-11-14T13:52:43Z)
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- Fashion Image-to-Image Translation for Complementary Item Retrieval [13.88174783842901]
We introduce the Generative Compatibility Model (GeCo), a two-stage approach that improves fashion image retrieval through paired image-to-image translation.
Evaluations on three datasets show that GeCo outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-08-19T09:50:20Z)
- ImagiNet: A Multi-Content Benchmark for Synthetic Image Detection [0.0]
We introduce ImagiNet, a dataset of 200K examples spanning four categories: photos, paintings, faces, and miscellaneous. Synthetic images in ImagiNet are produced with both open-source and proprietary generators, whereas real counterparts for each content type are collected from public datasets.
arXiv Detail & Related papers (2024-07-29T13:57:24Z)
- Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark [63.296342841358815]
Large Multimodal Models (LMMs) have made significant strides in visual question-answering for single images. However, the ability to process a large number of visual tokens does not guarantee effective retrieval and reasoning for multi-image question answering. We introduce MIRAGE, an open-source, lightweight visual-RAG framework that processes up to 10k images on a single 40G A100 GPU.
arXiv Detail & Related papers (2024-07-18T17:59:30Z)
- TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models [96.72318842152148]
We propose a unified framework for text-to-image generation and retrieval with a single Large Multimodal Model (LMM).
Specifically, we first explore the intrinsic discriminative abilities of LMMs and introduce an efficient generative retrieval method for text-to-image retrieval in a training-free manner.
We then propose an autonomous decision mechanism to choose the best-matched one between generated and retrieved images as the response to the text prompt.
arXiv Detail & Related papers (2024-06-09T15:00:28Z)
- Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution [23.974575820244944]
In this work, we study the origin attribution of generated images in a practical setting.
The goal is to check whether a given image was generated by the source model.
We propose OCC-CLIP, a CLIP-based framework for few-shot one-class classification.
arXiv Detail & Related papers (2024-04-03T12:54:16Z)
- Benchmarking Counterfactual Image Generation [22.573830532174956]
Generative AI has revolutionised visual content editing, empowering users to effortlessly modify images and videos. To perform realistic edits in domains such as natural images or medical imaging, modifications must respect causal relationships. We present a comparison framework to thoroughly benchmark counterfactual image generation methods.
arXiv Detail & Related papers (2024-03-29T16:58:13Z)
- IRGen: Generative Modeling for Image Retrieval [82.62022344988993]
In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling.
We develop our model, dubbed IRGen, to address the technical challenge of converting an image into a concise sequence of semantic units.
Our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks and two million-scale datasets.
arXiv Detail & Related papers (2023-03-17T17:07:36Z)
- GIQA: Generated Image Quality Assessment [36.01759301994946]
Generative adversarial networks (GANs) have achieved impressive results, but not all generated images are perfect.
We propose Generated Image Quality Assessment (GIQA), which quantitatively evaluates the quality of each generated image.
arXiv Detail & Related papers (2020-03-19T17:56:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.