OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
- URL: http://arxiv.org/abs/2506.07977v3
- Date: Thu, 26 Jun 2025 15:47:09 GMT
- Title: OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
- Authors: Jingjing Chang, Yixiao Fang, Peng Xing, Shuhan Wu, Wei Cheng, Rui Wang, Xianfang Zeng, Gang Yu, Hai-Bao Chen
- Abstract summary: Text-to-image (T2I) models have garnered significant attention for generating high-quality images aligned with text prompts. OneIG-Bench is a benchmark framework for fine-grained evaluation of T2I models across multiple dimensions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) models have garnered significant attention for generating high-quality images aligned with text prompts. However, the rapid advancement of T2I models has exposed limitations in early benchmarks, which lack comprehensive evaluation of capabilities such as reasoning, text rendering, and style. Notably, recent state-of-the-art models, with their rich knowledge-modeling capabilities, show promising results on image generation problems that require strong reasoning, yet existing evaluation systems have not adequately addressed this frontier. To systematically address these gaps, we introduce OneIG-Bench, a meticulously designed, comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including prompt-image alignment, text rendering precision, reasoning-generated content, stylization, and diversity. By structuring the evaluation, this benchmark enables in-depth analysis of model performance, helping researchers and practitioners pinpoint strengths and bottlenecks in the full image generation pipeline. In particular, OneIG-Bench supports flexible evaluation: rather than generating images for the entire prompt set, users can generate images only for the prompts associated with a selected dimension and complete the corresponding evaluation. Our codebase and dataset are publicly available to facilitate reproducible evaluation studies and cross-model comparisons within the T2I research community.
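The dimension-filtered workflow the abstract describes can be sketched as follows. This is a minimal illustration only: the prompt file layout, the "dimension" field, and the model/scorer interfaces are hypothetical assumptions, not OneIG-Bench's actual schema or API.

```python
import json

def load_prompts(path, dimension):
    """Keep only prompts tagged with the selected evaluation dimension.

    Assumes a hypothetical JSONL file with one record per line, each
    carrying "prompt" and "dimension" fields (illustrative names).
    """
    with open(path) as f:
        records = [json.loads(line) for line in f]
    return [r for r in records if r["dimension"] == dimension]

def run_subset_eval(model, scorer, path, dimension):
    """Generate images only for the selected subset, then score them.

    `model.generate` and `scorer` stand in for any T2I model and any
    per-dimension metric; both are assumptions for this sketch.
    """
    subset = load_prompts(path, dimension)
    scores = []
    for record in subset:
        image = model.generate(record["prompt"])
        scores.append(scorer(record["prompt"], image))
    return sum(scores) / len(scores)  # mean score for this dimension
```

The point of the sketch is the control flow: filtering first means the expensive generation step runs only over the chosen dimension's prompts, which is what makes the subset evaluation cheap.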
Related papers
- Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation [9.978181430065987]
LongBench-T2I is a benchmark for evaluating text-to-image (T2I) models under complex instructions. It consists of 500 intricately designed prompts spanning nine diverse visual evaluation dimensions. The paper also presents Plan2Gen, a framework that facilitates complex instruction-driven image generation without requiring additional model training.
arXiv Detail & Related papers (2025-05-30T16:48:14Z) - IA-T2I: Internet-Augmented Text-to-Image Generation [13.765327654914199]
Current text-to-image (T2I) generation models achieve promising results, but they fail in scenarios where the knowledge implied in the text prompt is uncertain. We propose an Internet-Augmented text-to-image generation (IA-T2I) framework that grounds such uncertain knowledge by providing T2I models with reference images.
arXiv Detail & Related papers (2025-05-21T17:31:49Z) - Multi-Modal Language Models as Text-to-Image Model Evaluators [16.675735328424786]
Multimodal Text-to-Image Eval (MT2IE) is an evaluation framework that iteratively generates prompts for evaluation. We show that MT2IE's prompt-generation consistency scores have higher correlation with human judgment than scores previously introduced in the literature.
arXiv Detail & Related papers (2025-05-01T17:47:55Z) - Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [86.69947123512836]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks. We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation, and propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z) - EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation [29.176750442205325]
In this study, we contribute the EvalMuse-40K benchmark, gathering 40K image-text pairs with fine-grained human annotations for image-text alignment-related tasks. We introduce two new methods to evaluate the image-text alignment capabilities of T2I models (a generic CLIP-similarity baseline for this kind of alignment scoring is sketched after this list).
arXiv Detail & Related papers (2024-12-24T04:08:25Z) - Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment [53.45813302866466]
We present ISG, a comprehensive evaluation framework for interleaved text-and-image generation. ISG evaluates responses at four levels of granularity: holistic, structural, block-level, and image-specific. In conjunction with ISG, we introduce a benchmark, ISG-Bench, encompassing 1,150 samples across 8 categories and 21 subcategories.
arXiv Detail & Related papers (2024-11-26T07:55:57Z) - Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models [54.052963634384945]
We introduce the Image Regeneration task to assess text-to-image models.
We use GPT-4V to bridge the gap between the reference image and the text input for the T2I model.
We also present the ImageRepainter framework to enhance the quality of generated images.
arXiv Detail & Related papers (2024-11-14T13:52:43Z) - KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z) - PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models [50.33699462106502]
Text-to-image (T2I) models frequently fail to produce images consistent with physical commonsense.
Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge.
We introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties.
arXiv Detail & Related papers (2024-06-17T17:49:01Z) - T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation [55.16845189272573]
T2I-CompBench++ is an enhanced benchmark for compositional text-to-image generation. It comprises 8,000 compositional text prompts categorized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions.
arXiv Detail & Related papers (2023-07-12T17:59:42Z) - LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model [55.20469538848806]
This paper introduces LeftRefill, an innovative approach to efficiently harness large Text-to-Image (T2I) diffusion models for reference-guided image synthesis.
arXiv Detail & Related papers (2023-05-19T10:29:42Z) - Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z)
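Several of the benchmarks above (e.g., EvalMuse-40K, MT2IE, and OneIG-Bench's prompt-image alignment dimension) score how well a generated image matches its prompt. A common baseline for such scoring is CLIP text-image similarity; the sketch below computes it with the open-source openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers. It is a generic illustration of that baseline, not the official metric of any paper listed here.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Generic CLIP-similarity baseline for prompt-image alignment; an
# illustration only, not any listed benchmark's official metric.
MODEL_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def clip_alignment_score(prompt: str, image: Image.Image) -> float:
    """Cosine similarity between CLIP text and image embeddings."""
    inputs = processor(
        text=[prompt], images=image, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        out = model(**inputs)
    # Normalize the projected embeddings before taking the dot product.
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    return float((text_emb @ img_emb.T).item())

# Example usage (assumes a locally saved generated image):
# score = clip_alignment_score("a red cube on a blue sphere",
#                              Image.open("generated.png"))
```

CLIP similarity is known to saturate on fine-grained compositional details, which is precisely the gap that benchmarks such as EvalMuse-40K and T2I-CompBench++ address with human annotations and category-specific metrics.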
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.