A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A
Study with Unified Text-to-Image Fidelity Metrics
- URL: http://arxiv.org/abs/2312.02338v2
- Date: Mon, 11 Dec 2023 07:58:36 GMT
- Title: A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A
Study with Unified Text-to-Image Fidelity Metrics
- Authors: Xiangru Zhu, Penglei Sun, Chengyu Wang, Jingping Liu, Zhixu Li,
Yanghua Xiao, Jun Huang
- Abstract summary: We introduce Winoground-T2I, a benchmark designed to evaluate the compositionality of T2I models.
This benchmark includes 11K complex, high-quality contrastive sentence pairs spanning 20 categories.
We use Winoground-T2I with a dual objective: to evaluate the performance of T2I models and the metrics used for their evaluation.
- Score: 58.83242220266935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) synthesis has recently achieved significant advancements.
However, challenges remain in models' compositionality, that is, the
ability to create new combinations from known components. We introduce
Winoground-T2I, a benchmark designed to evaluate the compositionality of T2I
models. This benchmark includes 11K complex, high-quality contrastive sentence
pairs spanning 20 categories. These contrastive sentence pairs with subtle
differences enable fine-grained evaluations of T2I synthesis models.
Additionally, to address the inconsistency across different metrics, we propose
a strategy that evaluates the reliability of various metrics using
contrastive sentence pairs. We use Winoground-T2I with a dual objective: to
evaluate the performance of T2I models and the metrics used for their
evaluation. Finally, we provide insights into the strengths and weaknesses of
these metrics and the capabilities of current T2I models in tackling challenges
across a range of complex compositional categories. Our benchmark is publicly
available at https://github.com/zhuxiangru/Winoground-T2I.
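
To make the metric-reliability strategy concrete, below is a minimal sketch, not the authors' released code, of how a contrastive sentence pair can test a fidelity metric: a reliable metric should score each matched (image, text) pairing above the corresponding mismatched one. The `ContrastivePair` container and the `fidelity` callable are hypothetical stand-ins for the benchmark's data and any candidate metric (e.g., CLIPScore).

```python
# Minimal sketch (an illustration, not the paper's exact scoring rule) of
# checking a text-image fidelity metric against one contrastive sentence pair.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ContrastivePair:
    text_a: str      # e.g. "a dog chasing a cat"
    text_b: str      # minimally different: "a cat chasing a dog"
    image_a: object  # image depicting text_a (e.g. a PIL.Image)
    image_b: object  # image depicting text_b


def is_consistent(pair: ContrastivePair,
                  fidelity: Callable[[object, str], float]) -> bool:
    """A reliable metric should prefer each matched (image, text)
    pairing over the corresponding mismatched one."""
    return (fidelity(pair.image_a, pair.text_a) > fidelity(pair.image_a, pair.text_b)
            and fidelity(pair.image_b, pair.text_b) > fidelity(pair.image_b, pair.text_a))


def reliability(pairs: Iterable[ContrastivePair],
                fidelity: Callable[[object, str], float]) -> float:
    """Fraction of contrastive pairs the metric orders correctly."""
    pairs = list(pairs)
    return sum(is_consistent(p, fidelity) for p in pairs) / len(pairs)
```

Under one plausible reading of the benchmark's dual objective, the same ordering check can instead score a T2I model: generate image_a and image_b from text_a and text_b with the model under test and ask whether a trusted metric orders the results correctly.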
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation [55.57459883629706]
We conduct the first systematic study on compositional text-to-video generation.
We propose T2V-CompBench, the first benchmark tailored for compositional text-to-video generation.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings [31.34775554251813]
We introduce a skills-based benchmark that can discriminate models across different human templates.
We gather human ratings across four templates and four T2I models for a total of >100K annotations.
We introduce a new QA-based auto-eval metric that is better correlated with human ratings than existing metrics.
arXiv Detail & Related papers (2024-04-25T17:58:43Z)
- Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) [62.44395685571094]
We introduce T2IScoreScore, a curated set of semantic error graphs containing a prompt and a set of increasingly erroneous images.
These allow us to rigorously judge whether a given prompt-faithfulness metric can correctly order images with respect to their objective error count (see the ordering-check sketch after this list).
We find that state-of-the-art VLM-based metrics fail to significantly outperform simple (and supposedly worse) feature-based metrics like CLIPScore.
arXiv Detail & Related papers (2024-04-05T17:57:16Z)
- T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation [62.71574695256264]
T2I-CompBench is a comprehensive benchmark for open-world compositional text-to-image generation.
We propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation.
We introduce a new approach, Generative mOdel fine-tuning with Reward-driven Sample selection (GORS) to boost the compositional text-to-image generation abilities.
arXiv Detail & Related papers (2023-07-12T17:59:42Z)
- HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models [39.38477117444303]
HRS-Bench is an evaluation benchmark for Text-to-Image (T2I) models.
It measures 13 skills that can be categorized into five major categories: accuracy, robustness, generalization, fairness, and bias.
It covers 50 scenarios, including fashion, animals, transportation, food, and clothes.
arXiv Detail & Related papers (2023-04-11T17:59:13Z)
- StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis [52.341186561026724]
A lack of compositionality can have severe implications for robustness and fairness.
We introduce a new framework, StyleT2I, to improve the compositionality of text-to-image synthesis.
Results show that StyleT2I outperforms previous approaches in terms of consistency between the input text and synthesized images.
arXiv Detail & Related papers (2022-03-29T17:59:50Z)
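
The T2IScoreScore entry above evaluates metrics by whether they correctly order increasingly erroneous images. Below is a hedged sketch of one such ordering check; Spearman rank correlation is our illustrative choice here, not necessarily the statistic that paper uses, and the example numbers are made up.

```python
# Sketch of an ordering check in the spirit of T2IScoreScore: given images
# of one prompt with known (objective) error counts, a faithful metric
# should score them lower as errors increase. Spearman's rho is used as one
# reasonable ordering statistic (an assumption, not the paper's exact choice).
from scipy.stats import spearmanr


def ordering_quality(error_counts, metric_scores) -> float:
    """Spearman correlation between error count and metric score;
    a faithful metric yields a strongly negative value."""
    rho, _p_value = spearmanr(error_counts, metric_scores)
    return rho


# Hypothetical example: five renderings of one prompt, increasingly wrong.
errors = [0, 1, 2, 3, 4]
scores = [0.91, 0.84, 0.80, 0.71, 0.65]  # made-up metric outputs
print(ordering_quality(errors, scores))   # approaches -1.0 for a faithful metric
```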