T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional
Text-to-image Generation
- URL: http://arxiv.org/abs/2307.06350v2
- Date: Mon, 30 Oct 2023 11:42:42 GMT
- Title: T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional
Text-to-image Generation
- Authors: Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu
- Abstract summary: T2I-CompBench is a comprehensive benchmark for open-world compositional text-to-image generation.
We propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation.
We introduce a new approach, Generative mOdel fine-tuning with Reward-driven Sample selection (GORS) to boost the compositional text-to-image generation abilities.
- Score: 62.71574695256264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the stunning ability to generate high-quality images by recent
text-to-image models, current approaches often struggle to effectively compose
objects with different attributes and relationships into a complex and coherent
scene. We propose T2I-CompBench, a comprehensive benchmark for open-world
compositional text-to-image generation, consisting of 6,000 compositional text
prompts from 3 categories (attribute binding, object relationships, and complex
compositions) and 6 sub-categories (color binding, shape binding, texture
binding, spatial relationships, non-spatial relationships, and complex
compositions). We further propose several evaluation metrics specifically
designed to evaluate compositional text-to-image generation and explore the
potential and limitations of multimodal LLMs for evaluation. We introduce a new
approach, Generative mOdel fine-tuning with Reward-driven Sample selection
(GORS), to boost the compositional text-to-image generation abilities of
pretrained text-to-image models. Extensive experiments and evaluations are
conducted to benchmark previous methods on T2I-CompBench, and to validate the
effectiveness of our proposed evaluation metrics and GORS approach. Project
page is available at https://karine-h.github.io/T2I-CompBench/.
Related papers
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation [55.57459883629706]
We conduct the first systematic study on compositional text-to-video generation.
We propose T2V-CompBench, the first benchmark tailored for compositional text-to-video generation.
arXiv Detail & Related papers (2024-07-19T17:58:36Z) - FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction [66.98008357232428]
We propose FineMatch, a new aspect-based fine-grained text and image matching benchmark.
FineMatch focuses on text and image mismatch detection and correction.
We show that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches.
arXiv Detail & Related papers (2024-04-23T03:42:14Z) - Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation [5.55027585813848]
The capability to generate visual text is crucial, offering both academic interest and a wide range of practical applications.
We introduce a benchmark, LenCom-Eval, specifically designed for testing models' capability in generating images with Lengthy and Complex visual text.
We demonstrate notable improvements across a range of evaluation metrics, including CLIPScore, OCR precision, recall, F1 score, accuracy, and edit distance scores.
arXiv Detail & Related papers (2024-03-25T04:54:49Z) - Divide and Conquer: Language Models can Plan and Self-Correct for
Compositional Text-to-Image Generation [72.6168579583414]
CompAgent is a training-free approach for compositional text-to-image generation with a large language model (LLM) agent as its core.
Our approach achieves more than 10% improvement on T2I-CompBench, a comprehensive benchmark for open-world compositional T2I generation.
arXiv Detail & Related papers (2024-01-28T16:18:39Z) - A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A
Study with Unified Text-to-Image Fidelity Metrics [58.83242220266935]
We introduce Winoground-T2I, a benchmark designed to evaluate the compositionality of T2I models.
This benchmark includes 11K complex, high-quality contrastive sentence pairs spanning 20 categories.
We use Winoground-T2I with a dual objective: to evaluate the performance of T2I models and the metrics used for their evaluation.
arXiv Detail & Related papers (2023-12-04T20:47:48Z) - OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation [151.57313182844936]
We propose a new interleaved generation framework based on prompting large-language models (LLMs) and pre-trained text-to-image (T2I) models, namely OpenLEAF.
For model assessment, we first propose to use large multi-modal models (LMMs) to evaluate the entity and style consistencies of open-domain interleaved image-text sequences.
arXiv Detail & Related papers (2023-10-11T17:58:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.