ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images
- URL: http://arxiv.org/abs/2409.11874v1
- Date: Wed, 18 Sep 2024 11:04:35 GMT
- Title: ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images
- Authors: Abhinaw Jagtap, Nachiket Tapas, R. G. Brajesh
- Abstract summary: We introduce a novel evaluation matrix designed explicitly for quantifying the performance of text and typography generation within AI-generated images.
Our approach to calculating the score accounts for multiple redundancies, such as repetition of words, case sensitivity, mixing of words, and irregular incorporation of letters.
- Score: 0.44241702149260337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the fast-evolving field of Generative AI, platforms like MidJourney, DALL-E, and Stable Diffusion have transformed Text-to-Image (T2I) generation. However, despite their impressive ability to create high-quality images, they often struggle to generate accurate text within these images. Theoretically, if we could achieve accurate text generation in AI images in a "zero-shot" manner, it would not only make AI-generated images more meaningful but also democratize the graphic design industry. The first step towards this goal is to create a robust scoring matrix for evaluating text accuracy in AI-generated images. Although there are existing benchmarking methods like CLIP SCORE and T2I-CompBench++, there is still a gap in systematically evaluating text and typography in AI-generated images, especially with diffusion-based methods. In this paper, we introduce a novel evaluation matrix designed explicitly for quantifying the performance of text and typography generation within AI-generated images. We use a letter-by-letter matching strategy to compute exact matching scores between the reference text and the AI-generated text. Our approach to calculating the score accounts for multiple redundancies, such as repetition of words, case sensitivity, mixing of words, and irregular incorporation of letters. Moreover, we have developed a novel method, named brevity adjustment, to handle excess text. In addition, we have performed a quantitative analysis of errors arising from frequently and less frequently used words. The project page is available at: https://github.com/Abhinaw3906/ABHINAW-MATRIX.
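The abstract describes two mechanisms: a letter-by-letter matching score and a brevity adjustment that penalizes excess text. The paper's exact formulas are not reproduced in this listing, so the following Python sketch only illustrates the general idea under assumed definitions: a greedy, in-order, case-insensitive letter match against the reference, combined with an assumed multiplicative penalty when the generated text overshoots the reference. The function names and the penalty form are illustrative, not the published ABHINAW matrix.

```python
def letter_match_score(reference: str, generated: str) -> float:
    """Fraction of reference letters recovered in order (case-insensitive).

    A sketch of a letter-by-letter matching strategy; not the paper's formula.
    """
    ref = [c for c in reference.lower() if c.isalpha()]
    gen = [c for c in generated.lower() if c.isalpha()]
    matched = 0
    j = 0
    for c in ref:
        # Advance through the generated letters until this reference letter appears.
        while j < len(gen) and gen[j] != c:
            j += 1
        if j < len(gen):
            matched += 1
            j += 1
    return matched / len(ref) if ref else 0.0


def brevity_adjusted_score(reference: str, generated: str) -> float:
    """Letter-match score discounted when the generated text has excess letters."""
    base = letter_match_score(reference, generated)
    ref_len = sum(c.isalpha() for c in reference)
    gen_len = sum(c.isalpha() for c in generated)
    if ref_len == 0 or gen_len <= ref_len:
        return base
    return base * (ref_len / gen_len)  # assumed multiplicative penalty for excess text


if __name__ == "__main__":
    # A dropped letter plus a repeated word: partial match, further discounted
    # by the assumed brevity adjustment for the extra letters.
    print(brevity_adjusted_score("HELLO WORLD", "HELO WORLD WORLD"))
```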
Related papers
- Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models [16.00576040281808]
We propose a novel framework called Image2Text2Image to evaluate image captioning models.
A high similarity score suggests that the model has produced a faithful textual description, while a low score highlights discrepancies.
Our framework does not rely on human-annotated reference captions, making it a valuable tool for assessing image captioning models.
arXiv Detail & Related papers (2024-11-08T17:07:01Z)
- A Sanity Check for AI-generated Image Detection [49.08585395873425]
We present a sanity check on whether the task of AI-generated image detection has been solved.
To quantify the generalization of existing methods, we evaluate 9 off-the-shelf AI-generated image detectors on the Chameleon dataset.
We propose AIDE (AI-generated Image DEtector with Hybrid Features), which leverages multiple experts to simultaneously extract visual artifacts and noise patterns.
arXiv Detail & Related papers (2024-06-27T17:59:49Z)
- GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation [103.3465421081531]
VQAScore is a metric measuring the likelihood that a VQA model views an image as accurately depicting the prompt.
Ranking by VQAScore is 2x to 3x more effective than other scoring methods like PickScore, HPSv2, and ImageReward.
We release a new GenAI-Rank benchmark with over 40,000 human ratings to evaluate scoring metrics on ranking images generated from the same prompt.
arXiv Detail & Related papers (2024-06-19T18:00:07Z)
- Evaluating Text-to-Visual Generation with Image-to-Text Generation [113.07368313330994]
VQAScore uses a visual-question-answering (VQA) model to produce an alignment score; a minimal sketch of the idea appears after this entry.
It produces state-of-the-art results across eight image-text alignment benchmarks.
We introduce GenAI-Bench, a more challenging benchmark with 1,600 compositional text prompts.
arXiv Detail & Related papers (2024-04-01T17:58:06Z)
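The two VQAScore entries above describe the same idea: the alignment score is the probability a VQA model assigns to answering "Yes" when asked whether the image depicts the prompt. Below is a minimal sketch of that formulation; the VQAModel interface, its answer_probability method, and the question template are assumptions for illustration, not the authors' released API.

```python
from typing import Protocol


class VQAModel(Protocol):
    """Hypothetical interface for a VQA model exposing answer probabilities."""

    def answer_probability(self, image_path: str, question: str, answer: str) -> float:
        """Return P(answer | image, question) under the model."""
        ...


def vqa_score(model: VQAModel, image_path: str, prompt: str) -> float:
    # VQAScore frames alignment as a yes/no question about the prompt and
    # reads off the model's probability of answering "Yes".
    question = f'Does this figure show "{prompt}"? Please answer yes or no.'
    return model.answer_probability(image_path, question, "Yes")


def rank_images(model: VQAModel, image_paths: list[str], prompt: str) -> list[str]:
    # Ranking candidates generated from the same prompt, as in the GenAI-Rank setup.
    return sorted(image_paths, key=lambda p: vqa_score(model, p, prompt), reverse=True)
```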
- TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment [2.59079758388817]
In AIGCIQA tasks, images are typically generated by generative models using text prompts.
Most existing AIGCIQA methods regress predicted scores directly from individual generated images.
We propose a text-image encoder-based regression (TIER) framework to address this issue.
arXiv Detail & Related papers (2024-01-08T12:35:15Z)
- Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis [37.32270579534541]
We propose a novel approach for enhancing text-image correspondence by leveraging available semantic layouts.
Our approach achieves higher text-image correspondence than existing text-to-image generation approaches on the Multi-Modal CelebA-HQ and Cityscapes datasets.
arXiv Detail & Related papers (2023-08-16T05:59:33Z)
- TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering [86.38098280689027]
We introduce an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA).
We present a comprehensive evaluation of existing text-to-image models using a benchmark consisting of 4K diverse text inputs and 25K questions across 12 categories (object, counting, etc.).
arXiv Detail & Related papers (2023-03-21T14:41:02Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method generates high-quality images whose complex semantics draw on both the input texts and the input images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.