Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark
- URL: http://arxiv.org/abs/2503.10357v1
- Date: Thu, 13 Mar 2025 13:37:54 GMT
- Title: Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark
- Authors: Viktor Moskvoretskii, Alina Lobanova, Ekaterina Neminova, Chris Biemann, Alexander Panchenko, Irina Nikishina
- Abstract summary: This paper explores the feasibility of using text-to-image models in a zero-shot setup to generate images for taxonomy concepts. A benchmark is proposed that assesses models' abilities to understand taxonomy concepts and generate relevant, high-quality images. The 12 models are evaluated using 9 novel taxonomy-related text-to-image metrics and human feedback.
- Score: 63.97125827026949
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper explores the feasibility of using text-to-image models in a zero-shot setup to generate images for taxonomy concepts. While text-based methods for taxonomy enrichment are well-established, the potential of the visual dimension remains unexplored. To address this, we propose a comprehensive benchmark for Taxonomy Image Generation that assesses models' abilities to understand taxonomy concepts and generate relevant, high-quality images. The benchmark includes common-sense and randomly sampled WordNet concepts, alongside LLM-generated predictions. The 12 models are evaluated using 9 novel taxonomy-related text-to-image metrics and human feedback. Moreover, we pioneer the use of pairwise evaluation with GPT-4 feedback for image generation. Experimental results show that the ranking of models differs significantly from standard T2I tasks. Playground-v2 and FLUX consistently outperform across metrics and subsets, while the retrieval-based approach performs poorly. These findings highlight the potential for automating the curation of structured data resources.
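As a rough illustration of the zero-shot setup described in the abstract, the sketch below turns a WordNet synset id such as `cat.n.01` into a natural-language prompt and feeds it to an off-the-shelf text-to-image pipeline. The prompt template and model checkpoint are assumptions for illustration only, not the benchmark's exact configuration.

```python
# Minimal sketch: zero-shot image generation for a WordNet concept.
# Assumptions: the prompt template and the checkpoint name are illustrative,
# not the paper's exact pipeline.
import torch
from nltk.corpus import wordnet as wn            # requires nltk.download("wordnet")
from diffusers import StableDiffusionPipeline    # any text-to-image pipeline would do

def synset_to_prompt(synset_id: str) -> str:
    """Turn a WordNet id like 'cat.n.01' into a natural-language prompt."""
    s = wn.synset(synset_id)
    lemma = s.lemmas()[0].name().replace("_", " ")
    return f"A photo of a {lemma}, {s.definition()}"

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe(synset_to_prompt("cat.n.01")).images[0]
image.save("cat_n_01.png")
```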
Related papers
- Taxonomy-Aware Evaluation of Vision-Language Models [48.285819827561625]
We propose a framework for evaluating unconstrained text predictions, such as those generated from a vision-language model, against a taxonomy.
Specifically, we propose the use of hierarchical precision and recall measures to assess the level of correctness and specificity of predictions with regard to a taxonomy.
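The hierarchical measures mentioned in this entry can be sketched with the common ancestor-set formulation (an assumption here, not necessarily the cited paper's exact definition): precision and recall are computed over the hypernym closures of the predicted and gold concepts, so near-misses in the taxonomy still receive partial credit.

```python
# Sketch of hierarchical precision/recall over WordNet, using the common
# ancestor-set formulation (assumed here; the cited paper may differ in detail).
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet")

def ancestors(synset_id: str) -> set:
    """The synset itself plus all of its hypernyms (its 'ancestor set')."""
    s = wn.synset(synset_id)
    return {s} | set(s.closure(lambda x: x.hypernyms()))

def hierarchical_pr(pred_id: str, gold_id: str) -> tuple[float, float]:
    pred, gold = ancestors(pred_id), ancestors(gold_id)
    overlap = len(pred & gold)
    precision = overlap / len(pred)   # fraction of the prediction's path that is correct
    recall = overlap / len(gold)      # fraction of the gold path that is covered
    return precision, recall

# A prediction that is a close relative of the gold concept still gets
# partial credit through shared ancestors (e.g. carnivore, mammal, animal):
print(hierarchical_pr("dog.n.01", "cat.n.01"))
```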
arXiv Detail & Related papers (2025-04-07T19:46:59Z) - EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation [29.176750442205325]
In this study, we contribute an EvalMuse-40K benchmark, gathering 40K image-text pairs with fine-grained human annotations for image-text alignment-related tasks. We introduce two new methods to evaluate the image-text alignment capabilities of T2I models.
arXiv Detail & Related papers (2024-12-24T04:08:25Z) - Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state of the art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z) - XIMAGENET-12: An Explainable AI Benchmark Dataset for Model Robustness Evaluation [19.399688660643367]
XIMAGENET-12 consists of over 200K images with 15,410 manual semantic annotations.
We develop a quantitative criterion for robustness assessment, allowing for a nuanced understanding of how visual models perform under varying conditions.
arXiv Detail & Related papers (2023-10-12T10:17:40Z) - Towards Visual Taxonomy Expansion [50.462998483087915]
We propose Visual Taxonomy Expansion (VTE), introducing visual features into the taxonomy expansion task.
We propose a textual hypernymy learning task and a visual prototype learning task to cluster textual and visual semantics.
Our method is evaluated on two datasets, where we obtain compelling results.
arXiv Detail & Related papers (2023-09-12T10:17:28Z) - Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z) - Bringing motion taxonomies to continuous domains via GPLVM on hyperbolic manifolds [8.385386712928785]
Human motion taxonomies serve as high-level hierarchical abstractions that classify how humans move and interact with their environment.
We propose to model taxonomy data via hyperbolic embeddings that capture the associated hierarchical structure.
We show that our model properly encodes unseen data from existing or new taxonomy categories, and outperforms its Euclidean and VAE-based counterparts.
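As a minimal illustration of why hyperbolic embeddings suit hierarchical data such as taxonomies (a generic Poincaré-ball sketch, not the cited paper's GPLVM formulation): distances grow rapidly toward the boundary of the ball, so a tree's exponentially growing set of leaves can be placed far apart while staying close to their shared ancestor near the origin.

```python
# Generic Poincaré-ball distance (illustrative only, not the paper's GPLVM model).
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance between two points inside the unit Poincaré ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq / denom))

root = np.array([0.0, 0.0])        # abstract concept near the origin
leaf_a = np.array([0.90, 0.10])    # specific categories near the boundary
leaf_b = np.array([0.10, 0.90])

print(poincare_distance(root, leaf_a))   # moderate root-to-leaf distance
print(poincare_distance(leaf_a, leaf_b)) # leaves are far apart despite small coordinates
```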
arXiv Detail & Related papers (2022-10-04T15:19:24Z) - Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z) - Can Taxonomy Help? Improving Semantic Question Matching using Question Taxonomy [37.57300969050908]
We propose a hybrid technique for semantic question matching.
It augments state-of-the-art deep learning models with question classes obtained from a deep learning based question classifier, built on our proposed two-layered taxonomy for English questions.
arXiv Detail & Related papers (2021-01-20T16:23:04Z)