Taxonomy-Aware Evaluation of Vision-Language Models
- URL: http://arxiv.org/abs/2504.05457v1
- Date: Mon, 07 Apr 2025 19:46:59 GMT
- Title: Taxonomy-Aware Evaluation of Vision-Language Models
- Authors: Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr, Serge Belongie, Ryan Cotterell, Nico Lang, Stella Frank,
- Abstract summary: We propose a framework for evaluating unconstrained text predictions, such as those generated from a vision-language model, against a taxonomy.<n> Specifically, we propose the use of hierarchical precision and recall measures to assess the level of correctness and specificity of predictions with regard to a taxonomy.
- Score: 48.285819827561625
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: When a vision-language model (VLM) is prompted to identify an entity depicted in an image, it may answer 'I see a conifer,' rather than the specific label 'norway spruce'. This raises two issues for evaluation: First, the unconstrained generated text needs to be mapped to the evaluation label space (i.e., 'conifer'). Second, a useful classification measure should give partial credit to less-specific, but not incorrect, answers ('norway spruce' being a type of 'conifer'). To meet these requirements, we propose a framework for evaluating unconstrained text predictions, such as those generated from a vision-language model, against a taxonomy. Specifically, we propose the use of hierarchical precision and recall measures to assess the level of correctness and specificity of predictions with regard to a taxonomy. Experimentally, we first show that existing text similarity measures do not capture taxonomic similarity well. We then develop and compare different methods to map textual VLM predictions onto a taxonomy. This allows us to compute hierarchical similarity measures between the generated text and the ground truth labels. Finally, we analyze modern VLMs on fine-grained visual classification tasks based on our proposed taxonomic evaluation scheme.
Related papers
- Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark [63.97125827026949]
This paper explores the feasibility of using text-to-image models in a zero-shot setup to generate images for taxonomy concepts.<n>A benchmark is proposed that assesses models' abilities to understand taxonomy concepts and generate relevant, high-quality images.<n>The 12 models are evaluated using 9 novel taxonomy-related text-to-image metrics and human feedback.
arXiv Detail & Related papers (2025-03-13T13:37:54Z) - A Novel Evaluation Framework for Image2Text Generation [15.10524860121122]
We propose an evaluation framework rooted in a modern large language model (LLM) capable of image generation.
A high similarity score suggests that the image captioning model has accurately generated textual descriptions.
A low similarity score indicates discrepancies, revealing potential shortcomings in the model's performance.
arXiv Detail & Related papers (2024-08-03T09:27:57Z) - LLM meets Vision-Language Models for Zero-Shot One-Class Classification [4.094697851983375]
We consider the problem of zero-shot one-class visual classification.
We propose a two-step solution that first queries large language models for visually confusing objects.
We are the first to demonstrate the ability to discriminate a single category from other semantically related ones using only its label.
arXiv Detail & Related papers (2024-03-31T12:48:07Z) - Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences and the result reveals that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z) - An Explainable Model-Agnostic Algorithm for CNN-based Biometrics
Verification [55.28171619580959]
This paper describes an adaptation of the Local Interpretable Model-Agnostic Explanations (LIME) AI method to operate under a biometric verification setting.
arXiv Detail & Related papers (2023-07-25T11:51:14Z) - Analyzing Text Representations by Measuring Task Alignment [2.198430261120653]
We develop a task alignment score based on hierarchical clustering that measures alignment at different levels of granularity.
Our experiments on text classification validate our hypothesis by showing that task alignment can explain the classification performance of a given representation.
arXiv Detail & Related papers (2023-05-31T11:20:48Z) - A Multi-Grained Self-Interpretable Symbolic-Neural Model For
Single/Multi-Labeled Text Classification [29.075766631810595]
We propose a Symbolic-Neural model that can learn to explicitly predict class labels of text spans from a constituency tree.
As the structured language model learns to predict constituency trees in a self-supervised manner, only raw texts and sentence-level labels are required as training data.
Our experiments demonstrate that our approach could achieve good prediction accuracy in downstream tasks.
arXiv Detail & Related papers (2023-03-06T03:25:43Z) - Contrastive Semantic Similarity Learning for Image Captioning Evaluation
with Intrinsic Auto-encoder [52.42057181754076]
Motivated by the auto-encoder mechanism and contrastive representation learning advances, we propose a learning-based metric for image captioning.
We develop three progressive model structures to learn the sentence level representations.
Experiment results show that our proposed method can align well with the scores generated from other contemporary metrics.
arXiv Detail & Related papers (2021-06-29T12:27:05Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.