VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
- URL: http://arxiv.org/abs/2209.06103v1
- Date: Mon, 12 Sep 2022 15:43:09 GMT
- Title: VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
- Authors: Felix Vogel, Nina Shvetsova, Leonid Karlinsky, Hilde Kuehne
- Abstract summary: Vision-language models trained on large, randomly collected data have had a significant impact in many areas since they appeared.
But although they show great performance in various fields, such as image-text retrieval, their inner workings are still not fully understood.
We start with an analysis of the training corpus, assessing to what extent (and which of) the test classes are really zero-shot.
We follow up with an analysis of the attribute-based zero-shot learning capabilities of these models, evaluating how well this classical zero-shot notion emerges from large-scale webly supervision.
- Score: 17.00524909491351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language models trained on large, randomly collected data have
had a significant impact in many areas since they appeared. But although they
show great performance in various fields, such as image-text retrieval, their
inner workings are still not fully understood. The current work analyses the
true zero-shot capabilities of those models. We start with an analysis of the
training corpus, assessing to what extent (and which of) the test classes are
really zero-shot and how this correlates with individual class performance.
We follow up with the analysis of the attribute-based zero-shot learning
capabilities of these models, evaluating how well this classical zero-shot
notion emerges from large-scale webly supervision. We leverage the recently
released LAION400M data corpus as well as the publicly available pretrained
models of CLIP, OpenCLIP, and FLAVA, evaluating the attribute-based zero-shot
capabilities on CUB and AWA2 benchmarks. Our analysis shows that: (i) most of
the classes in popular zero-shot benchmarks are observed (a lot) during
pre-training; (ii) zero-shot performance mainly comes from the models'
capability of recognizing class labels whenever they are present in the text,
and a significantly lower-performing capability of attribute-based zero-shot
learning is only observed when class labels are not used; (iii) the number of
attributes used can have a significant effect on performance, and can easily
cause a significant performance decrease.
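Finding (ii) hinges on how the text prompts are built: either the class label appears in the prompt, or the class is described only through its attributes. The sketch below illustrates the two prompting strategies and cosine-similarity zero-shot classification. All names and the 4-d "embeddings" are hypothetical stand-ins; in the paper's setup the embeddings would come from a pretrained text and image encoder such as CLIP or OpenCLIP, which is stubbed out here.

```python
import numpy as np

def build_prompts(class_names, attributes, use_labels):
    """Build one text prompt per class, either exposing the class label
    (label-based zero-shot) or describing the class only through its
    attributes (attribute-based zero-shot)."""
    prompts = []
    for name in class_names:
        if use_labels:
            prompts.append(f"a photo of a {name}")
        else:
            attrs = ", ".join(attributes[name])
            prompts.append(f"a photo of a bird that has {attrs}")
    return prompts

def zero_shot_classify(image_emb, text_embs):
    """Pick the class whose text embedding has the highest cosine
    similarity to the image embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(text_embs @ image_emb))

# Toy example with stubbed 4-d "embeddings" instead of a real encoder.
classes = ["cardinal", "blue jay"]
attrs = {"cardinal": ["a red body", "a crest"],
         "blue jay": ["a blue body", "a crest"]}
print(build_prompts(classes, attrs, use_labels=False))
text_embs = np.array([[1.0, 0.0, 0.1, 0.0],   # stands in for "cardinal"
                      [0.0, 1.0, 0.1, 0.0]])  # stands in for "blue jay"
image_emb = np.array([0.9, 0.1, 0.0, 0.0])    # image closest to class 0
print(zero_shot_classify(image_emb, text_embs))  # 0
```

The paper's finding (ii) corresponds to the gap between running this pipeline with `use_labels=True` versus `use_labels=False`: accuracy drops sharply once the label is removed from the prompt.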
Related papers
- Verbalized Representation Learning for Interpretable Few-Shot Generalization [130.8173035901391]
Verbalized Representation Learning (VRL) is a novel approach for automatically extracting human-interpretable features for object recognition.
Our method captures inter-class differences and intra-class commonalities in the form of natural language.
VRL achieves a 24% absolute improvement over prior state-of-the-art methods.
arXiv Detail & Related papers (2024-11-27T01:55:08Z)
- Active Learning for Vision-Language Models [29.309503214127016]
We propose a novel active learning (AL) framework that enhances the zero-shot classification performance of vision-language models (VLMs).
Our approach first calibrates the predicted entropy of VLMs and then utilizes a combination of self-uncertainty and neighbor-aware uncertainty to calculate a reliable uncertainty measure for active sample selection.
Our experiments show that the proposed approach outperforms existing AL approaches on several image classification datasets.
arXiv Detail & Related papers (2024-10-29T16:25:50Z)
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- Vision-language Assisted Attribute Learning [53.60196963381315]
Attribute labeling at large scale is typically incomplete and partial.
Existing attribute learning methods often treat the missing labels as negative or simply ignore them all during training.
We leverage the available vision-language knowledge to explicitly disclose the missing labels for enhancing model learning.
arXiv Detail & Related papers (2023-12-12T06:45:19Z)
- Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations [97.41375480696972]
We introduce Z-ICL, a new zero-shot method that closes the gap by constructing pseudo-demonstrations for a given test input.
Evaluation on nine classification datasets shows that Z-ICL outperforms previous zero-shot methods by a significant margin.
arXiv Detail & Related papers (2022-12-19T21:34:26Z)
- Zero-Shot Text Classification with Self-Training [8.68603153534916]
We show that fine-tuning the zero-shot classifier on its most confident predictions leads to significant performance gains across a wide range of text classification tasks.
Self-training adapts the zero-shot model to the task at hand.
arXiv Detail & Related papers (2022-10-31T17:55:00Z)
- A Thorough Examination on Zero-shot Dense Retrieval [84.70868940598143]
We present the first thorough examination of the zero-shot capability of dense retrieval (DR) models.
We discuss the effect of several key factors related to source training set, analyze the potential bias from the target dataset, and review and compare existing zero-shot DR models.
arXiv Detail & Related papers (2022-04-27T07:59:07Z)
- Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis [65.74825840440504]
We propose Zero Shot Learning for Attributes (ZSLA), which is the first of its kind to the best of our knowledge.
Our proposed method is able to synthesize the detectors of novel attributes in a zero-shot learning manner.
Using only 32 seen attributes on the Caltech-UCSD Birds-200-2011 dataset, our proposed method is able to synthesize 207 other novel attributes.
arXiv Detail & Related papers (2021-11-28T15:45:54Z)
- Using Fictitious Class Representations to Boost Discriminative Zero-Shot Learners [23.854093182195246]
We introduce a novel mechanism that dynamically augments during training the set of seen classes to produce additional fictitious classes.
These fictitious classes diminish the model's tendency to fixate during training on attribute correlations that appear in the training set but will not appear in newly exposed classes.
arXiv Detail & Related papers (2021-11-26T15:41:16Z)
- A Systematic Investigation of Commonsense Understanding in Large Language Models [23.430757316504316]
Large language models have shown impressive performance on many natural language processing (NLP) tasks in a zero-shot setting.
We ask whether these models exhibit commonsense understanding by evaluating models against four commonsense benchmarks.
arXiv Detail & Related papers (2021-10-31T22:20:36Z)
- CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition [52.66360172784038]
We propose a clustering-based model, which considers all training samples at once, instead of optimizing for each instance individually.
We call the proposed method CLASTER and observe that it consistently improves over the state-of-the-art in all standard datasets.
arXiv Detail & Related papers (2021-01-18T12:46:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.