LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and
Unlabeled Image Collections
- URL: http://arxiv.org/abs/2305.18287v2
- Date: Mon, 23 Oct 2023 12:32:47 GMT
- Title: LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and
Unlabeled Image Collections
- Authors: M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Mateusz Kozinski, Horst
Possegger, Rogerio Feris, Horst Bischof
- Abstract summary: Large-scale pre-trained Vision and Language (VL) models have set a new state-of-the-art (SOTA) in zero-shot visual classification.
We show, for the first time, how to reduce the gap to supervised classifiers without any labels and without any paired VL data.
- Score: 30.875186985461063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, large-scale pre-trained Vision and Language (VL) models have set a
new state-of-the-art (SOTA) in zero-shot visual classification, enabling
open-vocabulary recognition of a potentially unlimited set of categories defined
as simple language prompts. However, despite these great advances, the
performance of these zero-shot classifiers still falls short of the results of
dedicated (closed category set) classifiers trained with supervised
fine-tuning. In this paper, we show, for the first time, how to reduce this gap
without any labels and without any paired VL data, using an unlabeled image
collection and a set of texts auto-generated using a Large Language Model (LLM)
describing the categories of interest and effectively substituting labeled
visual instances of those categories. Using our label-free approach, we are
able to attain significant performance improvements over the zero-shot
performance of the base VL model and other contemporary methods and baselines
on a wide variety of datasets, demonstrating an absolute improvement of up to
11.7% (3.8% on average) in the label-free setting. Moreover, despite our
approach being label-free, we observe 1.3% average gains over leading few-shot
prompting baselines that do use 5-shot supervision.
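The abstract's core mechanism lends itself to a short sketch: because a CLIP-like model embeds text and images in a shared space, a classifier trained only on embeddings of LLM-generated class descriptions can then be applied directly to image embeddings. The snippet below is a minimal illustration of that idea, not the paper's actual pipeline; the `descriptions` dict is a hypothetical stand-in for the auto-generated LLM texts.

```python
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical LLM-generated descriptions for each category of interest.
descriptions = {
    0: ["a photo of a tabby cat", "a small domestic cat with striped fur"],
    1: ["a photo of a golden retriever", "a large dog with golden fur"],
}

# Embed the texts; each embedding acts as a pseudo "visual instance" of its class.
texts, labels = [], []
for cls, sents in descriptions.items():
    texts += sents
    labels += [cls] * len(sents)
with torch.no_grad():
    txt = F.normalize(model.encode_text(clip.tokenize(texts).to(device)).float(), dim=-1)
y = torch.tensor(labels, device=device)

# Train a linear head on text embeddings only; no image labels are involved.
head = torch.nn.Linear(txt.shape[1], len(descriptions)).to(device)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    F.cross_entropy(head(txt), y).backward()
    opt.step()

# At inference, the same head classifies *image* embeddings, since both
# modalities share the CLIP embedding space:
#   img = F.normalize(model.encode_image(x).float(), dim=-1)
#   pred = head(img).argmax(-1)
```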
Related papers
- Label Propagation for Zero-shot Classification with Vision-Language Models [17.50253820510074]
In this paper, we tackle the case of zero-shot classification in the presence of unlabeled data.
We introduce ZLaP, a method based on label propagation (LP) that utilizes geodesic distances for classification.
We perform extensive experiments to evaluate the effectiveness of our method on 14 common datasets and show that ZLaP outperforms the latest related works.
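As a rough illustration of label propagation over unlabeled data (a generic sketch, not ZLaP's exact geodesic formulation), one can diffuse zero-shot class scores over a kNN graph of image embeddings:

```python
import numpy as np

def knn_graph(feats, k=10):
    """Row-normalized kNN cosine-similarity graph over L2-normalized feats."""
    sims = feats @ feats.T
    np.fill_diagonal(sims, -np.inf)
    idx = np.argsort(-sims, axis=1)[:, :k]
    graph = np.zeros_like(sims)
    rows = np.arange(len(feats))[:, None]
    graph[rows, idx] = np.clip(sims[rows, idx], 0.0, None)
    return graph / (graph.sum(axis=1, keepdims=True) + 1e-8)

def propagate(graph, seed_scores, alpha=0.9, iters=20):
    """Diffuse initial class scores (e.g., CLIP zero-shot logits) over the graph."""
    scores = seed_scores.copy()
    for _ in range(iters):
        scores = alpha * (graph @ scores) + (1 - alpha) * seed_scores
    return scores.argmax(axis=1)
```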
arXiv Detail & Related papers (2024-04-05T12:58:07Z) - LLM meets Vision-Language Models for Zero-Shot One-Class Classification [4.094697851983375]
We consider the problem of zero-shot one-class visual classification.
We propose a two-step solution that first queries large language models for visually confusing objects.
We are the first to demonstrate the ability to discriminate a single category from other semantically related ones using only its label.
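A hedged sketch of that two-step recipe, assuming a CLIP-like model and an illustrative list of LLM-proposed confusers:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

target = "hedgehog"
# Step 1 (assumed LLM output for "list objects visually similar to a hedgehog"):
confusers = ["porcupine", "echidna", "sea urchin"]

prompts = [f"a photo of a {c}" for c in [target] + confusers]
with torch.no_grad():
    text = model.encode_text(clip.tokenize(prompts).to(device)).float()
    text /= text.norm(dim=-1, keepdim=True)

def is_target(image_batch):
    """image_batch: (n, 3, 224, 224) tensor produced by `preprocess`."""
    with torch.no_grad():
        img = model.encode_image(image_batch.to(device)).float()
        img /= img.norm(dim=-1, keepdim=True)
    # Step 2: an image is accepted iff the target prompt outscores all confusers.
    return (img @ text.T).argmax(dim=-1) == 0
```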
arXiv Detail & Related papers (2024-03-31T12:48:07Z) - Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions [24.596929878045568]
We develop methods to train vision-language models (VLMs) with "bag-level" image-text supervision.
We use descriptions of categories generated by large language models (LLMs) and abundant, fine-grained image classification datasets.
Our findings suggest that geographic priors can be just as effective and are complementary to visual appearance.
arXiv Detail & Related papers (2024-01-04T08:39:13Z) - Towards Realistic Zero-Shot Classification via Self Structural Semantic
Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z) - Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages existing pretrained vision-language (VL) models to train semantic segmentation models without human labels.
ZeroSeg achieves this by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
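The distillation step can be sketched as pulling each segment token toward the frozen VL image embedding of its own region; this is an illustrative loss only, and ZeroSeg's full pipeline differs in detail:

```python
import torch
import torch.nn.functional as F

def segment_distill_loss(segment_tokens, region_crops, vl_image_encoder):
    """segment_tokens: (s, d) tokens from the segmentation model.
    region_crops: (s, 3, H, W) image crops, one per localized region."""
    with torch.no_grad():
        targets = F.normalize(vl_image_encoder(region_crops).float(), dim=-1)
    tokens = F.normalize(segment_tokens, dim=-1)
    return (1 - (tokens * targets).sum(dim=-1)).mean()  # mean cosine distance
```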
arXiv Detail & Related papers (2023-06-01T08:47:06Z) - Zero-Shot Text Classification with Self-Training [8.68603153534916]
We show that fine-tuning the zero-shot classifier on its most confident predictions leads to significant performance gains across a wide range of text classification tasks.
Self-training adapts the zero-shot model to the task at hand.
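A generic self-training loop matching this description (a sketch, not the paper's exact recipe) pseudo-labels the unlabeled pool, keeps only high-confidence predictions, and fine-tunes on them:

```python
import torch
import torch.nn.functional as F

def self_train_epoch(model, unlabeled_loader, optimizer, threshold=0.9):
    # Pseudo-label the unlabeled pool with the current (zero-shot) model.
    model.eval()
    kept = []
    with torch.no_grad():
        for x in unlabeled_loader:
            conf, pseudo = F.softmax(model(x), dim=-1).max(dim=-1)
            mask = conf >= threshold  # keep only the most confident predictions
            if mask.any():
                kept.append((x[mask], pseudo[mask]))
    # Fine-tune on the confident pseudo-labels.
    model.train()
    for x, y in kept:
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        optimizer.step()
```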
arXiv Detail & Related papers (2022-10-31T17:55:00Z) - Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
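The two supervision sources can be sketched as a joint objective: classification on confident pseudo-labels plus reconstruction of masked patches from raw pixels. This is an illustrative simplification, not MUST's exact objective:

```python
import torch.nn.functional as F

def must_style_loss(logits, pseudo, conf, recon, patches, mask,
                    threshold=0.7, lam=1.0):
    """logits/pseudo/conf: class scores, pseudo-labels, and confidences per image.
    recon/patches: predicted and ground-truth patches; mask: masked positions."""
    keep = conf >= threshold
    cls = F.cross_entropy(logits[keep], pseudo[keep]) if keep.any() \
        else logits.new_zeros(())
    mim = F.mse_loss(recon[mask], patches[mask])  # supervision from raw pixels
    return cls + lam * mim
```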
arXiv Detail & Related papers (2022-06-07T02:03:06Z) - Language Models in the Loop: Incorporating Prompting into Weak
Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
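In this framing, each prompt turns the language model into a labeling function whose votes are aggregated; the sketch below uses a simple majority vote with abstentions, whereas the paper relies on a learned label model:

```python
import numpy as np

def aggregate(votes):
    """votes: (n_examples, n_prompts) int array; -1 means the prompt abstained."""
    out = []
    for row in votes:
        valid = row[row >= 0]
        out.append(np.bincount(valid).argmax() if valid.size else -1)
    return np.array(out)

votes = np.array([[0, 0, -1],   # two prompts vote class 0, one abstains
                  [1, -1, 1]])  # two prompts vote class 1
print(aggregate(votes))  # [0 1]
```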
arXiv Detail & Related papers (2022-05-04T20:42:40Z) - Generalized Category Discovery [148.32255950504182]
We consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set.
Here, the unlabelled images may come from labelled classes or from novel ones.
We first establish strong baselines by taking state-of-the-art algorithms from novel category discovery and adapting them for this task.
We then introduce a simple yet effective semi-supervised $k$-means method to cluster the unlabelled data into seen and unseen classes.
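A minimal version of such a semi-supervised k-means (illustrative, not the paper's exact algorithm) pins labelled points to their class's cluster while extra clusters absorb unseen classes:

```python
import numpy as np

def ss_kmeans(X, y, k, iters=50, seed=0):
    """X: (n, d) features; y: int array of labels 0..C-1 for the first len(y)
    rows of X; k >= C, so the extra clusters can absorb unseen classes."""
    n_lab = len(y)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for c in np.unique(y):  # seed labelled-class centers from their own points
        centers[c] = X[:n_lab][y == c].mean(axis=0)
    for _ in range(iters):
        assign = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        assign[:n_lab] = y  # labelled points never leave their class's cluster
        for c in range(k):
            if (assign == c).any():
                centers[c] = X[assign == c].mean(axis=0)
    return assign
```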
arXiv Detail & Related papers (2022-01-07T18:58:35Z) - AutoNovel: Automatically Discovering and Learning Novel Visual
Categories [138.80332861066287]
We present a new approach called AutoNovel to tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We evaluate AutoNovel on standard classification benchmarks and substantially outperform current methods for novel category discovery.
arXiv Detail & Related papers (2021-06-29T11:12:16Z) - Automatically Discovering and Learning New Visual Categories with
Ranking Statistics [145.89790963544314]
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
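The title's ranking statistics can be sketched as pairwise pseudo-labels: two images are declared the same class when the top-k ranked dimensions of their feature vectors coincide (an illustrative simplification of the paper's criterion):

```python
import numpy as np

def rank_stat_pairs(feats, topk=5):
    """Positive pair <=> identical set of top-k feature dimensions."""
    idx = np.argsort(-feats, axis=1)[:, :topk]
    sets = [frozenset(row) for row in idx]
    n = len(sets)
    return np.array([[sets[i] == sets[j] for j in range(n)] for i in range(n)])
```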
arXiv Detail & Related papers (2020-02-13T18:53:32Z)