Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions
- URL: http://arxiv.org/abs/2103.09669v1
- Date: Wed, 17 Mar 2021 14:06:56 GMT
- Title: Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions
- Authors: Sebastian Bujwid, Josephine Sullivan
- Abstract summary: We study the impact of using rich and diverse textual descriptions of classes for zero-shot learning (ZSL) on ImageNet.
We create a new dataset ImageNet-Wiki that matches each ImageNet class to its corresponding Wikipedia article.
We show that employing these Wikipedia articles as class descriptions yields much higher ZSL performance than prior works.
- Score: 5.3556221126231085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the impact of using rich and diverse textual descriptions of classes
for zero-shot learning (ZSL) on ImageNet. We create a new dataset ImageNet-Wiki
that matches each ImageNet class to its corresponding Wikipedia article. We
show that merely employing these Wikipedia articles as class descriptions
yields much higher ZSL performance than prior works. Even a simple model using
this type of auxiliary data outperforms state-of-the-art models that rely on
standard features of word embedding encodings of class names. These results
highlight the usefulness and importance of textual descriptions for ZSL, as
well as the relative importance of auxiliary data type compared to algorithmic
progress. Our experimental results also show that standard zero-shot learning
approaches generalize poorly across categories of classes.
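The core recipe the paper builds on can be sketched in a few lines: embed each class description with a text encoder, embed the image with a visual encoder, and predict the class whose description embedding is most similar. The snippet below is a minimal illustration only; `text_encoder` and `image_encoder` are hypothetical placeholder stand-ins, not the paper's model.

```python
# Minimal sketch of description-based zero-shot classification.
# The two encoders are placeholders, not the paper's actual model.
import numpy as np

rng = np.random.default_rng(0)

def text_encoder(description: str, dim: int = 64) -> np.ndarray:
    # Placeholder: hash the description into a pseudo-embedding.
    h = abs(hash(description)) % (2**32)
    return np.random.default_rng(h).standard_normal(dim)

def image_encoder(image: np.ndarray, dim: int = 64) -> np.ndarray:
    # Placeholder: project flattened pixels to the shared embedding space.
    proj = rng.standard_normal((image.size, dim))
    return image.flatten() @ proj

def zero_shot_classify(image, class_descriptions):
    img = image_encoder(image)
    img /= np.linalg.norm(img)
    best_class, best_score = None, -np.inf
    for name, desc in class_descriptions.items():
        txt = text_encoder(desc)
        txt /= np.linalg.norm(txt)
        score = float(img @ txt)  # cosine similarity
        if score > best_score:
            best_class, best_score = name, score
    return best_class

# Unseen classes described only by (invented) Wikipedia-style text.
classes = {
    "okapi": "The okapi is a giraffid mammal with striped hindquarters.",
    "numbat": "The numbat is a small termite-eating marsupial.",
}
print(zero_shot_classify(rng.standard_normal((8, 8, 3)), classes))
```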
Related papers
- Finetuning CLIP to Reason about Pairwise Differences [52.028073305958074]
We propose an approach to train vision-language models such as CLIP in a contrastive manner to reason about differences in embedding space.
We first demonstrate that our approach yields significantly improved capabilities in ranking images by a certain attribute.
We also illustrate that the resulting embeddings exhibit a greater degree of geometric structure in embedding space.
arXiv Detail & Related papers (2024-09-15T13:02:14Z)
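A rough sketch of the pairwise-difference idea, using placeholder embeddings in place of real CLIP encoders; nothing below is the paper's actual code, and the comparative prompt is an assumed example.

```python
# Rank images by an attribute via embedding differences (toy version).
import numpy as np

rng = np.random.default_rng(1)
dim = 32

# Stand-in image embeddings (a real system would use a CLIP image encoder).
img_embs = rng.standard_normal((5, dim))
img_embs /= np.linalg.norm(img_embs, axis=1, keepdims=True)

# Stand-in text embedding for a comparative prompt like "a larger object".
comparative_text = rng.standard_normal(dim)
comparative_text /= np.linalg.norm(comparative_text)

def difference_score(emb_a, emb_b, text_emb):
    """Cosine similarity between the embedding difference (a - b)
    and the comparative text embedding."""
    diff = emb_a - emb_b
    diff /= np.linalg.norm(diff)
    return float(diff @ text_emb)

# Rank images by pairwise comparison against a reference image.
ref = img_embs[0]
scores = [difference_score(e, ref, comparative_text) for e in img_embs[1:]]
print(np.argsort(scores)[::-1])  # indices ordered by the attribute
```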
- Data-Free Generalized Zero-Shot Learning [45.86614536578522]
We propose a generic framework for data-free zero-shot learning (DFZSL).
Our framework has been evaluated on five commonly used benchmarks for generalized ZSL, as well as 11 benchmarks for the base-to-new ZSL.
arXiv Detail & Related papers (2024-01-28T13:26:47Z)
- Text Descriptions are Compressive and Invariant Representations for Visual Learning [63.3464863723631]
We show that an alternative approach, in line with humans' understanding of multiple visual features per class, can provide compelling performance in the robust few-shot learning setting.
In particular, we introduce a novel method, SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors).
This method first automatically generates multiple visual descriptions of each class via a large language model (LLM), then uses a VLM to translate these descriptions into a set of visual feature embeddings of each image, and finally uses sparse logistic regression to select a relevant subset of these features to classify each image.
arXiv Detail & Related papers (2023-07-10T03:06:45Z)
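At its last stage, the SLR-AVD pipeline reduces to an L1-penalized logistic regression over description-based features. A hedged sketch with stand-in similarity scores follows; only scikit-learn's API is real here, the data and dimensions are assumptions.

```python
# Sparse feature selection over description scores (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_images, n_descriptions = 200, 50

# Stand-in: similarity of each image to each LLM-generated description.
X = rng.standard_normal((n_images, n_descriptions))
# Stand-in labels; a few description features are made truly predictive.
w_true = np.zeros(n_descriptions)
w_true[:5] = 2.0
y = (X @ w_true + 0.5 * rng.standard_normal(n_images) > 0).astype(int)

# The L1 penalty drives most description weights to exactly zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print(f"kept {selected.size}/{n_descriptions} description features:", selected)
```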
- I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification [108.83932812826521]
Large Language Models (LLM) trained on web-scale text show impressive abilities to repurpose their learned knowledge for a multitude of tasks.
Our proposed model, I2MVFormer, learns multi-view semantic embeddings for zero-shot image classification from these LLM-generated class views.
I2MVFormer establishes a new state-of-the-art on three public benchmark datasets for zero-shot image classification with unsupervised semantic embeddings.
arXiv Detail & Related papers (2022-12-05T14:11:36Z)
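One simplified reading of the multi-view setup: embed several generated documents per class and pool them into a single class prototype. I2MVFormer itself uses a transformer for this; the mean pooling and the placeholder embedder below are assumptions made purely for illustration.

```python
# Toy multi-view class prototypes from several documents per class.
import numpy as np

rng = np.random.default_rng(3)
dim = 48

def embed_document(doc: str) -> np.ndarray:
    # Placeholder document embedder (not a real language model).
    h = abs(hash(doc)) % (2**32)
    v = np.random.default_rng(h).standard_normal(dim)
    return v / np.linalg.norm(v)

class_views = {
    "zebra": ["a striped equid of the African savanna",
              "black-and-white coat, lives in herds"],
    "tapir": ["a large browsing mammal with a short trunk",
              "forest-dwelling and mostly nocturnal"],
}

# One prototype per class, pooled over its multiple views.
prototypes = {c: np.mean([embed_document(d) for d in docs], axis=0)
              for c, docs in class_views.items()}
print({c: p.shape for c, p in prototypes.items()})
```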
- Exploiting Category Names for Few-Shot Classification with Vision-Language Models [78.51975804319149]
Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks.
This paper shows that we can significantly improve the performance of few-shot classification by using the category names to initialize the classification head.
arXiv Detail & Related papers (2022-11-29T21:08:46Z)
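The name-initialization trick can be illustrated in a few lines: the rows of the linear classification head start from text embeddings of the category names rather than random values, and a few labeled examples then fine-tune them. Everything below (the encoder, data, and step size) is a stand-in, not the paper's implementation.

```python
# Classification head initialized from category-name embeddings.
import numpy as np

rng = np.random.default_rng(4)
dim, n_classes = 64, 3

def embed_name(name: str) -> np.ndarray:
    # Placeholder text embedding of a category name.
    h = abs(hash(name)) % (2**32)
    v = np.random.default_rng(h).standard_normal(dim)
    return v / np.linalg.norm(v)

names = ["sparrow", "finch", "warbler"]

# Head weights start from the names, not from random noise.
W = np.stack([embed_name(n) for n in names])        # (n_classes, dim)

# A few labeled support images (stand-in features) fine-tune the head
# with plain gradient steps on the softmax cross-entropy.
X = rng.standard_normal((12, dim))
y = rng.integers(0, n_classes, size=12)
for _ in range(20):
    logits = X @ W.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0                  # dL/dlogits
    W -= 0.1 * (p.T @ X) / len(y)                   # gradient step
print("head shape:", W.shape)
```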
- Self-Supervised Learning for Fine-Grained Image Classification [0.0]
Fine-grained datasets usually provide bounding box annotations along with class labels to aid the process of classification.
On the other hand, self-supervised learning exploits the freely available data to generate supervisory signals which act as labels.
Our idea is to leverage self-supervision such that the model learns useful representations of fine-grained image classes.
arXiv Detail & Related papers (2021-07-29T14:01:31Z)
- Zero-shot Learning with Class Description Regularization [10.739164530098755]
We introduce a novel form of regularization that encourages generative ZSL models to pay more attention to the description of each category.
Our empirical results demonstrate improvements over the performance of multiple state-of-the-art models on the task of generalized zero-shot recognition and classification.
arXiv Detail & Related papers (2021-06-30T14:56:15Z)
- Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning [48.583388368897126]
Few-shot learning is the task of learning to recognize previously unseen categories of images.
We propose a method that takes into account the names of the image classes.
arXiv Detail & Related papers (2021-05-21T08:08:28Z)
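One simple way to realize prototype-name alignment, sketched under assumed data: map class-name embeddings into the visual space so they land near the visual prototypes. BERT is not actually loaded below, and the least-squares map is an illustrative alignment choice, not necessarily the paper's.

```python
# Align class-name embeddings with visual prototypes via a linear map.
import numpy as np

rng = np.random.default_rng(5)
v_dim, t_dim, n_classes = 32, 16, 20

protos = rng.standard_normal((n_classes, v_dim))      # visual prototypes
name_embs = rng.standard_normal((n_classes, t_dim))   # stand-in "BERT" vectors

# Least-squares linear map from text space into visual space, so that
# name_embs @ M approximates the prototypes.
M, *_ = np.linalg.lstsq(name_embs, protos, rcond=None)
aligned = name_embs @ M
print("alignment residual:", float(np.linalg.norm(aligned - protos)))
```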
- Rich Semantics Improve Few-shot Learning [49.11659525563236]
We show that by using 'class-level' language descriptions, which can be acquired with minimal annotation cost, we can improve the few-shot learning performance.
We develop a Transformer based forward and backward encoding mechanism to relate visual and semantic tokens.
arXiv Detail & Related papers (2021-04-26T16:48:27Z)
- Zero-Shot Learning from scratch (ZFS): leveraging local compositional representations [25.449244103599106]
Zero-shot classification is a generalization task where no instance from the target classes is seen during training.
To allow for test-time transfer, each class is annotated with semantic information, commonly in the form of attributes or text descriptions.
The approaches that achieve the best absolute performance on image benchmarks rely on features extracted from encoders pretrained on ImageNet.
We propose Zero-Shot Learning from scratch (ZFS), which explicitly forbids the use of encoders fine-tuned on other datasets.
arXiv Detail & Related papers (2020-10-22T23:11:18Z)
- Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning [8.472636806304273]
Zero-shot learning (ZSL) makes object recognition in images possible in the absence of visual training data for some of the classes in a dataset.
We focus on the problem of semantic class prototype design for large scale ZSL.
We investigate the use of noisy textual metadata associated with photos as text collections.
arXiv Detail & Related papers (2020-08-06T21:33:44Z)
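A generic version of webly supervised prototype building: average TF-IDF vectors of the noisy metadata snippets collected for each class. The data below is invented and the pipeline is a simplification, not the paper's exact method.

```python
# Class prototypes from noisy web text via averaged TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer

# Noisy photo metadata (tags/captions) grouped by class, made up here.
web_text = {
    "lighthouse": ["lighthouse coast sunset #travel",
                   "old lighthouse beacon sea"],
    "windmill": ["dutch windmill field tulips",
                 "windmill blades wind farm sky"],
}

docs = [t for texts in web_text.values() for t in texts]
labels = [c for c, texts in web_text.items() for _ in texts]

vec = TfidfVectorizer()
X = vec.fit_transform(docs).toarray()

# Class prototype = mean TF-IDF vector of its snippets.
prototypes = {c: X[[i for i, l in enumerate(labels) if l == c]].mean(axis=0)
              for c in web_text}
print({c: p.shape for c, p in prototypes.items()})
```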