Zero-Shot Learning from scratch (ZFS): leveraging local compositional
representations
- URL: http://arxiv.org/abs/2010.13320v1
- Date: Thu, 22 Oct 2020 23:11:18 GMT
- Title: Zero-Shot Learning from scratch (ZFS): leveraging local compositional
representations
- Authors: Tristan Sylvain, Linda Petrini, R Devon Hjelm
- Abstract summary: Zero-shot classification is a generalization task where no instance from the target classes is seen during training.
To allow for test-time transfer, each class is annotated with semantic information, commonly in the form of attributes or text descriptions.
The approaches that achieve the best absolute performance on image benchmarks rely on features extracted from encoders pretrained on ImageNet.
We propose Zero-Shot Learning from scratch (ZFS), which explicitly forbids the use of encoders fine-tuned on other datasets.
- Score: 25.449244103599106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot classification is a generalization task where no instance from the
target classes is seen during training. To allow for test-time transfer, each
class is annotated with semantic information, commonly in the form of
attributes or text descriptions. While classical zero-shot learning does not
explicitly forbid using information from other datasets, the approaches that
achieve the best absolute performance on image benchmarks rely on features
extracted from encoders pretrained on ImageNet. This approach relies on
hyper-optimized ImageNet-relevant parameters from the supervised classification
setting, entangling important questions about the suitability of those
parameters and how they were learned with more fundamental questions about
representation learning and generalization. To remove these distractors, we
propose a more challenging setting: Zero-Shot Learning from scratch (ZFS),
which explicitly forbids the use of encoders fine-tuned on other datasets. Our
analysis in this setting highlights the importance of local information and
compositional representations.
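As a toy illustration of the attribute-based transfer described in the abstract (a minimal sketch with made-up numbers and a nearest-attribute classifier, not the paper's actual model), an encoder trained from scratch maps images into the attribute space, and unseen classes are then recognized by their closest class attribute vector:

```python
import numpy as np

def zero_shot_predict(image_features, unseen_class_attributes):
    """Assign each image to the unseen class whose attribute vector is
    most similar (cosine similarity) to the image's predicted attribute
    embedding. Both inputs are 2-D arrays: rows are images / classes."""
    f = image_features / np.linalg.norm(image_features, axis=1, keepdims=True)
    a = unseen_class_attributes / np.linalg.norm(unseen_class_attributes, axis=1, keepdims=True)
    return (f @ a.T).argmax(axis=1)

# Two hypothetical unseen classes described by 3 binary attributes,
# and two images whose predicted attribute embeddings are noisy.
attrs = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
feats = np.array([[0.9, 0.1, 1.1], [0.1, 1.2, 0.8]])
print(zero_shot_predict(feats, attrs))  # each image matches its class: [0 1]
```

The point of the ZFS constraint is that the encoder producing `image_features` must be trained only on the benchmark's own seen classes, never fine-tuned from another dataset.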
Related papers
- Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene
Classification [26.340737217001497]
Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training.
Previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes.
We propose to collect visually detectable attributes automatically. We predict attributes for each class by depicting the semantic-visual similarity between attributes and images.
arXiv Detail & Related papers (2024-02-03T09:18:49Z)
- Data-Free Generalized Zero-Shot Learning [45.86614536578522]
We propose a generic framework for data-free zero-shot learning (DFZSL)
Our framework has been evaluated on five commonly used benchmarks for generalized ZSL, as well as 11 benchmarks for the base-to-new ZSL.
arXiv Detail & Related papers (2024-01-28T13:26:47Z)
- Text Descriptions are Compressive and Invariant Representations for Visual Learning [63.3464863723631]
We show that an alternative approach, in line with humans' understanding of multiple visual features per class, can provide compelling performance in the robust few-shot learning setting.
In particular, we introduce a novel method, SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors).
This method first automatically generates multiple visual descriptions of each class via a large language model (LLM), then uses a VLM to translate these descriptions into a set of visual feature embeddings of each image, and finally uses sparse logistic regression to select a relevant subset of these features for classification.
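The final step of that pipeline can be sketched as follows. This is a hedged illustration only: the LLM- and VLM-derived descriptor scores are replaced here by synthetic data (only the first 4 of 20 descriptors carry class signal), and L1-penalized logistic regression from scikit-learn stands in for the paper's sparse selection:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_images, n_descriptors = 200, 20

# Stand-in for VLM similarity scores: one score per (image, descriptor).
y = rng.integers(0, 2, size=n_images)
X = rng.normal(size=(n_images, n_descriptors))
X[:, :4] += y[:, None] * 2.0  # only descriptors 0-3 are informative

# L1 (sparse) logistic regression zeroes out most irrelevant descriptors.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])  # indices of retained descriptors
print(len(selected), "descriptors selected out of", n_descriptors)
```

The `C` value controls sparsity: smaller `C` means stronger regularization and fewer surviving descriptors.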
arXiv Detail & Related papers (2023-07-10T03:06:45Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
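A location-prediction pretext task of this kind can be illustrated with a tiny, assumed label construction (this is a hypothetical sketch, not the paper's exact formulation): the self-supervised target for a query patch is its position relative to a reference patch, e.g. one of 8 neighbor directions on the patch grid.

```python
import numpy as np

def relative_location_target(query_rc, reference_rc):
    """Map the (row, col) offset between two grid patches to one of 8
    direction classes (0..7), used as the self-supervised label."""
    dr = int(np.sign(query_rc[0] - reference_rc[0]))
    dc = int(np.sign(query_rc[1] - reference_rc[1]))
    directions = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    return directions.index((dr, dc))

print(relative_location_target((0, 1), (1, 1)))  # directly above → class 1
```

A network pretrained to predict such labels must learn where object parts sit relative to each other, which is the spatial signal the segmentation transfer relies on.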
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Prefix Conditioning Unifies Language and Label Supervision [84.11127588805138]
We show that dataset biases negatively affect pre-training by reducing the generalizability of learned representations.
In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.
arXiv Detail & Related papers (2022-06-02T16:12:26Z)
- Region Semantically Aligned Network for Zero-Shot Learning [18.18665627472823]
We propose a Region Semantically Aligned Network (RSAN) which maps local features of unseen classes to their semantic attributes.
We obtain each attribute from a specific region of the output and exploit these attributes for recognition.
Experiments on several standard ZSL datasets reveal the benefit of the proposed RSAN method, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2021-10-14T03:23:40Z)
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel few-shot learning framework that automatically identifies foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
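The view construction described in the summary can be sketched in a few lines (a hedged illustration of the corruption step only, assuming replacements are drawn from each feature's empirical marginal within the batch; the contrastive loss itself is omitted):

```python
import numpy as np

def scarf_corrupt(batch, corruption_rate=0.6, rng=None):
    """Form a SCARF-style view of a tabular batch: replace a random
    subset of each row's features with values sampled from that
    feature's empirical marginal distribution in the batch."""
    rng = rng or np.random.default_rng()
    n, d = batch.shape
    mask = rng.random((n, d)) < corruption_rate  # which entries to corrupt
    # Sample replacement values column-wise from the batch itself.
    replacements = np.stack(
        [rng.choice(batch[:, j], size=n) for j in range(d)], axis=1)
    return np.where(mask, replacements, batch)

x = np.arange(12, dtype=float).reshape(4, 3)
view = scarf_corrupt(x, corruption_rate=0.5, rng=np.random.default_rng(0))
print(view.shape)  # same shape as the input batch
```

By construction every corrupted value is still a plausible value for its column, which is what distinguishes this corruption from, e.g., additive Gaussian noise.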
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
- Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions [5.3556221126231085]
We study the impact of using rich and diverse textual descriptions of classes for zero-shot learning (ZSL) on ImageNet.
We create a new dataset ImageNet-Wiki that matches each ImageNet class to its corresponding Wikipedia article.
We show that employing these Wikipedia articles as class descriptions yields much higher ZSL performance than prior works.
arXiv Detail & Related papers (2021-03-17T14:06:56Z)
- Attribute Prototype Network for Zero-Shot Learning [113.50220968583353]
We propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features.
Our model points to the visual evidence of the attributes in an image, confirming the improved attribute localization ability of our image representation.
arXiv Detail & Related papers (2020-08-19T06:46:35Z)
- Weakly-supervised Object Localization for Few-shot Learning and Fine-grained Few-shot Learning [0.5156484100374058]
Few-shot learning aims to learn novel visual categories from very few samples.
We propose a Self-Attention Based Complementary Module (SAC Module) to perform weakly-supervised object localization.
We also produce the activated masks for selecting discriminative deep descriptors for few-shot classification.
arXiv Detail & Related papers (2020-03-02T14:07:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.