Isometric Propagation Network for Generalized Zero-shot Learning
- URL: http://arxiv.org/abs/2102.02038v1
- Date: Wed, 3 Feb 2021 12:45:38 GMT
- Title: Isometric Propagation Network for Generalized Zero-shot Learning
- Authors: Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Xuanyi Dong, Chengqi
Zhang
- Abstract summary: A popular strategy is to learn a mapping between the semantic space of class attributes and the visual space of images based on the seen classes and their data.
We propose the Isometric Propagation Network (IPN), which learns to strengthen the relation between classes within each space and to align the class dependency in the two spaces.
IPN achieves state-of-the-art performance on three popular zero-shot learning benchmarks.
- Score: 72.02404519815663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot learning (ZSL) aims to classify images of an unseen class only
based on a few attributes describing that class, with no access to any training
samples. A popular strategy is to learn a mapping between the semantic space of
class attributes and the visual space of images based on the seen classes and
their data. Thus, an unseen-class image can ideally be mapped to its
corresponding class attributes. The key challenge is how to align the
representations in the two spaces. For most ZSL settings, the attributes for
each seen/unseen class are only represented by a vector while the seen-class
data provide much more information. Thus, the imbalanced supervision from the
semantic and the visual spaces can easily make the learned mapping overfit
to the seen classes. To resolve this problem, we propose Isometric Propagation
Network (IPN), which learns to strengthen the relation between classes within
each space and align the class dependency in the two spaces. Specifically, IPN
learns to propagate the class representations on an auto-generated graph within
each space. Instead of only aligning the resulting static representations, we
regularize the two dynamic propagation procedures to be isometric in terms of
the two graphs' edge weights per step by minimizing a consistency loss between
them. IPN achieves state-of-the-art performance on three popular ZSL
benchmarks. To evaluate the generalization capability of IPN, we further build
two larger benchmarks with more diverse unseen classes and demonstrate the
advantages of IPN on them.
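As a concrete illustration, here is a minimal PyTorch sketch of the per-step isometric consistency idea. The cosine-similarity edge weights, simple averaging propagation rule, and step count are illustrative assumptions, not the authors' implementation:

    import torch
    import torch.nn.functional as F

    def propagation_step(class_reps):
        # Auto-generate graph edges as softmax-normalized cosine
        # similarities between class representations (an assumption;
        # the paper's exact edge function may differ).
        normed = F.normalize(class_reps, dim=-1)
        edges = torch.softmax(normed @ normed.t(), dim=-1)  # (C, C)
        return edges @ class_reps, edges  # propagated reps, edge weights

    def ipn_losses(visual_reps, semantic_reps, num_steps=2):
        # Propagate class representations in both spaces and accumulate
        # the isometric consistency loss between the two graphs' edge
        # weights at every step.
        consistency = visual_reps.new_zeros(())
        for _ in range(num_steps):
            visual_reps, e_v = propagation_step(visual_reps)
            semantic_reps, e_s = propagation_step(semantic_reps)
            consistency = consistency + F.mse_loss(e_v, e_s)
        return visual_reps, semantic_reps, consistency

Minimizing the accumulated consistency term pushes the two graphs' edge weights, and hence the class dependency structure, to agree at every propagation step rather than only in the final static representations.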
Related papers
- Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification [26.340737217001497]
Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training.
Previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes.
We propose to collect visually detectable attributes automatically. We predict attributes for each class by measuring the semantic-visual similarity between attributes and images.
arXiv Detail & Related papers (2024-02-03T09:18:49Z)
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- Text Descriptions are Compressive and Invariant Representations for Visual Learning [63.3464863723631]
We show that an alternative approach, in line with humans' understanding of multiple visual features per class, can provide compelling performance in the robust few-shot learning setting.
In particular, we introduce a novel method, SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors).
This method first automatically generates multiple visual descriptions of each class via a large language model (LLM), then uses a VLM to translate these descriptions into a set of visual feature embeddings of each image, and finally uses sparse logistic regression to select a relevant subset of these features to classify each image.
arXiv Detail & Related papers (2023-07-10T03:06:45Z)
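The three-step pipeline described above maps naturally onto off-the-shelf tools. Below is a rough sketch assuming OpenAI's CLIP package as the VLM and scikit-learn's L1-penalized logistic regression for the sparse selection step; the class descriptions are hypothetical stand-ins for LLM output:

    import torch
    import clip
    from sklearn.linear_model import LogisticRegression

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Hypothetical stand-ins for LLM-generated visual descriptions.
    descriptions = {
        "zebra": ["a horse-like animal with black and white stripes"],
        "tiger": ["a large orange cat with black stripes"],
    }
    texts = [d for ds in descriptions.values() for d in ds]
    with torch.no_grad():
        text_emb = model.encode_text(clip.tokenize(texts).to(device))
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    def descriptor_features(images):
        # Translate images into one interpretable similarity feature
        # per generated description.
        with torch.no_grad():
            img_emb = model.encode_image(images.to(device))
            img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        return (img_emb @ text_emb.T).float().cpu().numpy()

    # Sparse logistic regression then selects a relevant feature subset:
    # clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    # clf.fit(descriptor_features(train_images), train_labels)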
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- A Simple Approach for Zero-Shot Learning based on Triplet Distribution Embeddings [6.193231258199234]
ZSL aims to recognize unseen classes without labeled training data by exploiting semantic information.
Existing ZSL methods mainly use single vectors to represent the class embeddings in the semantic space.
We address this issue by leveraging distribution embeddings instead.
arXiv Detail & Related papers (2021-03-29T20:26:20Z)
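One way to realize such distribution embeddings, sketched below, is to represent each class as a diagonal Gaussian and apply a triplet loss under the closed-form 2-Wasserstein distance; the distance choice and margin are assumptions, not necessarily the paper's formulation:

    import torch
    import torch.nn.functional as F

    def w2_sq(mu1, sigma1, mu2, sigma2):
        # Squared 2-Wasserstein distance between diagonal Gaussians.
        return ((mu1 - mu2) ** 2).sum(-1) + ((sigma1 - sigma2) ** 2).sum(-1)

    def triplet_distribution_loss(anchor, positive, negative, margin=1.0):
        # Each argument is a (mu, sigma) pair; pull the matching class
        # distribution closer to the anchor than the mismatched one.
        d_pos = w2_sq(*anchor, *positive)
        d_neg = w2_sq(*anchor, *negative)
        return F.relu(d_pos - d_neg + margin).mean()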
- Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy that expands the views generated by a single image to cross-sample and multi-level representations.
Our method, termed CsMl, can integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z)
- Attribute Propagation Network for Graph Zero-shot Learning [57.68486382473194]
We introduce the attribute propagation network (APNet), which is composed of 1) a graph propagation model that generates an attribute vector for each class and 2) a parameterized nearest neighbor (NN) classifier.
APNet achieves either compelling performance or new state-of-the-art results in experiments with two zero-shot learning settings and five benchmark datasets.
arXiv Detail & Related papers (2020-09-24T16:53:40Z)
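Taken at face value, the two APNet components admit a compact sketch. The adjacency normalization, linear message transform, and learnable temperature below are illustrative assumptions rather than the paper's actual architecture:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class APNetSketch(nn.Module):
        # 1) propagate attribute vectors over a class graph;
        # 2) classify images with a parameterized nearest-neighbor rule.
        def __init__(self, attr_dim, feat_dim):
            super().__init__()
            self.propagate = nn.Linear(attr_dim, feat_dim)  # message transform
            self.scale = nn.Parameter(torch.tensor(10.0))   # learnable temperature

        def forward(self, image_feats, class_attrs, adjacency):
            # Each class aggregates its neighbors' (nonnegative) edge-weighted
            # attributes, then is mapped into the visual feature space.
            adj = F.normalize(adjacency, p=1, dim=-1)       # row-stochastic
            prototypes = self.propagate(adj @ class_attrs)  # (C, feat_dim)
            # Scaled negative distances to class prototypes act as logits.
            dists = torch.cdist(image_feats, prototypes)    # (B, C)
            return -self.scale * dists

Training would minimize cross-entropy over these logits on seen classes; at test time, unseen classes enter simply as new rows of class_attrs.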
- Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning [59.58381904522967]
We propose a novel embedding-based generative model with a tight visual-semantic coupling constraint.
We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces.
Our method can be easily extended to transductive ZSL setting by generating labels for unseen images.
arXiv Detail & Related papers (2020-09-16T03:54:12Z)
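As a rough intuition for calibrating parametric distributions in a unified latent space, one could embed both modalities as diagonal Gaussians and penalize their divergence; the sketch below captures only that calibration term, not the paper's full information-bottleneck and generative objective:

    import torch
    import torch.nn as nn

    class LatentCalibration(nn.Module):
        # Embed visual features and class semantics as Gaussians in one
        # latent space and penalize their divergence.
        def __init__(self, vis_dim, sem_dim, latent_dim):
            super().__init__()
            self.vis_enc = nn.Linear(vis_dim, 2 * latent_dim)  # -> (mu, logvar)
            self.sem_enc = nn.Linear(sem_dim, 2 * latent_dim)

        @staticmethod
        def gaussian_kl(mu_p, logvar_p, mu_q, logvar_q):
            # KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal Gaussians.
            var_p, var_q = logvar_p.exp(), logvar_q.exp()
            return 0.5 * (logvar_q - logvar_p
                          + (var_p + (mu_p - mu_q) ** 2) / var_q - 1).sum(-1)

        def forward(self, vis_feats, sem_feats):
            mu_v, lv_v = self.vis_enc(vis_feats).chunk(2, dim=-1)
            mu_s, lv_s = self.sem_enc(sem_feats).chunk(2, dim=-1)
            return self.gaussian_kl(mu_v, lv_v, mu_s, lv_s).mean()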
This list is automatically generated from the titles and abstracts of the papers in this site.