TransZero: Attribute-guided Transformer for Zero-Shot Learning
- URL: http://arxiv.org/abs/2112.01683v1
- Date: Fri, 3 Dec 2021 02:39:59 GMT
- Title: TransZero: Attribute-guided Transformer for Zero-Shot Learning
- Authors: Shiming Chen, Ziming Hong, Yang Liu, Guo-Sen Xie, Baigui Sun, Hao Li,
Qinmu Peng, Ke Lu, Xinge You
- Abstract summary: Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones.
We propose an attribute-guided Transformer network, TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations.
- Score: 25.55614833575993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot learning (ZSL) aims to recognize novel classes by transferring
semantic knowledge from seen classes to unseen ones. Semantic knowledge is
learned from attribute descriptions shared between different classes, which act
as strong priors for localizing object attributes that represent discriminative
region features, enabling significant visual-semantic interaction. Although
some attention-based models have attempted to learn such region features in a
single image, the transferability and discriminative attribute localization of
visual features are typically neglected. In this paper, we propose an
attribute-guided Transformer network, termed TransZero, to refine visual
features and learn attribute localization for discriminative visual embedding
representations in ZSL. Specifically, TransZero uses a feature augmentation
encoder to alleviate the cross-dataset bias between ImageNet and ZSL
benchmarks, improving the transferability of visual features by reducing the
entangled relative geometry relationships among region features. To learn
locality-augmented visual features, TransZero employs a visual-semantic decoder
to localize the image regions most relevant to each attribute in a given image,
under the guidance of semantic attribute information. Then, the
locality-augmented visual features and semantic vectors are used to conduct
effective visual-semantic interaction in a visual-semantic embedding network.
Extensive experiments show that TransZero achieves the new state of the art on
three ZSL benchmarks. The code is available at:
https://github.com/shiming-chen/TransZero
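The pipeline the abstract describes (feature augmentation encoder, visual-semantic decoder, embedding network) centers on cross-attention in which semantic attribute embeddings query region features. Below is a minimal sketch of that idea; the tensor shapes, module layout, and dot-product class scoring are illustrative assumptions, not the authors' implementation (see the repository above for that).

```python
import torch
import torch.nn as nn

class AttributeGuidedDecoder(nn.Module):
    """Minimal sketch: semantic attribute embeddings attend over region
    features (cross-attention), yielding locality-augmented visual features.
    Shapes and names are illustrative, not the paper's implementation."""

    def __init__(self, num_attributes=85, region_dim=2048, embed_dim=300):
        super().__init__()
        # Learnable semantic attribute embeddings act as queries.
        self.attr_embed = nn.Parameter(torch.randn(num_attributes, embed_dim))
        self.to_kv = nn.Linear(region_dim, embed_dim * 2)  # keys/values from regions
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
        # Map attended features to a per-attribute score.
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, regions):
        # regions: (batch, num_regions, region_dim), e.g. a 7x7 CNN grid flattened to 49.
        b = regions.size(0)
        k, v = self.to_kv(regions).chunk(2, dim=-1)
        q = self.attr_embed.unsqueeze(0).expand(b, -1, -1)
        # Each attribute query localizes its most relevant image regions.
        attended, attn_weights = self.attn(q, k, v)
        attr_scores = self.score(attended).squeeze(-1)  # (batch, num_attributes)
        return attr_scores, attn_weights

def classify(attr_scores, class_attrs):
    """Score each class by compatibility (dot product) between predicted
    attribute scores and the class's attribute vector."""
    return attr_scores @ class_attrs.t()  # (batch, num_classes)

# Toy usage with random tensors (AWA2-like sizes: 85 attributes, 50 classes).
regions = torch.randn(4, 49, 2048)
class_attrs = torch.randn(50, 85)
decoder = AttributeGuidedDecoder()
scores, maps = decoder(regions)
print(classify(scores, class_attrs).shape)  # torch.Size([4, 50])
```

The attention weights returned by the decoder play the role of the per-attribute localization maps the abstract describes, and the dot-product scoring stands in for the visual-semantic embedding network.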
Related papers
- Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning [56.65891462413187]
We propose a progressive semantic-guided vision transformer for zero-shot learning, dubbed ZSLViT.
ZSLViT first introduces semantic-embedded token learning to improve visual-semantic correspondences via semantic enhancement.
It then fuses visual tokens with low semantic-visual correspondence to discard semantic-unrelated visual information for visual enhancement (see the sketch below).
arXiv Detail & Related papers (2024-04-11T12:59:38Z)
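As noted above, ZSLViT fuses visual tokens that have low semantic-visual correspondence. A hedged sketch of one way such token fusion could work, assuming per-token correspondence scores are already available; the scoring and fusion rules here are illustrative assumptions, not ZSLViT's actual method:

```python
import torch

def fuse_low_correspondence_tokens(tokens, scores, keep=16):
    """Sketch of semantic-guided token fusion: keep the `keep` visual tokens
    with the highest semantic-visual correspondence scores and merge the
    rest into a single averaged token. Illustrative only.

    tokens: (batch, num_tokens, dim), scores: (batch, num_tokens)
    """
    idx = scores.argsort(dim=1, descending=True)
    top, rest = idx[:, :keep], idx[:, keep:]
    gather = lambda i: tokens.gather(1, i.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
    kept = gather(top)                              # high-correspondence tokens survive
    fused = gather(rest).mean(dim=1, keepdim=True)  # low ones collapse to one token
    return torch.cat([kept, fused], dim=1)          # (batch, keep + 1, dim)

# Toy usage: 196 ViT patch tokens reduced to 17.
out = fuse_low_correspondence_tokens(torch.randn(2, 196, 768), torch.rand(2, 196))
print(out.shape)  # torch.Size([2, 17, 768])
```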
- High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning [54.86882315023791]
We propose an innovative approach called High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning (HDAFL).
HDAFL utilizes multiple convolutional kernels to automatically learn discriminative regions highly correlated with attributes in images.
We also introduce a Transformer-based attribute discrimination encoder to enhance the discriminative capability among attributes.
arXiv Detail & Related papers (2024-04-07T13:17:47Z)
- Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification [26.340737217001497]
Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training.
Previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes.
We propose to collect visually detectable attributes automatically. We predict attributes for each class by modeling the semantic-visual similarity between attributes and images.
arXiv Detail & Related papers (2024-02-03T09:18:49Z)
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning [119.43299939907685]
Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones.
Existing attention-based models learn only inferior region features from a single image because they rely solely on unidirectional attention.
We propose a cross attribute-guided Transformer network, termed TransZero++, to refine visual features and learn accurate attribute localization for semantic-augmented visual embedding representations.
arXiv Detail & Related papers (2021-12-16T05:49:51Z)
- Region Semantically Aligned Network for Zero-Shot Learning [18.18665627472823]
We propose a Region Semantically Aligned Network (RSAN) which maps local features of unseen classes to their semantic attributes.
We obtain each attribute from a specific region of the output and exploit these attributes for recognition.
Experiments on several standard ZSL datasets reveal the benefit of the proposed RSAN method, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2021-10-14T03:23:40Z)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
The module predicts the actual human gaze location to obtain the visual attention regions for recognizing a novel object guided by its attribute description.
This work suggests the promising benefits of collecting human gaze datasets and developing automatic gaze estimation algorithms for high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z)
- Attribute Prototype Network for Zero-Shot Learning [113.50220968583353]
We propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features.
Our model points to the visual evidence of the attributes in an image, confirming the improved attribute localization ability of our image representation.
arXiv Detail & Related papers (2020-08-19T06:46:35Z)
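Both Attribute Prototype Network entries above describe localizing attributes by pointing to their visual evidence in an image. A common realization of this idea, sketched below under assumed shapes, correlates learnable attribute prototypes with CNN local features and max-pools the resulting similarity maps; treat it as an illustration rather than the APN implementation.

```python
import torch

def attribute_localization(feature_map, prototypes):
    """Sketch of prototype-based attribute localization: correlate each
    learnable attribute prototype with every spatial location, then
    max-pool to predict attribute presence. Shapes and the pooling
    choice are assumptions for illustration.

    feature_map: (batch, dim, H, W)    CNN local features
    prototypes:  (num_attributes, dim) learnable attribute prototypes
    """
    # Similarity map per attribute: (batch, num_attributes, H, W)
    sim = torch.einsum('bdhw,ad->bahw', feature_map, prototypes)
    # The peak response localizes the attribute; its value scores presence.
    attr_scores = sim.flatten(2).max(dim=2).values  # (batch, num_attributes)
    return attr_scores, sim

# Toy usage: a 7x7 ResNet grid and 85 attribute prototypes.
scores, maps = attribute_localization(torch.randn(2, 2048, 7, 7),
                                      torch.randn(85, 2048))
print(scores.shape, maps.shape)  # torch.Size([2, 85]) torch.Size([2, 85, 7, 7])
```

The similarity maps serve as the "visual evidence" the summaries mention: visualizing the map for one attribute highlights the image region that supports it.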
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.