MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
- URL: http://arxiv.org/abs/2203.03137v1
- Date: Mon, 7 Mar 2022 05:27:08 GMT
- Title: MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning
- Authors: Shiming Chen, Ziming Hong, Guo-Sen Xie, Wenhan Wang, Qinmu Peng, Kai
Wang, Jian Zhao, Xinge You
- Abstract summary: The key challenge of zero-shot learning (ZSL) is how to infer the latent semantic knowledge between visual and attribute features on seen classes.
We propose a Mutually Semantic Distillation Network (MSDN), which progressively distills the intrinsic semantic representations between visual and attribute features.
- Score: 28.330268557106912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The key challenge of zero-shot learning (ZSL) is how to infer the latent
semantic knowledge between visual and attribute features on seen classes, and
thus achieve desirable knowledge transfer to unseen classes. Prior works
either simply align the global features of an image with its associated class
semantic vector or utilize unidirectional attention to learn the limited latent
semantic representations, which cannot effectively discover the intrinsic
semantic knowledge (e.g., attribute semantics) between visual and attribute
features. To solve the above dilemma, we propose a Mutually Semantic
Distillation Network (MSDN), which progressively distills the intrinsic
semantic representations between visual and attribute features for ZSL. MSDN
incorporates an attribute$\rightarrow$visual attention sub-net that learns
attribute-based visual features, and a visual$\rightarrow$attribute attention
sub-net that learns visual-based attribute features. By further introducing a
semantic distillation loss, the two mutual attention sub-nets are capable of
learning collaboratively and teaching each other throughout the training
process. The proposed MSDN yields significant improvements over the strong
baselines, leading to new state-of-the-art performance on three popular and
challenging benchmarks, i.e., CUB, SUN, and AWA2. Our code is available at:
\url{https://github.com/shiming-chen/MSDN}.
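Code illustration (a minimal sketch, not the repository's implementation): based only on the abstract above, the two sub-nets can be read as bidirectional attention over a shared region-attribute similarity matrix, with a symmetric distillation term that lets each branch teach the other. All module, variable, and function names below (MutualAttentionZSL, semantic_distillation_loss, region_feats, ...) are illustrative assumptions; the actual MSDN architecture and losses are defined in the linked repository.
```python
# Hedged PyTorch sketch of the attribute->visual and visual->attribute
# attention sub-nets described in the abstract. Names and readouts are
# assumptions, not the code at https://github.com/shiming-chen/MSDN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualAttentionZSL(nn.Module):
    def __init__(self, feat_dim: int, num_attrs: int, attr_dim: int):
        super().__init__()
        # Learnable attribute embeddings shared by both sub-nets.
        self.attr_embeds = nn.Parameter(torch.randn(num_attrs, attr_dim))
        # Project image region features into the attribute space so that
        # dot-product attention between regions and attributes is defined.
        self.vis_proj = nn.Linear(feat_dim, attr_dim)

    def forward(self, region_feats: torch.Tensor):
        # region_feats: (B, R, feat_dim) -- R region/patch features per image.
        v = self.vis_proj(region_feats)                          # (B, R, attr_dim)
        sim = torch.einsum('brd,ad->bra', v, self.attr_embeds)   # (B, R, A)

        # attribute -> visual sub-net: each attribute attends over regions,
        # yielding attribute-based visual features.
        a2v_attn = F.softmax(sim, dim=1)                         # over regions
        attr_based_visual = torch.einsum('bra,brd->bad', a2v_attn, v)

        # visual -> attribute sub-net: each region attends over attributes,
        # yielding visual-based attribute features.
        v2a_attn = F.softmax(sim, dim=2)                         # over attributes
        visual_based_attr = torch.einsum('bra,ad->brd', v2a_attn, self.attr_embeds)

        # One simple per-attribute score readout from each branch
        # (the paper's exact readout may differ).
        score_a2v = (attr_based_visual * self.attr_embeds).sum(-1)            # (B, A)
        score_v2a = torch.einsum('brd,ad->ba', visual_based_attr,
                                 self.attr_embeds) / region_feats.size(1)     # (B, A)
        return score_a2v, score_v2a

def semantic_distillation_loss(score_a2v, score_v2a, tau: float = 1.0):
    """Symmetric KL between the two branches' attribute score distributions,
    so each sub-net acts as a soft teacher for the other."""
    p = F.log_softmax(score_a2v / tau, dim=-1)
    q = F.log_softmax(score_v2a / tau, dim=-1)
    return 0.5 * (F.kl_div(p, q.exp(), reduction='batchmean')
                  + F.kl_div(q, p.exp(), reduction='batchmean'))

# Toy usage: 4 images with 49 ResNet-style region features each.
model = MutualAttentionZSL(feat_dim=2048, num_attrs=312, attr_dim=300)
s1, s2 = model(torch.randn(4, 49, 2048))
loss_distill = semantic_distillation_loss(s1, s2)
```
In practice such a distillation term would be added to the usual seen-class attribute-regression or classification objectives; consult the repository for the paper's exact formulation.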
Related papers
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning [56.65891462413187]
We propose a progressive semantic-guided vision transformer for zero-shot learning (dubbed ZSLViT).
ZSLViT first introduces semantic-embedded token learning to improve the visual-semantic correspondences via semantic enhancement.
Then, we fuse visual tokens with low semantic-visual correspondence to discard semantically unrelated visual information for visual enhancement.
arXiv Detail & Related papers (2024-04-11T12:59:38Z)
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- Exploiting Semantic Attributes for Transductive Zero-Shot Learning [97.61371730534258]
Zero-shot learning aims to recognize unseen classes by generalizing the relation between visual features and semantic attributes learned from the seen classes.
We present a novel transductive ZSL method that produces semantic attributes of the unseen data and imposes them on the generative process.
Experiments on five standard benchmarks show that our method yields state-of-the-art results for zero-shot learning.
arXiv Detail & Related papers (2023-03-17T09:09:48Z)
- TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning [119.43299939907685]
Zero-shot learning (ZSL) tackles the novel class recognition problem by transferring semantic knowledge from seen classes to unseen ones.
By relying solely on unidirectional attention, existing attention-based models learn only inferior region features in a single image.
We propose a cross attribute-guided Transformer network, termed TransZero++, to refine visual features and learn accurate attribute localization for semantic-augmented visual embedding representations.
arXiv Detail & Related papers (2021-12-16T05:49:51Z)
- TransZero: Attribute-guided Transformer for Zero-Shot Learning [25.55614833575993]
Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones.
We propose an attribute-guided Transformer network, TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations.
arXiv Detail & Related papers (2021-12-03T02:39:59Z)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by attribute description.
This work implies the promising benefits of collecting human gaze datasets and developing automatic gaze estimation algorithms for high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z)
- Isometric Propagation Network for Generalized Zero-shot Learning [72.02404519815663]
A popular strategy is to learn a mapping between the semantic space of class attributes and the visual space of images based on the seen classes and their data (a minimal sketch of this common embedding strategy follows this list).
We propose Isometric Propagation Network (IPN), which learns to strengthen the relation between classes within each space and align the class dependency in the two spaces.
IPN achieves state-of-the-art performance on three popular Zero-shot learning benchmarks.
arXiv Detail & Related papers (2021-02-03T12:45:38Z)
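Several of the entries above (e.g., DFAN, IPN, and the TransZero papers) build on the same embedding-based recipe: learn a mapping from visual features into the class-attribute space on seen classes, then recognize unseen classes by scoring images against the unseen classes' attribute vectors. The sketch below is a generic, hedged illustration of that shared recipe; all names and dimensions are hypothetical examples (the CUB-style numbers are illustrative only) and it does not reproduce any listed paper's actual method.
```python
# Generic embedding-based ZSL scoring: project an image feature into the
# attribute space and pick the class whose semantic vector is most compatible.
# All names are hypothetical; this is not any specific paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeEmbeddingZSL(nn.Module):
    def __init__(self, feat_dim: int, attr_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, attr_dim)   # visual -> semantic mapping

    def logits(self, img_feats: torch.Tensor, class_attrs: torch.Tensor):
        # img_feats: (B, feat_dim); class_attrs: (C, attr_dim), one row per class
        # (seen classes at training time, unseen classes at test time).
        z = F.normalize(self.proj(img_feats), dim=-1)
        a = F.normalize(class_attrs, dim=-1)
        return z @ a.t()                             # (B, C) cosine compatibility

# Training would fit the mapping on seen-class attribute vectors; zero-shot
# inference simply swaps in the unseen-class attribute matrix:
model = AttributeEmbeddingZSL(feat_dim=2048, attr_dim=312)  # e.g., CUB has 312 attributes
img_feats = torch.randn(4, 2048)                            # e.g., ResNet-101 features
unseen_attrs = torch.randn(50, 312)                         # e.g., 50 unseen CUB classes
pred = model.logits(img_feats, unseen_attrs).argmax(dim=-1)
```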
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.