Semantic Disentangling Generalized Zero-Shot Learning
- URL: http://arxiv.org/abs/2101.07978v2
- Date: Wed, 27 Jan 2021 02:28:44 GMT
- Title: Semantic Disentangling Generalized Zero-Shot Learning
- Authors: Zhi Chen, Ruihong Qiu, Sen Wang, Zi Huang, Jingjing Li, Zheng Zhang
- Abstract summary: Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill high-quality semantic-consistent representations that capture the intrinsic features of seen images.
- Score: 50.259058462272435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen
and unseen categories. Most GZSL methods typically learn to synthesize CNN visual
features for the unseen classes by leveraging the entire semantic information,
e.g., tags and attributes, together with the visual features of the seen classes.
Within the visual features, we define two types of features, semantic-consistent
and semantic-unrelated, to represent the characteristics of images annotated in
the attributes and the less informative characteristics of images, respectively.
Ideally, the semantic-unrelated information cannot be transferred from seen to
unseen classes through the semantic-visual relationship, as the corresponding
characteristics are not annotated in the semantic information. Thus, the
foundation of visual feature synthesis is not always solid, as the features of
the seen classes may involve semantic-unrelated information that interferes with
the alignment between the semantic and visual modalities. To address this issue,
in this paper, we propose a novel feature
disentangling approach based on an encoder-decoder architecture to factorize
visual features of images into these two latent feature spaces to extract
corresponding representations. Furthermore, a relation module is incorporated
into this architecture to learn the semantic-visual relationship, whilst a total
correlation penalty is applied to encourage the disentanglement of the two latent
representations. The proposed model aims to distill high-quality
semantic-consistent representations that capture the intrinsic features of seen
images, which are further taken as the generation target for unseen classes.
Extensive experiments on seven GZSL benchmark datasets verify the
state-of-the-art performance of the proposed method.
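To make the described architecture concrete, below is a minimal PyTorch-style sketch of the idea: an encoder-decoder factorizes a CNN visual feature into a semantic-consistent code and a semantic-unrelated code, a relation module scores compatibility between the semantic-consistent code and class attribute vectors, and a simple cross-covariance term stands in for the total correlation penalty. All module names, dimensions, and the training step (`DisentangleAE`, `cross_cov_penalty`, `train_step`) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangleAE(nn.Module):
    """Sketch: split a CNN feature x into a semantic-consistent code h_s
    and a semantic-unrelated code h_n, then reconstruct x from both."""

    def __init__(self, feat_dim=2048, attr_dim=312, latent_dim=256):
        super().__init__()
        self.enc_s = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                   nn.Linear(512, latent_dim))
        self.enc_n = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                   nn.Linear(512, latent_dim))
        self.dec = nn.Sequential(nn.Linear(2 * latent_dim, 512), nn.ReLU(),
                                 nn.Linear(512, feat_dim))
        # Relation module: scores compatibility of h_s with each class attribute vector.
        self.relation = nn.Sequential(nn.Linear(latent_dim + attr_dim, 256), nn.ReLU(),
                                      nn.Linear(256, 1))

    def forward(self, x, attrs):
        h_s, h_n = self.enc_s(x), self.enc_n(x)
        x_rec = self.dec(torch.cat([h_s, h_n], dim=1))
        # Pair every h_s with every class attribute vector -> (batch, num_classes) scores.
        n, c = h_s.size(0), attrs.size(0)
        pairs = torch.cat([h_s.unsqueeze(1).expand(n, c, -1),
                           attrs.unsqueeze(0).expand(n, c, -1)], dim=2)
        rel = self.relation(pairs).squeeze(-1)
        return x_rec, h_s, h_n, rel


def cross_cov_penalty(h_s, h_n):
    """Cheap surrogate for the total-correlation penalty: penalize the
    cross-covariance between the two codes so they encode non-overlapping
    information (the paper estimates total correlation differently)."""
    h_s = h_s - h_s.mean(0, keepdim=True)
    h_n = h_n - h_n.mean(0, keepdim=True)
    cov = h_s.t() @ h_n / (h_s.size(0) - 1)
    return (cov ** 2).sum()


def train_step(model, opt, x, y, attrs, lam=1.0):
    """Illustrative training step on seen-class features x with class indices y
    and a per-class attribute matrix attrs (rows ordered consistently with y)."""
    x_rec, h_s, h_n, rel = model(x, attrs)
    loss = (F.mse_loss(x_rec, x)                     # reconstruction
            + F.cross_entropy(rel, y)                # semantic-visual relation scores
            + lam * cross_cov_penalty(h_s, h_n))     # disentanglement surrogate
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Under this sketch, only the semantic-consistent code h_s would be used as the generation target for unseen classes, since h_n captures information that the attributes do not describe.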
Related papers
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning [56.65891462413187]
We propose a progressive semantic-guided vision transformer for zero-shot learning (dubbed ZSLViT).
ZSLViT first introduces semantic-embedded token learning to improve the visual-semantic correspondences via semantic enhancement.
Then, we fuse visual tokens with low semantic-visual correspondence to discard semantic-unrelated visual information for visual enhancement.
arXiv Detail & Related papers (2024-04-11T12:59:38Z)
- Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation [13.001629605405954]
We study universal zero-shot segmentation in this work to achieve panoptic, instance, and semantic segmentation for novel categories without any training samples.
We introduce a generative model to synthesize features for unseen categories, which links semantic and visual spaces.
The proposed approach achieves state-of-the-art performance on zero-shot panoptic segmentation, instance segmentation, and semantic segmentation.
arXiv Detail & Related papers (2023-06-19T17:59:16Z)
- Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning [74.48337375174297]
Generalized Zero-Shot Learning (GZSL) identifies unseen categories by knowledge transferred from the seen domain.
We deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between prototypes and visual features.
DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes to adapt to different images, enabling unmatched semantic-visual pairs to be recast as matched ones.
arXiv Detail & Related papers (2023-03-27T15:21:43Z)
- Learning Invariant Visual Representations for Compositional Zero-Shot Learning [30.472541551048508]
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen-object compositions in the training set.
We propose an invariant feature learning framework to align different domains at the representation and gradient levels.
Experiments on two CZSL benchmarks demonstrate that the proposed method significantly outperforms the previous state-of-the-art.
arXiv Detail & Related papers (2022-06-01T11:33:33Z)
- Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experimental results show our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z)
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
- Semantic Feature Extraction for Generalized Zero-shot Learning [23.53412767106488]
Generalized zero-shot learning (GZSL) is a technique to train a deep learning model to identify unseen classes using class attributes.
In this paper, we put forth a new GZSL technique that greatly improves GZSL classification performance.
arXiv Detail & Related papers (2021-12-29T09:52:30Z)