Learning Aligned Cross-Modal Representation for Generalized Zero-Shot
Classification
- URL: http://arxiv.org/abs/2112.12927v1
- Date: Fri, 24 Dec 2021 03:35:37 GMT
- Title: Learning Aligned Cross-Modal Representation for Generalized Zero-Shot
Classification
- Authors: Zhiyu Fang, Xiaobin Zhu, Chun Yang, Zheng Han, Jingyan Qin, Xu-Cheng
Yin
- Abstract summary: We propose an innovative autoencoder network by learning Aligned Cross-Modal Representations (dubbed ACMR) for Generalized Zero-Shot Classification (GZSC).
Specifically, we propose a novel Vision-Semantic Alignment (VSA) method to strengthen the alignment of cross-modal latent features on the latent subspaces guided by a learned classifier.
In addition, we propose a novel Information Enhancement Module (IEM) to reduce the risk of latent variable collapse while encouraging the discriminative ability of latent variables.
- Score: 17.177622259867515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning a common latent embedding by aligning the latent spaces of
cross-modal autoencoders is an effective strategy for Generalized Zero-Shot
Classification (GZSC). However, due to the lack of fine-grained instance-wise
annotations, it still easily suffers from the domain shift problem caused by the
discrepancy between the visual representation of diversified images and the
semantic representation of fixed attributes. In this paper, we propose an
innovative autoencoder network by learning Aligned Cross-Modal Representations
(dubbed ACMR) for GZSC. Specifically, we propose a novel Vision-Semantic
Alignment (VSA) method to strengthen the alignment of cross-modal latent
features on the latent subspaces guided by a learned classifier. In addition,
we propose a novel Information Enhancement Module (IEM) to reduce the
risk of latent variable collapse while encouraging the
discriminative ability of latent variables. Extensive experiments on publicly
available datasets demonstrate the state-of-the-art performance of our method.
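The latent-space alignment that ACMR builds on can be illustrated with a minimal sketch. The loss below is not the authors' code; it is a common distribution-alignment term (the 2-Wasserstein distance between the diagonal Gaussian posteriors of a visual and a semantic encoder), with all names hypothetical.

```python
import numpy as np

def distribution_alignment_loss(mu_v, logvar_v, mu_s, logvar_s):
    """2-Wasserstein distance between two diagonal Gaussians.

    mu_v/logvar_v: latent posterior of the visual encoder
    mu_s/logvar_s: latent posterior of the semantic (attribute) encoder
    Minimizing this pulls the two modality-specific latent
    distributions toward a common embedding.
    """
    std_v = np.exp(0.5 * logvar_v)
    std_s = np.exp(0.5 * logvar_s)
    return float(np.sum((mu_v - mu_s) ** 2) + np.sum((std_v - std_s) ** 2))
```

The loss is zero exactly when the two posteriors coincide, which is the "common latent embedding" condition the abstract refers to.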
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning [48.46511584490582]
Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories.
Real-world challenges such as distribution imbalances and attribute co-occurrence hinder the discernment of local variances in images.
We propose a bidirectional cross-modal ZSL approach CREST to overcome these challenges.
arXiv Detail & Related papers (2024-04-15T10:19:39Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
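A conditional affine coupling layer, the building block the GSMFlow summary mentions, can be sketched as follows. This is an illustrative NumPy toy (single linear scale/shift maps, hypothetical names), not the paper's implementation; the key property it demonstrates is exact invertibility given the conditioning vector.

```python
import numpy as np

def coupling_forward(x, cond, w_s, w_t):
    # Conditional affine coupling: keep the first half of x,
    # affinely transform the second half using the first half + condition.
    d = x.shape[-1] // 2
    x1, x2 = x[:d], x[d:]
    h = np.concatenate([x1, cond])
    s, t = np.tanh(h @ w_s), h @ w_t   # bounded scale, free shift
    return np.concatenate([x1, x2 * np.exp(s) + t])

def coupling_inverse(y, cond, w_s, w_t):
    # Exact inverse: recompute s, t from the untouched half and undo.
    d = y.shape[-1] // 2
    y1, y2 = y[:d], y[d:]
    h = np.concatenate([y1, cond])
    s, t = np.tanh(h @ w_s), h @ w_t
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])
```

Because the inverse is exact, stacking such layers (with the split halves alternating) yields a normalizing flow whose likelihood is tractable, which is what makes flow-based unseen-sample generation attractive.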
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones.
We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains.
Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z)
- Weakly supervised segmentation with cross-modality equivariant constraints [7.757293476741071]
Weakly supervised learning has emerged as an appealing alternative to alleviate the need for large labeled datasets in semantic segmentation.
We present a novel learning strategy that leverages self-supervision in a multi-modal image scenario to significantly enhance original CAMs.
Our approach outperforms relevant recent literature under the same learning conditions.
arXiv Detail & Related papers (2021-04-06T13:14:20Z)
- Margin Preserving Self-paced Contrastive Learning Towards Domain Adaptation for Medical Image Segmentation [51.93711960601973]
We propose a novel margin preserving self-paced contrastive Learning model for cross-modal medical image segmentation.
With the guidance of progressively refined semantic prototypes, a novel margin preserving contrastive loss is proposed to boost the discriminability of embedded representation space.
Experiments on cross-modal cardiac segmentation tasks demonstrate that MPSCL significantly improves semantic segmentation performance.
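The margin-preserving contrastive idea in the MPSCL summary can be illustrated with a small sketch: a prototype-based contrastive loss in which a margin is subtracted from the true-class similarity, making the objective strictly harder and pushing embeddings further from the decision boundary. The names and exact formulation here are assumptions, not the paper's code.

```python
import numpy as np

def margin_contrastive_loss(z, prototypes, labels, margin=0.2, tau=0.1):
    """Prototype-based contrastive loss with a similarity margin.

    z: (N, D) embeddings, prototypes: (C, D) class prototypes,
    labels: (N,) integer class indices.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = z @ p.T                                  # cosine similarities (N, C)
    sim[np.arange(len(labels)), labels] -= margin  # penalize the true class
    logits = sim / tau
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())
```

With margin > 0 the loss stays above the plain contrastive loss even for perfectly aligned embeddings, so training keeps enlarging the inter-class gap rather than stopping at mere correctness.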
arXiv Detail & Related papers (2021-03-15T15:23:10Z)
- Cross Knowledge-based Generative Zero-Shot Learning Approach with Taxonomy Regularization [5.280368849852332]
We develop a generative network-based ZSL approach equipped with the proposed Cross Knowledge Learning (CKL) scheme and Taxonomy Regularization (TR).
CKL enables more relevant semantic features to be trained for semantic-to-visual feature embedding in ZSL.
TR significantly improves the overlap with unseen images by producing more generalized visual features from the generative network.
arXiv Detail & Related papers (2021-01-25T04:38:18Z)
- Unsupervised Domain Adaptation in Semantic Segmentation via Orthogonal and Clustered Embeddings [25.137859989323537]
We propose an effective Unsupervised Domain Adaptation (UDA) strategy, based on a feature clustering method.
We introduce two novel learning objectives to enhance the discriminative clustering performance.
arXiv Detail & Related papers (2020-11-25T10:06:22Z)
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
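The closed-form factorization in the last entry admits a compact sketch: directions of maximal latent-space variation for a pretrained generator are the top eigenvectors of W^T W, where W is a latent-to-feature weight matrix such as the generator's first affine layer. The snippet below is a schematic reimplementation of that idea, not the authors' released code.

```python
import numpy as np

def closed_form_directions(weight, k=3):
    """Top-k semantic directions from a pretrained layer weight.

    weight: (out_dim, latent_dim) matrix. Unit directions n maximizing
    ||W n|| are eigenvectors of W^T W with the largest eigenvalues, so
    no sampling or training is needed: the factorization is closed-form.
    """
    eigvals, eigvecs = np.linalg.eigh(weight.T @ weight)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order].T  # (k, latent_dim), unit-norm rows
```

Editing then amounts to moving a latent code along one of the returned rows, z' = z + alpha * directions[i], with alpha controlling the edit strength.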
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.