Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
- URL: http://arxiv.org/abs/2303.15322v1
- Date: Mon, 27 Mar 2023 15:21:43 GMT
- Title: Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
- Authors: Man Liu, Feng Li, Chunjie Zhang, Yunchao Wei, Huihui Bai, Yao Zhao
- Abstract summary: Generalized Zero-Shot Learning (GZSL) identifies unseen categories by knowledge transferred from the seen domain.
We deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between prototypes and visual features.
DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes to adapt to different images, recasting unmatched semantic-visual pairs into matched ones.
- Score: 74.48337375174297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalized Zero-Shot Learning (GZSL) identifies unseen categories by
knowledge transferred from the seen domain, relying on the intrinsic
interactions between visual and semantic information. Prior works mainly
localize regions corresponding to the shared attributes. When various visual
appearances correspond to the same attribute, the shared attributes inevitably
introduce semantic ambiguity, hampering the exploration of accurate
semantic-visual interactions. In this paper, we deploy the dual semantic-visual
transformer module (DSVTM) to progressively model the correspondences between
attribute prototypes and visual features, constituting a progressive
semantic-visual mutual adaption (PSVMA) network for semantic disambiguation and
knowledge transferability improvement. Specifically, DSVTM devises an
instance-motivated semantic encoder that learns instance-centric prototypes to
adapt to different images, recasting unmatched semantic-visual pairs into
matched ones. Then, a semantic-motivated instance decoder
strengthens accurate cross-domain interactions between the matched pair for
semantic-related instance adaption, encouraging the generation of unambiguous
visual representations. Moreover, to mitigate the bias towards seen classes in
GZSL, a debiasing loss is proposed to pursue response consistency between seen
and unseen predictions. PSVMA consistently outperforms other
state-of-the-art methods. Code will be available at:
https://github.com/ManLiuCoder/PSVMA.
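
As a rough illustration of the mechanism described in the abstract, here is a minimal PyTorch-style sketch of the dual encoder-decoder adaption and a debiasing penalty. Everything here (module names, dimensions, and the exact form of the loss) is an assumption for illustration, not the paper's actual implementation; see the linked repository for that.

```python
import torch
import torch.nn as nn


class DSVTMSketch(nn.Module):
    """Sketch of dual semantic-visual adaption (assumed structure,
    not the released PSVMA code)."""

    def __init__(self, num_attrs: int, dim: int, heads: int = 8):
        super().__init__()
        # shared attribute prototypes, one per attribute
        self.prototypes = nn.Parameter(torch.randn(num_attrs, dim))
        # instance-motivated semantic encoder: prototypes attend to patches
        self.sem_enc = nn.MultiheadAttention(dim, heads, batch_first=True)
        # semantic-motivated instance decoder: patches attend to prototypes
        self.ins_dec = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (B, N, dim) visual features from a backbone
        protos = self.prototypes.unsqueeze(0).expand(patches.size(0), -1, -1)
        # adapt the shared prototypes to this instance (semantic adaption)
        protos, _ = self.sem_enc(protos, patches, patches)
        # refine patches against the adapted prototypes (instance adaption)
        visual, _ = self.ins_dec(patches, protos, protos)
        return visual.mean(dim=1)  # (B, dim) disambiguated representation


def debiasing_loss(logits: torch.Tensor, seen_mask: torch.Tensor) -> torch.Tensor:
    # One plausible reading of "response consistency between seen and
    # unseen predictions": shrink the gap between the strongest seen-class
    # and unseen-class responses. The paper's exact loss may differ.
    seen = logits[:, seen_mask].max(dim=1).values
    unseen = logits[:, ~seen_mask].max(dim=1).values
    return (seen - unseen).abs().mean()
```

Class logits would then come from comparing the pooled feature with per-class attribute embeddings, with the debiasing term added to the usual classification loss.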
Related papers
- PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning [116.33775552866476]
Generalized zero-shot learning (GZSL) endeavors to identify unseen classes using knowledge from the seen domain.
GZSL suffers from insufficient visual-semantic correspondences due to attribute diversity and instance diversity.
We propose a multi-granularity progressive semantic-visual adaption network, where sufficient visual elements can be gathered to remedy the inconsistency.
arXiv Detail & Related papers (2024-10-15T12:49:33Z)
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning [56.65891462413187]
We propose a progressive semantic-guided vision transformer for zero-shot learning (dubbed ZSLViT).
ZSLViT first introduces semantic-embedded token learning to improve the visual-semantic correspondences via semantic enhancement.
Then, we fuse visual tokens with low semantic-visual correspondence to discard semantic-unrelated visual information for visual enhancement.
arXiv Detail & Related papers (2024-04-11T12:59:38Z)
- Graph Adaptive Semantic Transfer for Cross-domain Sentiment Classification [68.06496970320595]
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain.
We present the Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that learns domain-invariant semantics from both word sequences and syntactic graphs.
arXiv Detail & Related papers (2022-05-18T07:47:01Z)
- Hybrid Routing Transformer for Zero-Shot Learning [83.64532548391]
This paper presents a novel transformer encoder-decoder model called the hybrid routing transformer (HRT).
We embed an active attention, constructed from both bottom-up and top-down dynamic routing pathways, to generate attribute-aligned visual features.
In the HRT decoder, static routing calculates the correlation among the attribute-aligned visual features, the corresponding attribute semantics, and the class attribute vectors to generate the final class label predictions.
arXiv Detail & Related papers (2022-03-29T07:55:08Z)
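
Most of the zero-shot methods listed above ultimately score an image against per-class semantic (attribute) vectors. A generic version of that compatibility scoring, not any single paper's exact classification head, looks like this:

```python
import torch
import torch.nn.functional as F


def compatibility_logits(img_emb: torch.Tensor,
                         class_attrs: torch.Tensor) -> torch.Tensor:
    # img_emb: (B, A) image embeddings projected into attribute space
    # class_attrs: (C, A) one semantic vector per class (seen + unseen)
    img_emb = F.normalize(img_emb, dim=-1)
    class_attrs = F.normalize(class_attrs, dim=-1)
    return img_emb @ class_attrs.t()  # (B, C) cosine-similarity logits
```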