Semantic-diversity transfer network for generalized zero-shot learning
via inner disagreement based OOD detector
- URL: http://arxiv.org/abs/2203.09017v1
- Date: Thu, 17 Mar 2022 01:31:27 GMT
- Title: Semantic-diversity transfer network for generalized zero-shot learning
via inner disagreement based OOD detector
- Authors: Bo Liu, Qiulei Dong, Zhanyi Hu
- Abstract summary: Zero-shot learning (ZSL) aims to recognize objects from unseen classes, where the kernel problem is to transfer knowledge from seen classes to unseen classes.
The knowledge transfer in many existing works is limited mainly because 1) the widely used visual features are global and not fully consistent with semantic attributes; 2) only one mapping is learned, which cannot effectively model diverse visual-semantic relations; and 3) the bias problem in generalized ZSL (GZSL) is not effectively handled.
We propose a Semantic-diversity transfer Network (SetNet) addressing the first two limitations, where 1) a multiple-attention architecture and a diversity regularizer are proposed to learn multiple local visual features that are more consistent with semantic attributes and 2) a projector ensemble that geometrically takes diverse local features as inputs is proposed to model visual-semantic relations from diverse local perspectives.
- Score: 26.89763840782029
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Zero-shot learning (ZSL) aims to recognize objects from unseen classes, where
the kernel problem is to transfer knowledge from seen classes to unseen classes
by establishing appropriate mappings between visual and semantic features. The
knowledge transfer in many existing works is limited mainly due to the facts
that 1) the widely used visual features are global ones but not totally
consistent with semantic attributes; 2) only one mapping is learned in existing
works, which is not able to effectively model diverse visual-semantic
relations; 3) the bias problem in the generalized ZSL (GZSL) could not be
effectively handled. In this paper, we propose two techniques to alleviate
these limitations. Firstly, we propose a Semantic-diversity transfer Network
(SetNet) addressing the first two limitations, where 1) a multiple-attention
architecture and a diversity regularizer are proposed to learn multiple local
visual features that are more consistent with semantic attributes and 2) a
projector ensemble that geometrically takes diverse local features as inputs is
proposed to model visual-semantic relations from diverse local perspectives.
Secondly, we propose an inner disagreement based domain detection module (ID3M)
for GZSL to alleviate the third limitation, which picks out unseen-class data
before class-level classification. Due to the absence of unseen-class data in
the training stage, ID3M employs a novel self-contained training scheme and
detects unseen-class data based on a designed inner disagreement criterion.
Experimental results on three public datasets demonstrate that the proposed
SetNet with the explored ID3M achieves a significant improvement against 30
state-of-the-art methods.
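As a rough illustration of the disagreement idea behind ID3M, the following minimal Python sketch flags a sample as unseen-class when several seen-class predictors disagree about it. The abstract does not specify the actual criterion or training scheme, so everything here is an assumption made for illustration: the function names `disagreement_score` and `flag_unseen`, the mean pairwise L1 distance as the disagreement measure, the threshold value, and the toy probabilities.

```python
import numpy as np


def disagreement_score(probs):
    """Mean pairwise L1 distance between ensemble members' predictions.

    probs: array of shape (K, N, C) holding seen-class probabilities from
           K predictors for N samples over C seen classes.
    Returns an array of shape (N,); larger values mean more disagreement.
    NOTE: this measure is an illustrative assumption, not the paper's criterion.
    """
    k = probs.shape[0]
    total = np.zeros(probs.shape[1])
    pairs = 0
    for i in range(k):
        for j in range(i + 1, k):
            total += np.abs(probs[i] - probs[j]).sum(axis=-1)
            pairs += 1
    return total / pairs


def flag_unseen(probs, threshold):
    """Mark samples whose disagreement exceeds a (hypothetical) threshold."""
    return disagreement_score(probs) > threshold


# Toy data: 3 predictors, 2 samples, 3 seen classes.
# Sample 0: all predictors agree. Sample 1: each favors a different class.
probs = np.stack([
    np.array([[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]),
    np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]),
    np.array([[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]]),
])

scores = disagreement_score(probs)
flags = flag_unseen(probs, threshold=0.5)
print(scores, flags)
```

In this toy input the three predictors agree on the first sample and each favors a different class on the second, so only the second is flagged. Samples flagged this way would then be routed to an unseen-class classifier, and the rest to a seen-class one, which is the role the abstract assigns to domain detection before class-level classification.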
Related papers
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationship among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot Learning [24.075034737719776]
This paper investigates the challenging problem of zero-shot learning in the multi-label scenario (MLZSL).
We propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to fully make use of such properties and enable a more accurate and robust visual-semantic projection.
Experiments on large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed GBE-MLZSL outperforms other state-of-the-art methods with large margins.
arXiv Detail & Related papers (2023-09-02T12:07:21Z)
- A Task-aware Dual Similarity Network for Fine-grained Few-shot Learning [19.90385022248391]
A Task-aware Dual Similarity Network (TDSNet) is proposed to explore global invariant features and discriminative local details.
TDSNet achieves competitive performance compared with other state-of-the-art algorithms.
arXiv Detail & Related papers (2022-10-22T04:24:55Z)
- Federated Zero-Shot Learning for Visual Recognition [55.65879596326147]
We propose a novel Federated Zero-Shot Learning (FedZSL) framework.
FedZSL learns a central model from the decentralized data residing on edge devices.
The effectiveness and robustness of FedZSL are demonstrated by extensive experiments conducted on three zero-shot benchmark datasets.
arXiv Detail & Related papers (2022-09-05T14:49:34Z)
- DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET can often achieve state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Extensive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- Zero-Shot Learning Based on Knowledge Sharing [0.0]
Zero-Shot Learning (ZSL) is an emerging research area that aims to solve classification problems with very little training data.
This paper introduces knowledge sharing (KS) to enrich the representation of semantic features.
Based on KS, we apply a generative adversarial network to generate pseudo visual features from semantic features that are very close to the real visual features.
arXiv Detail & Related papers (2021-02-26T06:43:29Z)
- Isometric Propagation Network for Generalized Zero-shot Learning [72.02404519815663]
A popular strategy is to learn a mapping between the semantic space of class attributes and the visual space of images based on the seen classes and their data.
We propose the Isometric Propagation Network (IPN), which learns to strengthen the relation between classes within each space and align the class dependency in the two spaces.
IPN achieves state-of-the-art performance on three popular zero-shot learning benchmarks.
arXiv Detail & Related papers (2021-02-03T12:45:38Z)