A Novel Perspective to Zero-shot Learning: Towards an Alignment of
Manifold Structures via Semantic Feature Expansion
- URL: http://arxiv.org/abs/2004.14795v1
- Date: Thu, 30 Apr 2020 14:08:10 GMT
- Title: A Novel Perspective to Zero-shot Learning: Towards an Alignment of
Manifold Structures via Semantic Feature Expansion
- Authors: Jingcai Guo, Song Guo
- Abstract summary: A common practice in zero-shot learning is to train a projection between the visual and semantic feature spaces with labeled seen-class examples.
Under such a paradigm, most existing methods suffer from the domain shift problem, which weakens zero-shot recognition performance.
We propose a novel model called AMS-SFE that considers the alignment of manifold structures by semantic feature expansion.
- Score: 17.48923061278128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot learning aims at recognizing unseen classes (no training example)
with knowledge transferred from seen classes. This is typically achieved by
exploiting a semantic feature space shared by both seen and unseen classes,
e.g., attributes or word vectors, as the bridge. One common practice in
zero-shot learning is to train a projection between the visual and semantic
feature spaces with labeled seen-class examples. At inference time, this
learned projection is applied to unseen-class examples, and class labels are
predicted by a distance metric in the semantic space. However, the visual and
semantic feature spaces are mutually
independent and have quite different manifold structures. Under such a
paradigm, most existing methods suffer from the domain shift problem, which
weakens zero-shot recognition performance. To address this issue, we
propose a novel model called AMS-SFE. It considers the alignment of manifold
structures by semantic feature expansion. Specifically, we build upon an
autoencoder-based model to expand the semantic features from the visual inputs.
Additionally, the expansion is jointly guided by an embedded manifold extracted
from the visual feature space of the data. To our knowledge, our model is the
first attempt to align the two feature spaces by expanding semantic features,
which yields two benefits: first, the expanded auxiliary features enhance the
semantic feature space; second, and more importantly, they implicitly align
the manifold structures of the visual and semantic feature spaces, so the
projection can be trained more reliably and the domain shift problem is
mitigated. Extensive
experiments show significant performance improvement, which verifies the
effectiveness of our model.
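
To make the paradigm described in the abstract concrete, below is a minimal
NumPy sketch of projection-based zero-shot recognition with a toy
semantic-feature-expansion step. Everything here is an illustrative
assumption: the ridge-regression projection, the cosine metric, and
especially expand_class_attrs, which stands in for AMS-SFE's
autoencoder-based, manifold-guided expansion using a plain PCA embedding of
per-class visual means. It is not the authors' implementation.

```python
import numpy as np

def fit_projection(X, S, lam=1.0):
    """Ridge-regression projection W: visual space -> semantic space,
    trained on labeled seen-class pairs (X: n x d features, S: n x a attributes)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)  # d x a

def predict(X, W, class_attrs):
    """Project visual features and pick the nearest (cosine-similarity)
    class attribute vector -- the metric-based recognition step."""
    P = X @ W
    P /= np.linalg.norm(P, axis=1, keepdims=True) + 1e-12
    A = class_attrs / (np.linalg.norm(class_attrs, axis=1, keepdims=True) + 1e-12)
    return (P @ A.T).argmax(axis=1)  # index into the candidate classes

def expand_class_attrs(X, y, S_cls, k=5):
    """Toy stand-in for semantic feature expansion: append k auxiliary
    dimensions derived from the visual manifold (here, a PCA embedding of
    per-class visual means). AMS-SFE instead learns the expansion with an
    autoencoder jointly guided by an embedded manifold, so that it also
    generalizes to unseen classes; this sketch covers seen classes only."""
    means = np.stack([X[y == c].mean(axis=0) for c in range(len(S_cls))])
    centered = means - means.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    aux = centered @ Vt[:k].T            # c x k auxiliary semantic features
    return np.hstack([S_cls, aux])       # c x (a + k) expanded attributes

# Shape-level usage with synthetic data (all sizes are arbitrary):
rng = np.random.default_rng(0)
X_seen = rng.normal(size=(500, 64))      # 500 seen-class images, 64-d features
y_seen = rng.integers(0, 10, size=500)   # 10 seen classes
S_cls = rng.normal(size=(10, 20))        # 20 human-defined attributes per class
S_exp = expand_class_attrs(X_seen, y_seen, S_cls, k=5)
W = fit_projection(X_seen, S_exp[y_seen])    # train on expanded attributes
print(predict(X_seen[:3], W, S_exp))         # sanity check on seen classes
```

Even in this toy, the intuition the abstract points at is visible: the
auxiliary dimensions are extracted from the visual manifold, so the expanded
semantic space inherits some of its structure, which is what makes the
projection easier to fit.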
Related papers
- Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning [23.96220607033524]
This paper investigates the challenging problem of zero-shot learning in the multi-label scenario (MLZSL), where a model is trained to recognize multiple unseen classes within a sample based on seen classes and auxiliary knowledge.
We propose a novel and comprehensive visual-semantic framework for MLZSL, dubbed Epsilon, to fully make use of such properties.
arXiv Detail & Related papers (2024-08-22T09:45:24Z)
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Beyond Prototypes: Semantic Anchor Regularization for Better Representation Learning [82.29761875805369]
One of the ultimate goals of representation learning is to achieve compactness within each class and good separability between classes.
We propose a novel perspective that uses pre-defined class anchors, serving as feature centroids, to unidirectionally guide feature learning.
The proposed Semantic Anchor Regularization (SAR) can be used in a plug-and-play manner in the existing models.
arXiv Detail & Related papers (2023-12-19T05:52:38Z)
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation [13.001629605405954]
We study universal zero-shot segmentation in this work to achieve panoptic, instance, and semantic segmentation for novel categories without any training samples.
We introduce a generative model to synthesize features for unseen categories, which links semantic and visual spaces.
The proposed approach achieves impressive state-of-the-art performance on zero-shot panoptic segmentation, instance segmentation, and semantic segmentation.
arXiv Detail & Related papers (2023-06-19T17:59:16Z)
- Semantic Prompt for Few-Shot Image Recognition [76.68959583129335]
We propose a novel Semantic Prompt (SP) approach for few-shot learning.
The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
- Rich Semantics Improve Few-shot Learning [49.11659525563236]
We show that by using class-level language descriptions, which can be acquired at minimal annotation cost, we can improve few-shot learning performance.
We develop a Transformer-based forward and backward encoding mechanism to relate visual and semantic tokens.
arXiv Detail & Related papers (2021-04-26T16:48:27Z)
- Learning Robust Visual-semantic Mapping for Zero-shot Learning [8.299945169799795]
We focus on fully empowering the semantic feature space, one of the key building blocks of zero-shot learning (ZSL).
In ZSL, the common practice is to train a mapping function between the visual and semantic feature spaces with labeled seen class examples.
Under such a paradigm, the ZSL models may easily suffer from the domain shift problem when constructing and reusing the mapping function.
arXiv Detail & Related papers (2021-04-12T17:39:38Z)
- Semantic Disentangling Generalized Zero-Shot Learning [50.259058462272435]
Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill high-quality, semantic-consistent representations that capture the intrinsic features of seen images.
arXiv Detail & Related papers (2021-01-20T05:46:21Z)
- Generative Model-driven Structure Aligning Discriminative Embeddings for Transductive Zero-shot Learning [21.181715602603436]
We propose a neural network-based model for learning a projection function which aligns the visual and semantic data in the latent space.
We show superior performance on the standard benchmark datasets AWA1, AWA2, CUB, SUN, and FLO.
We also show the efficacy of our model in regimes with extremely scarce labelled data.
arXiv Detail & Related papers (2020-05-09T18:48:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.