An Integral Projection-based Semantic Autoencoder for Zero-Shot Learning
- URL: http://arxiv.org/abs/2306.14628v2
- Date: Fri, 11 Aug 2023 10:17:04 GMT
- Title: An Integral Projection-based Semantic Autoencoder for Zero-Shot Learning
- Authors: William Heyden, Habib Ullah, M. Salman Siddiqui, Fadi Al Machot
- Abstract summary: Zero-shot Learning (ZSL) classification categorizes or predicts classes (labels) that are not included in the training set (unseen classes).
Recent works have proposed different semantic autoencoder (SAE) models, where the encoder embeds a visual feature space into the semantic space and the decoder reconstructs the original visual feature space.
We propose an integral projection-based semantic autoencoder (IP-SAE), where an encoder projects a visual feature space concatenated with the semantic space into a latent representation space.
- Score: 0.46644955105516456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot Learning (ZSL) classification categorizes or predicts classes
(labels) that are not included in the training set (unseen classes). Recent
works proposed different semantic autoencoder (SAE) models where the encoder
embeds a visual feature vector space into the semantic space and the decoder
reconstructs the original visual feature space. The objective is to learn the
embedding by leveraging a source data distribution, which can be applied
effectively to a different but related target data distribution. Such
embedding-based methods are prone to domain shift problems and are vulnerable
to biases. We propose an integral projection-based semantic autoencoder
(IP-SAE) where an encoder projects a visual feature space concatenated with the
semantic space into a latent representation space. We force the decoder to
reconstruct the visual-semantic data space. Due to this constraint, the
visual-semantic projection function preserves the discriminative information
contained in the original visual feature space. The enriched projection forces
a more precise reconstruction of the visual feature space that is invariant to
the domain manifold. Consequently, the learned projection function is less
domain-specific
and alleviates the domain shift problem. Our proposed IP-SAE model consolidates
a symmetric transformation function for embedding and projection, and thus, it
provides transparency for interpreting generative applications in ZSL.
Therefore, in addition to outperforming state-of-the-art methods on four
benchmark datasets, our analytical approach allows us to investigate
distinct characteristics of generative-based methods in the unique context of
zero-shot inference.
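To make the projection concrete, the sketch below contrasts the classic SAE closed-form solution with a concatenation-based variant in the spirit of IP-SAE. It is a minimal NumPy/SciPy illustration: the shapes, the lam value, and the reuse of the Sylvester solver for the joint visual-semantic space are assumptions for demonstration, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Toy shapes (assumptions, not from the paper): n samples, d visual
# dimensions, k semantic (attribute) dimensions.
rng = np.random.default_rng(0)
n, d, k = 500, 512, 85
X = rng.normal(size=(n, d))   # visual features, e.g. CNN embeddings
S = rng.normal(size=(n, k))   # per-sample class-attribute vectors
lam = 0.1                     # placeholder trade-off hyperparameter

# Classic SAE (Kodirov et al., 2017): a tied encoder/decoder W minimizing
#   ||X - S W||^2 + lam * ||X W^T - S||^2,
# whose optimum satisfies the Sylvester equation
#   (S^T S) W + W (lam * X^T X) = (1 + lam) * S^T X.
W = solve_sylvester(S.T @ S, lam * (X.T @ X), (1 + lam) * S.T @ X)  # (k, d)

# IP-SAE-style variant (illustrative only): concatenate the visual and
# semantic spaces and solve the same tied-weight problem over the joint
# data, so the decoder must reconstruct visual *and* semantic content.
Z = np.concatenate([X, S], axis=1)                                  # (n, d+k)
V = solve_sylvester(S.T @ S, lam * (Z.T @ Z), (1 + lam) * S.T @ Z)  # (k, d+k)

def predict(Z_test, prototypes, V):
    """Assign each embedded test sample to the nearest unseen-class
    attribute prototype by cosine similarity."""
    latent = Z_test @ V.T
    latent /= np.linalg.norm(latent, axis=1, keepdims=True)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (latent @ protos.T).argmax(axis=1)
```

Because both transformations are closed-form and symmetric, the learned projection can be inspected directly, which is the transparency property the abstract highlights for interpreting generative applications in ZSL.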
Related papers
- SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized Zero-Shot Learning [0.7420433640907689]
Generalized Zero-Shot Learning (GZSL) recognizes unseen classes by transferring knowledge from the seen classes.
This paper introduces a dual strategy to address the generalization gap.
arXiv Detail & Related papers (2023-12-20T15:18:51Z) - Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding [39.424931953675994]
Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data.
This study endeavours to evaluate the effectiveness of pure self-supervised learning (SSL) techniques in computer vision tasks.
arXiv Detail & Related papers (2023-08-22T13:55:57Z) - Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to localize objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experiment results show our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z) - HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones.
We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains.
Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z) - Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation [66.85202434812942]
We reformulate few-shot segmentation as a semantic reconstruction problem.
We convert base class features into a series of basis vectors which span a class-level semantic space for novel class reconstruction.
Our proposed approach, referred to as anti-aliasing semantic reconstruction (ASR), provides a systematic yet interpretable solution for few-shot learning problems.
arXiv Detail & Related papers (2021-06-01T02:17:36Z) - Zero-Shot Learning from Adversarial Feature Residual to Compact Visual Feature [26.89763840782029]
We propose a novel adversarial network to synthesize compact semantic visual features for zero-shot learning (ZSL).
The residual generator produces a visual feature residual, which is integrated with a visual prototype predicted via the prototype predictor.
The discriminator distinguishes the synthetic visual features from the real ones extracted from an existing categorization CNN.
arXiv Detail & Related papers (2020-08-29T11:16:11Z) - Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z) - Generative Model-driven Structure Aligning Discriminative Embeddings for Transductive Zero-shot Learning [21.181715602603436]
We propose a neural network-based model for learning a projection function which aligns the visual and semantic data in the latent space.
We show superior performance on standard benchmark datasets, including AWA1, AWA2, CUB, SUN, and FLO.
We also show the efficacy of our model in regimes with extremely little labelled data.
arXiv Detail & Related papers (2020-05-09T18:48:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.