SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized
Zero-Shot Learning
- URL: http://arxiv.org/abs/2312.13100v1
- Date: Wed, 20 Dec 2023 15:18:51 GMT
- Title: SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized
Zero-Shot Learning
- Authors: William Heyden, Habib Ullah, M. Salman Siddiqui, Fadi Al Machot
- Abstract summary: Generalized Zero-Shot Learning (GZSL) recognizes unseen classes by transferring knowledge from the seen classes.
This paper introduces a dual strategy to address the generalization gap.
- Score: 0.7420433640907689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalized Zero-Shot Learning (GZSL) recognizes unseen classes by
transferring knowledge from the seen classes, depending on the inherent
interactions between visual and semantic data. However, the discrepancy between
well-prepared training data and unpredictable real-world test scenarios remains
a significant challenge. This paper introduces a dual strategy to address the
generalization gap. Firstly, we incorporate semantic information through an
innovative encoder. This encoder effectively integrates class-specific semantic
information by targeting the performance disparity, enhancing the produced
features to enrich the semantic space for class-specific attributes. Secondly,
we refine our generative capabilities using a novel compositional loss
function. This approach generates discriminative classes, effectively
classifying both seen and unseen classes. In addition, we extend the
exploitation of the learned latent space by utilizing controlled semantic
inputs, ensuring the robustness of the model in varying environments. This
approach yields a model that outperforms the state-of-the-art models in terms
of both generalization and diverse settings, notably without requiring
hyperparameter tuning or domain-specific adaptations. We also propose a set of
novel evaluation metrics to provide a more detailed assessment of the
reliability and reproducibility of the results. The complete code is made
available on https://github.com/william-heyden/SEER-ZeroShotLearning/.
Related papers
- Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning [23.96220607033524]
This paper investigates a challenging problem of zero-shot learning in the multi-label scenario (MLZSL)
It is trained to recognize multiple unseen classes within a sample based on seen classes and auxiliary knowledge.
We propose a novel and comprehensive visual-semantic framework for MLZSL, dubbed Epsilon, to fully make use of such properties.
arXiv Detail & Related papers (2024-08-22T09:45:24Z) - Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning [70.64617500380287]
Continual learning allows models to learn from new data while retaining previously learned knowledge.
The semantic knowledge available in the label information of the images, offers important semantic information that can be related with previously acquired knowledge of semantic classes.
We propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings.
arXiv Detail & Related papers (2024-08-02T07:51:44Z) - SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks.
To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z) - Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z) - Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR)
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experiment results show our model considerably improves upon the state of the arts in ZSAR, reaching encouraging top-1 accuracy on UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z) - SCARF: Self-Supervised Contrastive Learning using Random Feature
Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z) - Zero-Shot Learning Based on Knowledge Sharing [0.0]
Zero-Shot Learning (ZSL) is an emerging research that aims to solve the classification problems with very few training data.
This paper introduces knowledge sharing (KS) to enrich the representation of semantic features.
Based on KS, we apply a generative adversarial network to generate pseudo visual features from semantic features that are very close to the real visual features.
arXiv Detail & Related papers (2021-02-26T06:43:29Z) - Information Bottleneck Constrained Latent Bidirectional Embedding for
Zero-Shot Learning [59.58381904522967]
We propose a novel embedding based generative model with a tight visual-semantic coupling constraint.
We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces.
Our method can be easily extended to transductive ZSL setting by generating labels for unseen images.
arXiv Detail & Related papers (2020-09-16T03:54:12Z) - Generative Model-driven Structure Aligning Discriminative Embeddings for
Transductive Zero-shot Learning [21.181715602603436]
We propose a neural network-based model for learning a projection function which aligns the visual and semantic data in the latent space.
We show superior performance on standard benchmark datasets AWA1, AWA2, CUB, SUN, FLO, and.
We also show the efficacy of our model in the case of extremely less labelled data regime.
arXiv Detail & Related papers (2020-05-09T18:48:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.