ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning
- URL: http://arxiv.org/abs/2406.02929v2
- Date: Tue, 11 Feb 2025 08:09:50 GMT
- Title: ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning
- Authors: Zihan Ye, Shreyank N. Gowda, Xiaowei Huang, Haotian Xu, Yaochu Jin, Kaizhu Huang, Xiaobo Jin
- Abstract summary: A scarcity of seen class samples results in a marked decrease in performance across many generative Zero-shot Learning techniques.
We introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations.
ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data.
- Score: 38.36200871549062
- Abstract: Zero-shot Learning (ZSL) aims to enable classifiers to identify unseen classes. This is typically achieved by generating visual features for unseen classes based on learned visual-semantic correlations from seen classes. However, most current generative approaches heavily rely on having a sufficient number of samples from seen classes. Our study reveals that a scarcity of seen class samples results in a marked decrease in performance across many generative ZSL techniques. We argue, quantify, and empirically demonstrate that this decline is largely attributable to spurious visual-semantic correlations. To address this issue, we introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations. ZeroDiff comprises three key components: (1) Diffusion augmentation, which naturally transforms limited data into an expanded set of noised data to mitigate generative model overfitting; (2) Supervised-contrastive (SC)-based representations that dynamically characterize each limited sample to support visual feature generation; and (3) Multiple feature discriminators employing a Wasserstein-distance-based mutual learning approach, evaluating generated features from various perspectives, including pre-defined semantics, SC-based representations, and the diffusion process. Extensive experiments on three popular ZSL benchmarks demonstrate that ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Our codes are available at https://github.com/FouriYe/ZeroDiff_ICLR25.
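The diffusion augmentation component described in the abstract turns a small set of visual features into a larger set of noised copies. A minimal sketch of that idea follows, assuming the standard DDPM-style forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with a linear beta schedule. The function names, schedule values, and feature sizes are hypothetical and chosen for illustration only; they are not taken from the ZeroDiff code base.

```python
import numpy as np


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def diffusion_augment(features, alpha_bar, copies_per_sample, rng):
    """Expand each clean feature into several noised copies at random timesteps."""
    # Repeat every clean feature vector so each gets several noised versions.
    x0 = np.repeat(features, copies_per_sample, axis=0)
    # Draw an independent diffusion timestep for every copy.
    t = rng.integers(0, len(alpha_bar), size=x0.shape[0])
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t][:, None]
    # Forward process: blend the clean feature with Gaussian noise.
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, t


rng = np.random.default_rng(0)
features = rng.standard_normal((5, 8))   # 5 hypothetical seen-class features
alpha_bar = linear_alpha_bar(100)
noised, steps = diffusion_augment(features, alpha_bar, copies_per_sample=4, rng=rng)
print(noised.shape, steps.shape)
```

Here five clean features become twenty noised training inputs, each tagged with the timestep that produced it, so a generator or discriminator can be conditioned on the noise level.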
Related papers
- CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning [48.46511584490582]
Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories.
Real-world challenges such as distribution imbalances and attribute co-occurrence hinder the discernment of local variances in images.
We propose CREST, a bidirectional cross-modal ZSL approach, to overcome these challenges.
arXiv Detail & Related papers (2024-04-15T10:19:39Z)
- Zero-Shot Learning by Harnessing Adversarial Samples [52.09717785644816]
We propose a novel Zero-Shot Learning (ZSL) approach by Harnessing Adversarial Samples (HAS).
HAS advances ZSL through adversarial training which takes into account three crucial aspects.
We demonstrate the effectiveness of our adversarial samples approach in both ZSL and Generalized Zero-Shot Learning (GZSL) scenarios.
arXiv Detail & Related papers (2023-08-01T06:19:13Z)
- Resolving Semantic Confusions for Improved Zero-Shot Detection [6.72910827751713]
We propose a generative model incorporating a triplet loss that acknowledges the degree of dissimilarity between classes.
A cyclic-consistency loss is also enforced to ensure that generated visual samples of a class highly correspond to their own semantics.
arXiv Detail & Related papers (2022-12-12T18:11:48Z)
- Federated Zero-Shot Learning for Visual Recognition [55.65879596326147]
We propose a novel Federated Zero-Shot Learning (FedZSL) framework.
FedZSL learns a central model from the decentralized data residing on edge devices.
The effectiveness and robustness of FedZSL are demonstrated by extensive experiments conducted on three zero-shot benchmark datasets.
arXiv Detail & Related papers (2022-09-05T14:49:34Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
A promising solution is to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- Zero-Shot Logit Adjustment [89.68803484284408]
Generalized Zero-Shot Learning (GZSL) is a semantic-descriptor-based learning technique.
In this paper, we propose a new generation-based technique to enhance the generator's effect while neglecting the improvement of the classifier.
Our experiments demonstrate that the proposed technique achieves state-of-the-art results when combined with the basic generator, and that it can improve various generative zero-shot learning frameworks.
arXiv Detail & Related papers (2022-04-25T17:54:55Z)
- Disentangling Semantic-to-visual Confusion for Zero-shot Learning [13.610995960100869]
We develop a novel model called Disentangling Class Representation Generative Adversarial Network (DCR-GAN).
Benefiting from the disentangled representations, DCR-GAN could fit a more realistic distribution over both seen and unseen features.
Our proposed model outperforms state-of-the-art methods on four benchmark datasets.
arXiv Detail & Related papers (2021-06-16T08:04:11Z)
- Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning [59.58381904522967]
We propose a novel embedding-based generative model with a tight visual-semantic coupling constraint.
We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces.
Our method can be easily extended to the transductive ZSL setting by generating labels for unseen images.
arXiv Detail & Related papers (2020-09-16T03:54:12Z)