Related papers: ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning

URL: http://arxiv.org/abs/2406.02929v2
Date: Tue, 11 Feb 2025 08:09:50 GMT
Title: ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning
Authors: Zihan Ye, Shreyank N. Gowda, Xiaowei Huang, Haotian Xu, Yaochu Jin, Kaizhu Huang, Xiaobo Jin
Abstract summary: A scarcity of seen class samples results in a marked decrease in performance across many generative Zero-shot Learning techniques. We introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations. ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data.
Score: 38.36200871549062
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Zero-shot Learning (ZSL) aims to enable classifiers to identify unseen classes. This is typically achieved by generating visual features for unseen classes based on learned visual-semantic correlations from seen classes. However, most current generative approaches heavily rely on having a sufficient number of samples from seen classes. Our study reveals that a scarcity of seen class samples results in a marked decrease in performance across many generative ZSL techniques. We argue, quantify, and empirically demonstrate that this decline is largely attributable to spurious visual-semantic correlations. To address this issue, we introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations. ZeroDiff comprises three key components: (1) Diffusion augmentation, which naturally transforms limited data into an expanded set of noised data to mitigate generative model overfitting; (2) Supervised-contrastive (SC)-based representations that dynamically characterize each limited sample to support visual feature generation; and (3) Multiple feature discriminators employing a Wasserstein-distance-based mutual learning approach, evaluating generated features from various perspectives, including pre-defined semantics, SC-based representations, and the diffusion process. Extensive experiments on three popular ZSL benchmarks demonstrate that ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Our codes are available at https://github.com/FouriYe/ZeroDiff_ICLR25.
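
Illustrative sketch (not taken from the paper or its repository): the abstract describes diffusion augmentation as turning a limited set of samples into an expanded set of noised data. One way to picture this is the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to visual feature vectors. The linear beta schedule, the function names `make_alpha_bar` and `diffuse_features`, and the `copies_per_sample` parameter are all assumptions made for this example; ZeroDiff's actual schedule and training loop may differ.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def diffuse_features(features, alpha_bar, rng, copies_per_sample=4):
    """Expand a small feature set into noised copies at random timesteps.

    Each row is repeated `copies_per_sample` times (a hypothetical knob),
    and each copy is noised with the standard DDPM forward formula.
    """
    x0 = np.repeat(features, copies_per_sample, axis=0)
    # One random timestep per noised copy.
    t = rng.integers(0, len(alpha_bar), size=x0.shape[0])
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t][:, None]
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, t


rng = np.random.default_rng(0)
features = rng.standard_normal((5, 8))  # 5 toy "seen class" features, 8 dims
alpha_bar = make_alpha_bar()
noised, steps = diffuse_features(features, alpha_bar, rng)
print(noised.shape, steps.shape)
```

Under these assumptions, five original feature vectors become twenty noised ones, each tagged with the timestep that a diffusion-aware discriminator could condition on.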

Related papers

CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning [48.46511584490582]
Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories. Real-world challenges such as distribution imbalances and attribute co-occurrence hinder the discernment of local variances in images. We propose a bidirectional cross-modal ZSL approach CREST to overcome these challenges.
arXiv Detail & Related papers (2024-04-15T10:19:39Z)
Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions [11.121652649243119]
Diffusion models have been widely adopted in data augmentation due to their outstanding diversity in data generation. We propose a novel approach termed the detail reinforcement diffusion model (DRDM). It leverages the rich knowledge of large models for fine-grained data augmentation and comprises two key components: discriminative semantic recombination (DSR) and spatial knowledge reference (SKR).
arXiv Detail & Related papers (2023-09-15T01:28:59Z)
Zero-Shot Learning by Harnessing Adversarial Samples [52.09717785644816]
We propose a novel Zero-Shot Learning (ZSL) approach by Harnessing Adversarial Samples (HAS). HAS advances ZSL through adversarial training which takes into account three crucial aspects. We demonstrate the effectiveness of our adversarial samples approach in both ZSL and Generalized Zero-Shot Learning (GZSL) scenarios.
arXiv Detail & Related papers (2023-08-01T06:19:13Z)
DuDGAN: Improving Class-Conditional GANs via Dual-Diffusion [2.458437232470188]
Class-conditional image generation using generative adversarial networks (GANs) has been investigated through various techniques. We propose a novel approach for class-conditional image generation using GANs called DuDGAN, which incorporates a dual diffusion-based noise injection process. Our method outperforms state-of-the-art conditional GAN models for image generation.
arXiv Detail & Related papers (2023-05-24T07:59:44Z)
Resolving Semantic Confusions for Improved Zero-Shot Detection [6.72910827751713]
We propose a generative model incorporating a triplet loss that acknowledges the degree of dissimilarity between classes. A cyclic-consistency loss is also enforced to ensure that generated visual samples of a class highly correspond to their own semantics.
arXiv Detail & Related papers (2022-12-12T18:11:48Z)
Federated Zero-Shot Learning for Visual Recognition [55.65879596326147]
We propose a novel Federated Zero-Shot Learning (FedZSL) framework. FedZSL learns a central model from the decentralized data residing on edge devices. The effectiveness and robustness of FedZSL are demonstrated by extensive experiments conducted on three zero-shot benchmark datasets.
arXiv Detail & Related papers (2022-09-05T14:49:34Z)
FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs [24.18718734850797]
Data-Efficient GANs (DE-GANs) aim to learn generative models with a limited amount of training data. Contrastive learning has shown great potential for increasing the synthesis quality of DE-GANs. We propose FakeCLR, which applies contrastive learning only to fake samples.
arXiv Detail & Related papers (2022-07-18T14:23:38Z)
GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes. A promising solution is to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes. We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
Augmentation-Aware Self-Supervision for Data-Efficient GAN Training [68.81471633374393]
Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting. We propose a novel augmentation-aware self-supervised discriminator that predicts the augmentation parameter of the augmented data. We compare our method with state-of-the-art (SOTA) methods using the class-conditional BigGAN and unconditional StyleGAN2 architectures.
arXiv Detail & Related papers (2022-05-31T10:35:55Z)
Zero-Shot Logit Adjustment [89.68803484284408]
Generalized Zero-Shot Learning (GZSL) is a semantic-descriptor-based learning technique. In this paper, we propose a new generation-based technique to enhance the generator's effect while neglecting the improvement of the classifier. Our experiments demonstrate that the proposed technique achieves state-of-the-art when combined with the basic generator, and it can improve various generative zero-shot learning frameworks.
arXiv Detail & Related papers (2022-04-25T17:54:55Z)
Disentangling Semantic-to-visual Confusion for Zero-shot Learning [13.610995960100869]
We develop a novel model called Disentangling Class Representation Generative Adversarial Network (DCR-GAN). Benefiting from the disentangled representations, DCR-GAN could fit a more realistic distribution over both seen and unseen features. Our proposed model can lead to superior performance over the state of the art on four benchmark datasets.
arXiv Detail & Related papers (2021-06-16T08:04:11Z)
Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning [59.58381904522967]
We propose a novel embedding-based generative model with a tight visual-semantic coupling constraint. We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces. Our method can be easily extended to the transductive ZSL setting by generating labels for unseen images.
arXiv Detail & Related papers (2020-09-16T03:54:12Z)
Generalized Zero-Shot Learning via VAE-Conditioned Generative Flow [83.27681781274406]
Generalized zero-shot learning aims to recognize both seen and unseen classes by transferring knowledge from semantic descriptions to visual representations. Recent generative methods formulate GZSL as a missing data problem and mainly adopt GANs or VAEs to generate visual features for unseen classes. We propose a conditional version of generative flows for GZSL, i.e., VAE-Conditioned Generative Flow (VAE-cFlow).
arXiv Detail & Related papers (2020-09-01T09:12:31Z)
Generative Model-driven Structure Aligning Discriminative Embeddings for Transductive Zero-shot Learning [21.181715602603436]
We propose a neural network-based model for learning a projection function which aligns the visual and semantic data in the latent space. We show superior performance on standard benchmark datasets AWA1, AWA2, CUB, SUN, FLO, and. We also show the efficacy of our model in a regime with extremely little labelled data.
arXiv Detail & Related papers (2020-05-09T18:48:20Z)
Generalized Zero-Shot Learning Via Over-Complete Distribution [79.5140590952889]
We propose to generate an Over-Complete Distribution (OCD) using a Conditional Variational Autoencoder (CVAE) of both seen and unseen classes. The effectiveness of the framework is evaluated using both Zero-Shot Learning and Generalized Zero-Shot Learning protocols.
arXiv Detail & Related papers (2020-04-01T19:05:28Z)
Adversarial Feature Hallucination Networks for Few-Shot Learning [84.31660118264514]
Adversarial Feature Hallucination Networks (AFHN) is based on conditional Wasserstein Generative Adversarial Networks (cWGAN). Two novel regularizers are incorporated into AFHN to encourage discriminability and diversity of the synthesized features.
arXiv Detail & Related papers (2020-03-30T02:43:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.