Disentangling Semantic-to-visual Confusion for Zero-shot Learning
- URL: http://arxiv.org/abs/2106.08605v1
- Date: Wed, 16 Jun 2021 08:04:11 GMT
- Title: Disentangling Semantic-to-visual Confusion for Zero-shot Learning
- Authors: Zihan Ye, Fuyuan Hu, Fan Lyu, Linyan Li, Kaizhu Huang
- Abstract summary: We develop a novel model called Disentangling Class Representation Generative Adversarial Network (DCR-GAN).
Benefiting from the disentangled representations, DCR-GAN could fit a more realistic distribution over both seen and unseen features.
Our proposed model leads to superior performance over the state of the art on four benchmark datasets.
- Score: 13.610995960100869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using generative models to synthesize visual features from the semantic
distribution is one of the most popular solutions to zero-shot learning (ZSL) image
classification in recent years. The triplet loss (TL) is widely used to generate
realistic visual distributions from semantics by automatically searching for discriminative
representations. However, the traditional TL cannot search reliable unseen
disentangled representations due to the unavailability of unseen classes in
ZSL. To alleviate this drawback, we propose in this work a multi-modal triplet
loss (MMTL) that utilizes multimodal information to search a disentangled
representation space. As such, all classes can interact, which benefits the learning
of disentangled class representations in the searched space. Furthermore,
we develop a novel model called Disentangling Class Representation Generative
Adversarial Network (DCR-GAN) focusing on exploiting the disentangled
representations in training, feature synthesis, and final recognition stages.
Benefiting from the disentangled representations, DCR-GAN could fit a more
realistic distribution over both seen and unseen features. Extensive
experiments show that our proposed model achieves performance superior to
the state of the art on four benchmark datasets. Our code is available at
https://github.com/FouriYe/DCRGAN-TMM.
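The abstract only names the multi-modal triplet loss (MMTL); the released code lives at the repository linked above. As a rough, illustrative sketch of the general idea, the PyTorch-style function below forms triplets across modalities: semantic embeddings act as anchors while visual embeddings of the same or different classes serve as positives and negatives. The margin, the hard-mining strategy, and the normalization are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def multimodal_triplet_loss(sem_emb, vis_emb, labels, margin=0.2):
    """Illustrative multi-modal triplet loss (not the authors' released code).

    sem_emb: (N, D) embeddings projected from class semantics.
    vis_emb: (N, D) embeddings projected from visual features.
    labels:  (N,) integer class labels; equal labels mean the same class.
    """
    sem_emb = F.normalize(sem_emb, dim=1)
    vis_emb = F.normalize(vis_emb, dim=1)

    # Distances between semantic anchors and visual candidates (N x N).
    dist = torch.cdist(sem_emb, vis_emb)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)

    # Hard mining: farthest same-class visual sample as the positive,
    # closest different-class visual sample as the negative.
    pos_dist = dist.masked_fill(~same, 0.0).max(dim=1).values
    neg_dist = dist.masked_fill(same, float("inf")).min(dim=1).values

    # Standard margin-based triplet hinge, averaged over anchors.
    return F.relu(pos_dist - neg_dist + margin).mean()
```

In practice such a term would be added to the generator and discriminator objectives, so that the searched embedding space keeps same-class semantic and visual points together, across both seen and, indirectly, unseen classes.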
Related papers
- Towards Generative Class Prompt Learning for Fine-grained Visual Recognition [5.633314115420456]
Generative Class Prompt Learning and Contrastive Multi-class Prompt Learning (CoMPLe) are presented.
Generative Class Prompt Learning improves visio-linguistic synergy in class embeddings by conditioning on few-shot exemplars with learnable class prompts.
CoMPLe builds on this foundation by introducing a contrastive learning component that encourages inter-class separation.
arXiv Detail & Related papers (2024-09-03T12:34:21Z)
- RevCD -- Reversed Conditional Diffusion for Generalized Zero-Shot Learning [0.6792605600335813]
In computer vision, knowledge from seen categories is transferred to unseen categories by exploiting the relationships between visual features and available semantic information.
We present a reversed conditional Diffusion-based model (RevCD) that mitigates this issue by generating semantic features from visual inputs.
Our RevCD model consists of a cross Hadamard-Addition embedding of a sinusoidal time schedule and a multi-headed visual transformer for attention-guided embeddings.
arXiv Detail & Related papers (2024-08-31T17:37:26Z)
- Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z)
- Renderers are Good Zero-Shot Representation Learners: Exploring Diffusion Latents for Metric Learning [1.0152838128195467]
We use retrieval as a proxy for measuring the metric learning properties of the latent spaces of Shap-E.
We find that Shap-E representations outperform those of a classical EfficientNet baseline in the zero-shot setting.
These findings give preliminary indication that 3D-based rendering and generative models can yield useful representations for discriminative tasks in our innately 3D-native world.
arXiv Detail & Related papers (2023-06-19T06:41:44Z)
- Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z)
- Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs [104.72108627191041]
We show that conventional neural network classifiers can generate high-quality images comparable to state-of-the-art generative models.
We propose a mask-based reconstruction module that makes the gradients semantics-aware, so that plausible images can be synthesized.
We show that our method is also applicable to text-to-image generation by building on image-text foundation models.
arXiv Detail & Related papers (2022-11-27T11:25:35Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation (a minimal coupling-layer sketch follows this entry).
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
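The GSMFlow entry above builds its generator from conditional affine coupling layers. As a minimal sketch of what such a layer generally looks like (a standard normalizing-flow building block, not GSMFlow's actual code; all dimensions and names are assumptions), one half of the feature vector is rescaled and shifted by a small network that sees the other half together with the semantic condition:

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """Generic conditional affine coupling layer (illustrative only)."""

    def __init__(self, feat_dim, cond_dim, hidden_dim=256):
        super().__init__()
        self.half = feat_dim // 2
        # Predict log-scale and shift for the second half from the first half + condition.
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2 * (feat_dim - self.half)),
        )

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)          # keep scales bounded for stability
        y2 = x2 * torch.exp(log_s) + t     # invertible affine transform
        log_det = log_s.sum(dim=1)         # log-determinant of the Jacobian
        return torch.cat([x1, y2], dim=1), log_det

    def inverse(self, y, cond):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)
        x2 = (y2 - t) * torch.exp(-log_s)
        return torch.cat([y1, x2], dim=1)
```

Stacking several such layers, with the two halves permuted between layers, yields an invertible, condition-aware mapping between visual features and a simple base distribution, which is the property flow-based ZSL generators exploit.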
- DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET often achieves state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z)
- High Fidelity Visualization of What Your Self-Supervised Representation Knows About [22.982471878833362]
In this work, we showcase the use of a conditional diffusion based generative model (RCDM) to visualize representations learned with self-supervised models.
We demonstrate how this model's generation quality is on par with state-of-the-art generative models while being faithful to the representation used as conditioning.
arXiv Detail & Related papers (2021-12-16T19:23:33Z)
- FREE: Feature Refinement for Generalized Zero-Shot Learning [86.41074134041394]
Generalized zero-shot learning (GZSL) has achieved significant progress, with many efforts dedicated to overcoming the problems of visual-semantic domain gap and seen-unseen bias.
Most existing methods directly use feature extraction models trained on ImageNet alone, ignoring the cross-dataset bias between ImageNet and GZSL benchmarks.
We propose a simple yet effective GZSL method, termed feature refinement for generalized zero-shot learning (FREE) to tackle the above problem.
arXiv Detail & Related papers (2021-07-29T08:11:01Z)
- Generalized Zero-Shot Learning Via Over-Complete Distribution [79.5140590952889]
We propose to generate an Over-Complete Distribution (OCD) using a Conditional Variational Autoencoder (CVAE) of both seen and unseen classes (a minimal CVAE sketch follows this entry).
The effectiveness of the framework is evaluated using both Zero-Shot Learning and Generalized Zero-Shot Learning protocols.
arXiv Detail & Related papers (2020-04-01T19:05:28Z)
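Several entries above, including the OCD/CVAE paper just listed, rely on the same building block: a conditional generator that maps class semantics plus noise to visual features. The sketch below is a minimal conditional VAE for that purpose; the layer sizes, names, and sampling helper are illustrative assumptions rather than any paper's released code.

```python
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    """Minimal conditional VAE for semantic-to-visual feature synthesis (illustrative)."""

    def __init__(self, vis_dim, sem_dim, latent_dim=64, hidden_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vis_dim + sem_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2 * latent_dim),      # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + sem_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, vis_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, vis, sem):
        mu, log_var = self.encoder(torch.cat([vis, sem], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)   # reparameterization
        recon = self.decoder(torch.cat([z, sem], dim=1))
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1)
        return recon, kl

    @torch.no_grad()
    def synthesize(self, sem, n_per_class=1):
        """Sample visual features for (possibly unseen) class semantics."""
        sem = sem.repeat_interleave(n_per_class, dim=0)
        z = torch.randn(sem.size(0), self.latent_dim, device=sem.device)
        return self.decoder(torch.cat([z, sem], dim=1))
```

Features synthesized this way for unseen-class semantics are typically mixed with real seen-class features to train an ordinary classifier, which is the common recipe behind the generative ZSL methods surveyed above.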