Disentangling Semantic-to-visual Confusion for Zero-shot Learning
- URL: http://arxiv.org/abs/2106.08605v1
- Date: Wed, 16 Jun 2021 08:04:11 GMT
- Title: Disentangling Semantic-to-visual Confusion for Zero-shot Learning
- Authors: Zihan Ye, Fuyuan Hu, Fan Lyu, Linyan Li, Kaizhu Huang
- Abstract summary: We develop a novel model called Disentangling Class Representation Generative Adversarial Network (DCR-GAN).
Benefiting from the disentangled representations, DCR-GAN could fit a more realistic distribution over both seen and unseen features.
Our proposed model achieves superior performance over state-of-the-art methods on four benchmark datasets.
- Score: 13.610995960100869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using generative models to synthesize visual features from semantic
distributions is one of the most popular solutions to zero-shot learning (ZSL) image
classification in recent years. The triplet loss (TL) is widely used to generate
realistic visual distributions from semantics by automatically searching for
discriminative representations. However, the traditional TL cannot search for reliable
disentangled representations of unseen classes, because unseen classes are unavailable
in ZSL training. To alleviate this drawback, we propose a multi-modal triplet loss
(MMTL) that utilizes multi-modal information to search a disentangled representation
space. As such, all classes can interact, which benefits learning disentangled class
representations in the searched space. Furthermore, we develop a novel model called
Disentangling Class Representation Generative Adversarial Network (DCR-GAN) that
focuses on exploiting the disentangled representations in the training, feature
synthesis, and final recognition stages. Benefiting from the disentangled
representations, DCR-GAN can fit a more realistic distribution over both seen and
unseen features. Extensive experiments show that our proposed model achieves superior
performance over state-of-the-art methods on four benchmark datasets. Our code is
available at https://github.com/FouriYe/DCRGAN-TMM.
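The MMTL formulation is not spelled out in this abstract, so the following PyTorch sketch is only a hedged illustration of the general idea: anchors, positives, and negatives may come from either modality (visual features or class attributes) once both are projected into a shared embedding space, so unseen classes can still take part in the triplet search through their semantic vectors. All names, dimensions, and projection choices below are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalTripletLoss(nn.Module):
    """Hypothetical multi-modal triplet loss over a shared embedding space."""

    def __init__(self, visual_dim, semantic_dim, embed_dim=512, margin=1.0):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, embed_dim)      # projects image features
        self.semantic_proj = nn.Linear(semantic_dim, embed_dim)  # projects class attributes
        self.margin = margin

    def embed(self, x, modality):
        proj = self.visual_proj if modality == "visual" else self.semantic_proj
        return F.normalize(proj(x), dim=-1)

    def forward(self, anchor, positive, negative,
                anchor_mod="visual", pos_mod="semantic", neg_mod="semantic"):
        a = self.embed(anchor, anchor_mod)
        p = self.embed(positive, pos_mod)
        n = self.embed(negative, neg_mod)
        # Standard margin-based triplet objective, applied in the shared space.
        d_ap = (a - p).pow(2).sum(dim=-1)
        d_an = (a - n).pow(2).sum(dim=-1)
        return F.relu(d_ap - d_an + self.margin).mean()


# Example: a visual anchor is pulled toward its own class-attribute vector and
# pushed away from another class's attributes (which may belong to an unseen class).
loss_fn = MultiModalTripletLoss(visual_dim=2048, semantic_dim=312)
v = torch.randn(32, 2048)     # e.g. ResNet-101 image features
s_pos = torch.randn(32, 312)  # attribute vector of the ground-truth class
s_neg = torch.randn(32, 312)  # attribute vector of a different (possibly unseen) class
loss = loss_fn(v, s_pos, s_neg)
```

Because positives and negatives can be semantic vectors rather than images, triplets involving unseen classes can be formed without any unseen images, which is the interaction between classes that the abstract attributes to the MMTL.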
Related papers
- Discriminative Image Generation with Diffusion Models for Zero-Shot Learning [53.44301001173801]
We present DIG-ZSL, a novel Discriminative Image Generation framework for Zero-Shot Learning.
We learn a discriminative class token (DCT) for each unseen class under the guidance of a pre-trained category discrimination model (CDM).
Extensive experiments and visualizations on four datasets show that DIG-ZSL: (1) generates diverse and high-quality images, (2) outperforms previous state-of-the-art nonhuman-annotated semantic prototype-based methods by a large margin, and (3) achieves comparable or better performance than baselines that leverage human-annotated semantic prototypes.
arXiv Detail & Related papers (2024-12-23T02:18:54Z) - ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning [38.36200871549062]
A scarcity of seen class samples results in a marked decrease in performance across many generative Zero-shot Learning techniques.
We introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations.
ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data.
arXiv Detail & Related papers (2024-06-05T04:37:06Z) - SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized Zero-Shot Learning [0.6792605600335813]
Zero-Shot Learning (ZSL) presents the challenge of identifying categories not seen during training.
We introduce Semantic Encoder-Enhanced Representations for Zero-Shot Learning (SEER-ZSL).
First, we aim to distill meaningful semantic information using a probabilistic encoder, enhancing the semantic consistency and robustness.
Second, we distill the visual space by exploiting the learned data distribution through an adversarially trained generator. Third, we align the distilled information, enabling a mapping of unseen categories onto the true data manifold.
arXiv Detail & Related papers (2023-12-20T15:18:51Z) - Renderers are Good Zero-Shot Representation Learners: Exploring
Diffusion Latents for Metric Learning [1.0152838128195467]
We use retrieval as a proxy for measuring the metric learning properties of the latent spaces of Shap-E.
We find that Shap-E representations outperform classical EfficientNet baseline representations in the zero-shot setting.
These findings give preliminary indication that 3D-based rendering and generative models can yield useful representations for discriminative tasks in our innately 3D-native world.
arXiv Detail & Related papers (2023-06-19T06:41:44Z) - Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z) - Traditional Classification Neural Networks are Good Generators: They are
Competitive with DDPMs and GANs [104.72108627191041]
We show that conventional neural network classifiers can generate high-quality images comparable to state-of-the-art generative models.
We propose a mask-based reconstruction module that makes the gradients semantic-aware, enabling the synthesis of plausible images.
We show that our method is also applicable to text-to-image generation by treating image-text foundation models as classifiers.
arXiv Detail & Related papers (2022-11-27T11:25:35Z) - GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot
Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take the advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z) - DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET often achieves state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z) - High Fidelity Visualization of What Your Self-Supervised Representation
Knows About [22.982471878833362]
In this work, we showcase the use of a conditional diffusion based generative model (RCDM) to visualize representations learned with self-supervised models.
We demonstrate how this model's generation quality is on par with state-of-the-art generative models while being faithful to the representation used as conditioning.
arXiv Detail & Related papers (2021-12-16T19:23:33Z) - FREE: Feature Refinement for Generalized Zero-Shot Learning [86.41074134041394]
Generalized zero-shot learning (GZSL) has achieved significant progress, with many efforts dedicated to overcoming the problems of visual-semantic domain gap and seen-unseen bias.
Most existing methods directly use feature extraction models trained on ImageNet alone, ignoring the cross-dataset bias between ImageNet and GZSL benchmarks.
We propose a simple yet effective GZSL method, termed feature refinement for generalized zero-shot learning (FREE), to tackle the above problem.
arXiv Detail & Related papers (2021-07-29T08:11:01Z) - Generalized Zero-Shot Learning Via Over-Complete Distribution [79.5140590952889]
We propose to generate an Over-Complete Distribution (OCD) using a Conditional Variational Autoencoder (CVAE) for both seen and unseen classes (see the sketch below).
The effectiveness of the framework is evaluated using both Zero-Shot Learning and Generalized Zero-Shot Learning protocols.
arXiv Detail & Related papers (2020-04-01T19:05:28Z)