Related papers: ZeroDiff++: Substantial Unseen Visual-semantic Correlation in Zero-shot Learning

ZeroDiff++: Substantial Unseen Visual-semantic Correlation in Zero-shot Learning

URL: http://arxiv.org/abs/2602.12401v1
Date: Thu, 12 Feb 2026 20:52:07 GMT
Title: ZeroDiff++: Substantial Unseen Visual-semantic Correlation in Zero-shot Learning
Authors: Zihan Ye, Shreyank N Gowda, Kaile Du, Weijian Luo, Ling Shao,
Abstract summary: We introduce metrics to quantify spuriousness for seen and unseen classes.<n>We propose ZeroDiff++, a diffusion-based generative framework.<n>Experiments on three ZSL benchmarks demonstrate that ZeroDiff++ not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data.
Score: 42.855022683292276
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Zero-shot Learning (ZSL) enables classifiers to recognize classes unseen during training, commonly via generative two stage methods: (1) learn visual semantic correlations from seen classes; (2) synthesize unseen class features from semantics to train classifiers. In this paper, we identify spurious visual semantic correlations in existing generative ZSL worsened by scarce seen class samples and introduce two metrics to quantify spuriousness for seen and unseen classes. Furthermore, we point out a more critical bottleneck: existing unadaptive fully noised generators produce features disconnected from real test samples, which also leads to the spurious correlation. To enhance the visual-semantic correlations on both seen and unseen classes, we propose ZeroDiff++, a diffusion-based generative framework. In training, ZeroDiff++ uses (i) diffusion augmentation to produce diverse noised samples, (ii) supervised contrastive (SC) representations for instance level semantics, and (iii) multi view discriminators with Wasserstein mutual learning to assess generated features. At generation time, we introduce (iv) Diffusion-based Test time Adaptation (DiffTTA) to adapt the generator using pseudo label reconstruction, and (v) Diffusion-based Test time Generation (DiffGen) to trace the diffusion denoising path and produce partially synthesized features that connect real and generated data, and mitigates data scarcity further. Extensive experiments on three ZSL benchmarks demonstrate that ZeroDiff++ not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Code would be available.

Related papers

ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning [38.36200871549062]
A scarcity of seen class samples results in a marked decrease in performance across many generative Zero-shot Learning techniques.<n>We introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations.<n>ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data.
arXiv Detail & Related papers (2024-06-05T04:37:06Z)
Resolving Semantic Confusions for Improved Zero-Shot Detection [6.72910827751713]
We propose a generative model incorporating a triplet loss that acknowledges the degree of dissimilarity between classes. A cyclic-consistency loss is also enforced to ensure that generated visual samples of a class highly correspond to their own semantics.
arXiv Detail & Related papers (2022-12-12T18:11:48Z)
GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes. It is a promising solution to take the advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes. We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
Zero-Shot Logit Adjustment [89.68803484284408]
Generalized Zero-Shot Learning (GZSL) is a semantic-descriptor-based learning technique. In this paper, we propose a new generation-based technique to enhance the generator's effect while neglecting the improvement of the classifier. Our experiments demonstrate that the proposed technique achieves state-of-the-art when combined with the basic generator, and it can improve various generative zero-shot learning frameworks.
arXiv Detail & Related papers (2022-04-25T17:54:55Z)
Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize the seen and unseen samples, where unseen classes are not observable during training. We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively. Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning [59.58381904522967]
We propose a novel embedding based generative model with a tight visual-semantic coupling constraint. We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces. Our method can be easily extended to transductive ZSL setting by generating labels for unseen images.
arXiv Detail & Related papers (2020-09-16T03:54:12Z)
Generalized Zero-Shot Learning via VAE-Conditioned Generative Flow [83.27681781274406]
Generalized zero-shot learning aims to recognize both seen and unseen classes by transferring knowledge from semantic descriptions to visual representations. Recent generative methods formulate GZSL as a missing data problem, which mainly adopts GANs or VAEs to generate visual features for unseen classes. We propose a conditional version of generative flows for GZSL, i.e., VAE-Conditioned Generative Flow (VAE-cFlow)
arXiv Detail & Related papers (2020-09-01T09:12:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.