Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning
- URL: http://arxiv.org/abs/2503.22197v1
- Date: Fri, 28 Mar 2025 07:28:56 GMT
- Title: Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning
- Authors: Yang Liu, Xun Zhang, Jiale Du, Xinbo Gao, Jungong Han,
- Abstract summary: Zero-shot Learning attains knowledge transfer from seen classes to unseen classes by exploring auxiliary category information.<n>In this paper, we propose an extremely simple Out-of-distribution(OOD) detection based AV-GZSL method(EZ-AVOOD) to mitigate bias problem.<n>Compared to existing state-of-the-art methods, our model superior ZSL and GZSL performances on three audio-visual datasets.
- Score: 84.02184773383732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot Learning(ZSL) attains knowledge transfer from seen classes to unseen classes by exploring auxiliary category information, which is a promising yet difficult research topic. In this field, Audio-Visual Generalized Zero-Shot Learning~(AV-GZSL) has aroused researchers' great interest in which intricate relations within triple modalities~(audio, video, and natural language) render this task quite challenging but highly research-worthy. However, both existing embedding-based and generative-based AV-GZSL methods tend to suffer from domain shift problem a lot and we propose an extremely simple Out-of-distribution~(OOD) detection based AV-GZSL method~(EZ-AVOOD) to further mitigate bias problem by differentiating seen and unseen samples at the initial beginning. EZ-AVOOD accomplishes effective seen-unseen separation by exploiting the intrinsic discriminative information held in class-specific logits and class-agnostic feature subspace without training an extra OOD detector network. Followed by seen-unseen binary classification, we employ two expert models to classify seen samples and unseen samples separately. Compared to existing state-of-the-art methods, our model achieves superior ZSL and GZSL performances on three audio-visual datasets and becomes the new SOTA, which comprehensively demonstrates the effectiveness of the proposed EZ-AVOOD.
Related papers
- Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs [73.74375912785689]
This paper proposes unified training strategies for speech recognition systems.
We demonstrate that training a single model for all three tasks enhances VSR and AVSR performance.
We also introduce a greedy pseudo-labelling approach to more effectively leverage unlabelled samples.
arXiv Detail & Related papers (2024-11-04T16:46:53Z) - Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework [0.0]
Generalized Zero-Shot Learning (GZSL) is a challenging task requiring accurate classification of both seen and unseen classes.
We introduce a general framework employing out-of-distribution (OOD) detection, aiming to harness the strengths of both approaches.
We test our framework on three popular audio-visual datasets and observe a significant improvement comparing to existing state-of-the-art works.
arXiv Detail & Related papers (2024-08-02T14:10:20Z) - Zero-Shot Learning by Harnessing Adversarial Samples [52.09717785644816]
We propose a novel Zero-Shot Learning (ZSL) approach by Harnessing Adversarial Samples (HAS)
HAS advances ZSL through adversarial training which takes into account three crucial aspects.
We demonstrate the effectiveness of our adversarial samples approach in both ZSL and Generalized Zero-Shot Learning (GZSL) scenarios.
arXiv Detail & Related papers (2023-08-01T06:19:13Z) - DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET can often achieve state-of-the-art performance, its components are effective and its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z) - Triggering Failures: Out-Of-Distribution detection by learning from
local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation.
Our main contribution is a new OOD detection architecture called ObsNet associated with a dedicated training scheme based on Local Adversarial Attacks (LAA)
We show it obtains top performances both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z) - Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning [11.66422653137002]
We propose an attention-based model in the problem settings of Zero-Shot Learning to learn attributes useful for unseen class recognition.
Our method uses an attention mechanism adapted from Vision Transformer to capture and learn discriminative attributes by splitting images into small patches.
arXiv Detail & Related papers (2021-07-30T19:08:44Z) - Hardness Sampling for Self-Training Based Transductive Zero-Shot
Learning [10.764160559530847]
Transductive zero-shot learning (T-ZSL) which could alleviate the domain shift problem in existing ZSL works, has received much attention recently.
We first empirically analyze the roles of unseen-class samples with different degrees of hardness in the training process.
We propose two hardness sampling approaches for selecting a subset of diverse and hard samples from a given unseen-class dataset.
arXiv Detail & Related papers (2021-06-01T06:55:19Z) - Adversarial Self-Supervised Learning for Semi-Supervised 3D Action
Recognition [123.62183172631443]
We present Adversarial Self-Supervised Learning (ASSL), a novel framework that tightly couples SSL and the semi-supervised scheme.
Specifically, we design an effective SSL scheme to improve the discrimination capability of learned representations for 3D action recognition.
arXiv Detail & Related papers (2020-07-12T08:01:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.