Related papers: Prototypical Contrastive Learning For Improved Few-Shot Audio Classification

URL: http://arxiv.org/abs/2509.10074v1
Date: Fri, 12 Sep 2025 09:10:55 GMT
Title: Prototypical Contrastive Learning For Improved Few-Shot Audio Classification
Authors: Christos Sgouropoulos, Christos Nikou, Stefanos Vlachos, Vasileios Theiou, Christos Foukanelis, Theodoros Giannakopoulos
Abstract summary: Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data. In this work, we investigate the effect of integrating supervised contrastive loss into prototypical few-shot training for audio classification.
Score: 3.100682063199351
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data, addressing challenges in scenarios where large-scale annotation is impractical. While extensive research has been conducted in the image domain, few-shot learning for audio classification remains relatively underexplored. In this work, we investigate the effect of integrating supervised contrastive loss into prototypical few-shot training for audio classification. Specifically, we demonstrate that an angular loss further improves performance compared with the standard contrastive loss. Our method applies SpecAugment followed by a self-attention mechanism that aggregates information from multiple augmented versions of the input into a single unified embedding. We evaluate our approach on MetaAudio, a benchmark comprising five datasets with predefined splits, standardized preprocessing, and a comprehensive set of few-shot learning models for comparison. The proposed approach achieves state-of-the-art performance in a 5-way, 5-shot setting.
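
The abstract says that supervised contrastive loss is integrated into prototypical few-shot training. It does not say how the two terms are combined. The sketch below is one plausible reading and is not the authors' implementation. It makes the following assumptions:
- Embeddings are already computed.
- Prototypes are class means of the support embeddings, and logits are negative squared Euclidean distances, as in Prototypical Networks.
- The total loss is a weighted sum. The weight `lam` and temperature `tau` are hypothetical values.
- The contrastive term is the standard supervised contrastive (SupCon) loss, not the angular variant the authors report as better.

SpecAugment and the self-attention fusion of augmented views are omitted.

```python
import math
from collections import defaultdict

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / n for x in v]

def prototypes(support, labels):
    """Class prototype = mean of that class's support embeddings."""
    groups = defaultdict(list)
    for emb, y in zip(support, labels):
        groups[y].append(emb)
    return {y: [sum(col) / len(embs) for col in zip(*embs)] for y, embs in groups.items()}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def log_softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def proto_loss(query, query_labels, protos):
    """Mean negative log-probability of the true class; logits are -squared distances."""
    classes = sorted(protos)
    total = 0.0
    for q, y in zip(query, query_labels):
        logp = log_softmax([-sq_dist(q, protos[c]) for c in classes])
        total -= logp[classes.index(y)]
    return total / len(query)

def supcon_loss(embs, labels, tau=0.1):
    """Supervised contrastive loss on L2-normalized embeddings (tau is hypothetical)."""
    z = [l2_normalize(e) for e in embs]
    total, anchors = 0.0, 0
    for i in range(len(z)):
        others = [j for j in range(len(z)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner are skipped
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / tau for j in others}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / max(anchors, 1)

def episode_loss(support, s_labels, query, q_labels, lam=0.5):
    """Hypothetical combination: prototypical loss + lam * SupCon over all episode embeddings."""
    protos = prototypes(support, s_labels)
    return proto_loss(query, q_labels, protos) + lam * supcon_loss(support + query, s_labels + q_labels)
```

As a usage example, `episode_loss([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [0, 0, 1, 1], [[0.95, 0.05], [0.05, 0.95]], [0, 1])` scores a toy 2-way, 2-shot episode. The contrastive term adds pressure to cluster same-class embeddings beyond what the prototype softmax alone provides.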

Related papers

$C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction [80.57232374640911]
We propose a model-agnostic strategy called Mask-And-Recover (MAR). MAR integrates both inter- and intra-modality contextual correlations to enable global inference within extraction modules. To better target challenging parts within each sample, we introduce a Fine-grained Confidence Score (FCS) model.
arXiv Detail & Related papers (2025-04-01T13:01:30Z)
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification [7.83105437734593]
Self-supervised learning has excelled for its capacity to learn robust feature representations from unlabelled data. This study assesses large-scale self-supervised models' performance in few-shot audio classification.
arXiv Detail & Related papers (2024-02-02T10:00:51Z)
Learning with Noisy Labels through Learnable Weighting and Centroid Similarity [5.187216033152917]
Noisy labels are prevalent in domains such as medical diagnosis and autonomous driving. We introduce a novel method for training machine learning models in the presence of noisy labels. Our results show that our method consistently outperforms the existing state-of-the-art techniques.
arXiv Detail & Related papers (2023-03-16T16:43:24Z)
Convolutional Ensembling based Few-Shot Defect Detection Technique [0.0]
We present a new approach to few-shot classification in which we employ the knowledge base of multiple pre-trained convolutional models. Our framework uses a novel ensembling technique to boost accuracy while drastically decreasing the total parameter count.
arXiv Detail & Related papers (2022-08-05T17:29:14Z)
Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text. These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining. We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
Few-Shot Learning with Part Discovery and Augmentation from Unlabeled Images [79.34600869202373]
We show that inductive bias can be learned from a flat collection of unlabeled images, and instantiated as transferable representations among seen and unseen classes. Specifically, we propose a novel part-based self-supervised representation learning scheme to learn transferable representations. Our method yields impressive results, outperforming the previous best unsupervised methods by 7.74% and 9.24%.
arXiv Detail & Related papers (2021-05-25T12:22:11Z)
A Framework using Contrastive Learning for Classification with Noisy Labels [1.2891210250935146]
We propose a framework that uses contrastive learning as a pre-training task to perform image classification in the presence of noisy labels. Recent strategies such as pseudo-labeling, sample selection with Gaussian mixture models, and weighted supervised contrastive learning are combined into a fine-tuning phase following the pre-training.
arXiv Detail & Related papers (2021-04-19T18:51:22Z)
Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency). Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning [58.2091760793799]
We propose a novel contrastive prototype learning with augmented embeddings (CPLAE) model. With a class prototype as an anchor, CPL aims to pull the query samples of the same class closer and those of different classes further away. Extensive experiments on several benchmarks demonstrate that our proposed CPLAE achieves new state-of-the-art.
arXiv Detail & Related papers (2021-01-23T13:22:44Z)
Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components. First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective. Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impact of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
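
The CPLAE entry above describes using a class prototype as an anchor. Same-class query samples are pulled closer to it and other-class queries are pushed away. The PAL entry describes a similar "prototype-centered" contrastive loss. The sketch below renders that idea as an InfoNCE-style objective over cosine similarities. This formulation, the function name, and the temperature `tau` are assumptions for illustration. They are not taken from either paper.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def prototype_anchored_contrastive(protos, queries, q_labels, tau=0.1):
    """Hypothetical prototype-as-anchor loss: for each prototype, same-class queries
    are positives and every other query in the episode is a negative."""
    total, count = 0.0, 0
    for c, proto in protos.items():
        logits = [cosine(proto, q) / tau for q in queries]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        for l, y in zip(logits, q_labels):
            if y == c:
                total += log_denom - l  # -log softmax over queries for this anchor
                count += 1
    return total / max(count, 1)
```

The softmax here runs over queries for a fixed prototype, not over prototypes for a fixed query. That reversal is what separates a prototype-centered objective from the usual query-centered prototypical loss.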

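The Jo-SRC entry above mentions using predictions from two views of each sample to estimate whether it is clean or out-of-distribution. The sketch below scores two things: agreement between the two views, and agreement between a prediction and its given label. Both use the Jensen-Shannon divergence with base-2 logarithms, so each score lies in [0, 1]. The function names and this exact scoring rule are illustrative assumptions, not Jo-SRC's published selection criterion.

```python
import math

def kl(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]  # m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def view_agreement(p_view1, p_view2):
    """Hypothetical consistency score: 1.0 when both views predict identically."""
    return 1.0 - js_divergence(p_view1, p_view2)

def label_agreement(p, label, num_classes):
    """Hypothetical clean-label score: how close the prediction is to the one-hot label."""
    onehot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    return 1.0 - js_divergence(p, onehot)
```

Under this reading, a low `label_agreement` combined with a high `view_agreement` suggests a confidently predicted sample whose given label disagrees. A low `view_agreement` suggests an unstable sample.
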
This list is automatically generated from the titles and abstracts of the papers on this site.