On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio
Classification
- URL: http://arxiv.org/abs/2402.01274v3
- Date: Tue, 13 Feb 2024 20:21:04 GMT
- Title: On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio
Classification
- Authors: Calum Heggan, Sam Budgett, Timothy Hospedales, Mehrdad Yaghoobi
- Abstract summary: Self-supervised learning has excelled for its capacity to learn robust feature representations from unlabelled data.
This study assesses large-scale self-supervised models' performance in few-shot audio classification.
- Score: 7.83105437734593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, self-supervised learning has excelled for its capacity to
learn robust feature representations from unlabelled data. Networks pretrained
through self-supervision serve as effective feature extractors for downstream
tasks, including Few-Shot Learning. While the evaluation of unsupervised
approaches for few-shot learning is well-established in imagery, it is notably
absent in acoustics. This study addresses this gap by assessing large-scale
self-supervised models' performance in few-shot audio classification.
Additionally, we explore the relationship between a model's few-shot learning
capability and other downstream task benchmarks. Our findings reveal
state-of-the-art performance in some few-shot problems such as
SpeechCommandsv2, as well as strong correlations between speech-based few-shot
problems and various downstream audio tasks.
Related papers
- Investigating Self-Supervised Methods for Label-Efficient Learning [27.029542823306866]
We study different self supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling for their low-shot capabilities.
We introduce a framework involving both mask image modelling and clustering as pretext tasks, which performs better across all low-shot downstream tasks.
When testing the model on full scale datasets, we show performance gains in multi-class classification, multi-label classification and semantic segmentation.
arXiv Detail & Related papers (2024-06-25T10:56:03Z) - Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning, that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z) - Multi-annotator Deep Learning: A Probabilistic Framework for
Classification [2.445702550853822]
Training standard deep neural networks leads to subpar performances in multi-annotator supervised learning settings.
We address this issue by presenting a probabilistic training framework named multi-annotator deep learning (MaDL)
A modular network architecture enables us to make varying assumptions regarding annotators' performances.
Our findings show MaDL's state-of-the-art performance and robustness against many correlated, spamming annotators.
arXiv Detail & Related papers (2023-04-05T16:00:42Z) - Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z) - Improving In-Context Few-Shot Learning via Self-Supervised Training [48.801037246764935]
We propose to use self-supervision in an intermediate training stage between pretraining and downstream few-shot usage.
We find that the intermediate self-supervision stage produces models that outperform strong baselines.
arXiv Detail & Related papers (2022-05-03T18:01:07Z) - Self-supervised Speaker Diarization [19.111219197011355]
This study proposes an entirely unsupervised deep-learning model for speaker diarization.
Speaker embeddings are represented by an encoder trained in a self-supervised fashion using pairs of adjacent segments assumed to be of the same speaker.
arXiv Detail & Related papers (2022-04-08T16:27:14Z) - Self-Supervised Learning for speech recognition with Intermediate layer
supervision [52.93758711230248]
We propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL)
ILS-SSL forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers.
Experiments on LibriSpeech test-other set show that our method outperforms HuBERT significantly.
arXiv Detail & Related papers (2021-12-16T10:45:05Z) - LiRA: Learning Visual Speech Representations from Audio through
Self-supervision [53.18768477520411]
We propose Learning visual speech Representations from Audio via self-supervision (LiRA)
Specifically, we train a ResNet+Conformer model to predict acoustic features from unlabelled visual speech.
We show that our approach significantly outperforms other self-supervised methods on the Lip Reading in the Wild dataset.
arXiv Detail & Related papers (2021-06-16T23:20:06Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates a attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.