MetaAudio: A Few-Shot Audio Classification Benchmark
- URL: http://arxiv.org/abs/2204.02121v1
- Date: Tue, 5 Apr 2022 11:33:44 GMT
- Title: MetaAudio: A Few-Shot Audio Classification Benchmark
- Authors: Calum Heggan, Sam Budgett, Timothy Hospedales, Mehrdad Yaghoobi
- Abstract summary: This work aims to alleviate the reliance on image-based benchmarks by offering the first comprehensive, public and fully reproducible audio-based alternative.
We compare the few-shot classification performance of a variety of techniques on seven audio datasets.
Our experimentation shows gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric and baseline methods.
- Score: 2.294014185517203
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Currently available benchmarks for few-shot learning (machine learning with
few training examples) are limited in the domains they cover, primarily
focusing on image classification. This work aims to alleviate this reliance on
image-based benchmarks by offering the first comprehensive, public and fully
reproducible audio-based alternative, covering a variety of sound domains and
experimental settings. We compare the few-shot classification performance of a
variety of techniques on seven audio datasets (spanning environmental sounds to
human speech). Extending this, we carry out in-depth analyses of joint training
(where all datasets are used during training) and cross-dataset adaptation
protocols, establishing the possibility of a generalised audio few-shot
classification algorithm. Our experimentation shows gradient-based
meta-learning methods such as MAML and Meta-Curvature consistently outperform
both metric and baseline methods. We also demonstrate that the joint training
routine helps overall generalisation for the environmental sound databases
included, and that it is a somewhat effective method of tackling the
cross-dataset/domain setting.
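To make the gradient-based meta-learning result above concrete, the MAML-style inner/outer loop can be sketched as below. This is a minimal first-order sketch (full MAML also differentiates through the inner update) on a toy one-parameter regression problem with analytic gradients; the tasks, learning rates, and model are synthetic placeholders, not the paper's audio architecture.

```python
# First-order MAML sketch on a toy 1-D regression "task distribution".
# Model: y = w * x with squared loss; gradients are computed analytically.
# All task definitions here are synthetic placeholders, not MetaAudio data.

def grad(w, xs, ys):
    """d/dw of the mean squared error of y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.1):
    """One meta-update: adapt on each task's support set, then move the
    meta-parameters using the query-set gradient (first-order MAML)."""
    meta_grad = 0.0
    for support, query in tasks:
        w_adapted = w - inner_lr * grad(w, *support)  # inner adaptation
        meta_grad += grad(w_adapted, *query)          # first-order outer grad
    return w - outer_lr * meta_grad / len(tasks)

# Each task regresses a different slope; support/query are (xs, ys) pairs.
tasks = [
    (([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.0])),  # true slope 2
    (([1.0, 2.0], [3.0, 6.0]), ([3.0], [9.0])),  # true slope 3
]
w = 0.0
for _ in range(200):
    w = maml_step(w, tasks)
```

After training, `w` settles near 2.5, midway between the two task slopes: the meta-parameters are positioned so a single inner gradient step adapts well to either task, which is the behaviour that lets such methods excel in few-shot settings.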
Related papers
- LC-Protonets: Multi-label Few-shot learning for world music audio tagging [65.72891334156706]
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification.
LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items.
Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music.
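The prototype-per-label-combination idea described above can be sketched as follows; the feature vectors and tag names are toy placeholders (not a music dataset), and the distance-based classification rule is a plausible reading of the prototypical-network setup rather than the paper's exact procedure.

```python
# Sketch of the LC-Protonets idea: build one prototype per label
# *combination* drawn from the power set of labels seen in the support
# set, then classify a query by its nearest prototype.
from itertools import combinations

def mean(vectors):
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lc_prototypes(support):
    """support: list of (feature_vector, frozenset_of_labels).
    Returns {label_combination: prototype} for every non-empty label
    combination contained in at least one support item."""
    groups = {}
    for feat, labels in support:
        for r in range(1, len(labels) + 1):
            for combo in combinations(sorted(labels), r):
                groups.setdefault(combo, []).append(feat)
    return {combo: mean(feats) for combo, feats in groups.items()}

def classify(query, protos):
    """Return the label combination of the nearest prototype."""
    return min(protos, key=lambda c: sq_dist(query, protos[c]))

# Toy support set: 2-D features tagged with one or more labels.
support = [
    ((0.0, 0.0), frozenset({"vocal"})),
    ((1.0, 1.0), frozenset({"vocal", "drums"})),
    ((2.0, 2.0), frozenset({"drums"})),
]
protos = lc_prototypes(support)
```

Note that a multi-label item contributes to every subset of its label set, so the `("drums", "vocal")` combination gets its own prototype distinct from the single-label ones.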
arXiv Detail & Related papers (2024-09-17T15:13:07Z) - Benchmarking Representations for Speech, Music, and Acoustic Events [24.92641211471113]
ARCH is a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains.
ARCH comprises 12 datasets that allow us to thoroughly assess pre-trained SSL models of different sizes.
To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets.
arXiv Detail & Related papers (2024-05-02T01:24:53Z) - Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol [6.749750044497733]
We first design and optimize an audio-visual scene classifier, to compare with existing classification baselines that use both modalities.
By applying this classifier separately to the audio and the visual modality, we can detect scene-class inconsistencies between them.
Our approach achieves state-of-the-art results in scene classification and promising outcomes in audio-visual discrepancy detection.
arXiv Detail & Related papers (2024-05-01T08:30:58Z) - Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z) - Audio-Visual Scene Classification Using A Transfer Learning Based Joint Optimization Strategy [26.975596225131824]
We propose a joint training framework, using the acoustic features and raw images directly as inputs for the AVSC task.
Specifically, we retrieve the bottom layers of pre-trained image models as visual encoder, and jointly optimize the scene classifier and 1D-CNN based acoustic encoder during training.
arXiv Detail & Related papers (2022-04-25T03:37:02Z) - Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z) - Data-driven Meta-set Based Fine-Grained Visual Classification [61.083706396575295]
We propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.
Specifically, guided by a small amount of clean meta-set, we train a selection net in a meta-learning manner to distinguish in- and out-of-distribution noisy images.
arXiv Detail & Related papers (2020-08-06T03:04:16Z) - Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
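The band-wise statistics matching described above can be sketched as a per-band standardise-and-rescale step. The toy "spectrogram" values below are placeholders, and the exact normalisation details in the paper may differ; this only illustrates aligning first- and second-order statistics per frequency band.

```python
# Sketch of band-wise statistics matching: each frequency band of the
# target-domain features is standardised, then rescaled to the
# source-domain band's mean and standard deviation.
from statistics import mean, stdev

def band_stats(spec):
    """spec: list of frequency bands, each a list of per-frame values."""
    return [(mean(band), stdev(band)) for band in spec]

def match_bands(target_spec, source_stats):
    """Align each target band's first/second-order stats to the source."""
    out = []
    for band, (mu_s, sd_s) in zip(target_spec, source_stats):
        mu_t, sd_t = mean(band), stdev(band)
        out.append([(v - mu_t) / sd_t * sd_s + mu_s for v in band])
    return out

# Toy source/target "spectrograms": 2 frequency bands, 3 frames each.
source = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
target = [[5.0, 6.0, 7.0], [0.0, 2.0, 4.0]]
aligned = match_bands(target, band_stats(source))
```

After matching, every target band has exactly the source band's mean and standard deviation, which is the unsupervised alignment the method relies on (no target labels are needed).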
arXiv Detail & Related papers (2020-04-30T23:56:05Z) - Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning [79.25478727351604]
We explore a simple process: meta-learning over a whole-classification pre-trained model on its evaluation metric.
We observe this simple method achieves competitive performance to state-of-the-art methods on standard benchmarks.
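The evaluation metric that Meta-Baseline meta-learns over is a cosine nearest-centroid rule, which can be sketched as below; the class names and 2-D feature vectors are toy placeholders standing in for embeddings from the pre-trained whole-classification backbone.

```python
# Sketch of cosine nearest-centroid classification, the few-shot
# evaluation metric that Meta-Baseline meta-learns over: average each
# class's support features into a centroid, then assign a query to the
# centroid with the highest cosine similarity.
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest_centroid(query, support_by_class):
    """support_by_class: {class_name: [feature_vector, ...]}."""
    cents = {c: centroid(feats) for c, feats in support_by_class.items()}
    return max(cents, key=lambda c: cosine(query, cents[c]))

# Toy 2-way few-shot episode with 2-D placeholder features.
support = {
    "dog":   [[1.0, 0.1], [0.9, 0.0]],
    "siren": [[0.0, 1.0], [0.1, 0.9]],
}
pred = nearest_centroid([0.05, 0.95], support)
```

The appeal of this baseline is that nothing episode-specific is trained: given a good pre-trained feature extractor, centroids plus cosine similarity are often competitive with far more elaborate meta-learners.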
arXiv Detail & Related papers (2020-03-09T20:06:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.