Self-Supervised Learning for Few-Shot Bird Sound Classification
- URL: http://arxiv.org/abs/2312.15824v4
- Date: Fri, 9 Feb 2024 12:00:56 GMT
- Title: Self-Supervised Learning for Few-Shot Bird Sound Classification
- Authors: Ilyass Moummad and Romain Serizel and Nicolas Farrugia
- Abstract summary: Self-supervised learning (SSL) in audio holds significant potential across various domains.
In this study, we demonstrate that SSL is capable of acquiring meaningful representations of bird sounds from audio recordings without the need for annotations.
- Score: 10.395255631261458
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Self-supervised learning (SSL) in audio holds significant potential across
various domains, particularly in situations where abundant, unlabeled data is
readily available at no cost. This is pertinent in bioacoustics, where
biologists routinely collect extensive sound datasets from the natural
environment. In this study, we demonstrate that SSL is capable of acquiring
meaningful representations of bird sounds from audio recordings without the
need for annotations. Our experiments showcase that these learned
representations exhibit the capacity to generalize to new bird species in
few-shot learning (FSL) scenarios. Additionally, we show that selecting windows
with high bird activation for self-supervised learning, using a pretrained
audio neural network, significantly enhances the quality of the learned
representations.
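As a rough illustration of the two ideas above, the sketch below is not the authors' implementation: `activation_model` and `encoder` are hypothetical stand-ins for the pretrained audio tagger and the SSL-trained encoder, and the window length, hop, and threshold are illustrative assumptions. It shows how windows with high bird activation could be filtered before pretraining, and how frozen embeddings could be probed with a nearest-class-mean few-shot classifier.

# Minimal sketch, assuming a pretrained tagger and an SSL encoder are available.
import numpy as np

def slice_windows(waveform, sr, win_s=3.0, hop_s=1.5):
    # Cut a mono waveform into fixed-length, possibly overlapping windows.
    win, hop = int(win_s * sr), int(hop_s * sr)
    return np.stack([waveform[i:i + win]
                     for i in range(0, len(waveform) - win + 1, hop)])

def select_active_windows(windows, activation_model, threshold=0.5):
    # Keep windows whose bird-activation score (assumed to be a scalar in
    # [0, 1] returned by the pretrained tagger) exceeds the threshold.
    scores = np.array([activation_model(w) for w in windows])
    return windows[scores >= threshold]

def few_shot_accuracy(encoder, support_x, support_y, query_x, query_y):
    # Nearest-class-mean (prototype) probe on frozen embeddings: average the
    # support embeddings per class, then assign each query to the closest class.
    support_y = np.asarray(support_y)
    emb_s = np.stack([encoder(x) for x in support_x])
    emb_q = np.stack([encoder(x) for x in query_x])
    classes = np.unique(support_y)
    prototypes = np.stack([emb_s[support_y == c].mean(axis=0) for c in classes])
    dists = ((emb_q[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == np.asarray(query_y)).mean())

In this sketch, select_active_windows would be run over each field recording to build the pretraining set, and few_shot_accuracy would measure how well the frozen embeddings separate bird species unseen during pretraining.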
Related papers
- Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics [2.6740633963478095]
We explore the effectiveness of transfer learning in large-scale bird sound classification.
Our experiments demonstrate that both fine-tuning and knowledge distillation yield strong performance.
We advocate for more comprehensive labeling practices within the animal sound community.
arXiv Detail & Related papers (2024-09-21T11:33:12Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Transferable Models for Bioacoustics with Human Language Supervision [0.0]
BioLingual is a new model for bioacoustics based on contrastive language-audio pretraining.
It can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries.
arXiv Detail & Related papers (2023-08-09T14:22:18Z)
- Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers? [23.041173892976325]
Self-supervised learning (SSL) models use only the intrinsic structure of a given signal, independent of its acoustic domain, to extract essential information from the input to an embedding space.
This paper explores the cross-transferability of SSL neural representations learned from human speech to analyze bio-acoustic signals.
arXiv Detail & Related papers (2023-05-23T13:06:14Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in the human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- Learning Neural Acoustic Fields [110.22937202449025]
We introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene.
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to an impulse response that can then be applied to arbitrary sounds.
We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location and to predict sound propagation at novel locations.
arXiv Detail & Related papers (2022-04-04T17:59:37Z)
- Audio Self-supervised Learning: A Survey [60.41768569891083]
Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations.
Its success in computer vision and natural language processing has prompted its recent adoption in the field of audio and speech processing.
arXiv Detail & Related papers (2022-03-02T15:58:29Z)
- Sound and Visual Representation Learning with Multiple Pretraining Tasks [104.11800812671953]
Different self-supervised learning (SSL) tasks reveal different features of the data.
This work aims to combine multiple SSL tasks (Multi-SSL) so that the learned representations generalize well across downstream tasks.
Experiments on sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single SSL task models.
arXiv Detail & Related papers (2022-01-04T09:09:38Z)
- Parsing Birdsong with Deep Audio Embeddings [0.5599792629509227]
We present a semi-supervised approach to identify characteristic calls and environmental noise.
We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks.
arXiv Detail & Related papers (2021-08-20T14:45:44Z)
- Learning from Very Few Samples: A Survey [80.06120185496403]
Few-sample learning is significant and challenging in the field of machine learning.
In contrast, conventional machine learning algorithms typically entail hundreds or thousands of supervised samples to guarantee generalization ability.
arXiv Detail & Related papers (2020-09-06T06:13:09Z)
- An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments [3.697508383732901]
This paper deals with the application of few-shot learning to the detection of specific and intentional acoustic events given by different types of sound alarms.
The detection of such alarms in a practical scenario can be considered an open-set recognition (OSR) problem.
This paper aims to provide the audio recognition community with a carefully annotated dataset comprising 1360 clips from 34 classes, divided into pattern sounds and unwanted sounds.
arXiv Detail & Related papers (2020-02-26T15:26:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.