Identifying birdsong syllables without labelled data
- URL: http://arxiv.org/abs/2509.18412v1
- Date: Mon, 22 Sep 2025 20:54:37 GMT
- Title: Identifying birdsong syllables without labelled data
- Authors: Mélisande Teng, Julien Boussard, David Rolnick, Hugo Larochelle
- Abstract summary: We build the first fully unsupervised algorithm to decompose birdsong recordings into sequences of syllables. We evaluate our automatic annotations against human labels on a dataset of Bengalese finch songs.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Identifying sequences of syllables within birdsongs is key to tackling a wide array of challenges, including bird individual identification and better understanding of animal communication and sensory-motor learning. Recently, machine learning approaches have demonstrated great potential to alleviate the need for experts to label long audio recordings by hand. However, they still typically rely on the availability of labelled data for model training, restricting applicability to a few species and datasets. In this work, we build the first fully unsupervised algorithm to decompose birdsong recordings into sequences of syllables. We first detect syllable events, then cluster them to extract templates (syllable representations) before performing matching pursuit to decompose the recording as a sequence of syllables. We evaluate our automatic annotations against human labels on a dataset of Bengalese finch songs and find that our unsupervised method achieves high performance. We also demonstrate that our approach can distinguish individual birds within a species through their unique vocal signatures, for both Bengalese finches and another species, the great tit.
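The pipeline sketched in the abstract (detect syllable events, then greedily explain the recording with templates) can be illustrated with a toy, 1-D stand-in. Everything below is an illustrative assumption, not the paper's code: the "signal" is a scalar amplitude envelope rather than a spectrogram, and the thresholds and templates are invented.

```python
# Toy sketch of the abstract's pipeline on a 1-D amplitude envelope.
# Stage 1: threshold-based event detection. Stage 2 (clustering) is assumed
# done upstream and yields the templates. Stage 3: greedy matching pursuit.

def detect_events(envelope, threshold):
    """Return (start, end) index pairs where the envelope exceeds threshold."""
    events, start = [], None
    for i, v in enumerate(envelope):
        if v >= threshold and start is None:
            start = i
        elif v < threshold and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(envelope)))
    return events

def matching_pursuit(envelope, templates, n_iter=3):
    """Greedily subtract the best-correlated template from the residual;
    return the (offset, template_id) picks in the order they were made."""
    residual = list(envelope)
    picks = []
    for _ in range(n_iter):
        best = None  # (score, offset, template_id)
        for tid, t in enumerate(templates):
            for off in range(len(residual) - len(t) + 1):
                score = sum(residual[off + j] * t[j] for j in range(len(t)))
                if best is None or score > best[0]:
                    best = (score, off, tid)
        if best is None or best[0] <= 0:
            break  # nothing left to explain
        _, off, tid = best
        for j in range(len(templates[tid])):
            residual[off + j] -= templates[tid][j]
        picks.append((off, tid))
    return picks
```

On a real recording the templates would be spectrogram patches produced by the clustering stage, and the inner product would run over time-frequency bins rather than scalar samples.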
Related papers
- Data-Efficient Self-Supervised Algorithms for Fine-Grained Birdsong Analysis [2.6084563319562784]
This work presents a lightweight, yet performant neural network architecture for birdsong annotation called Residual-MLP-RNN. It also describes a robust three-stage training pipeline for developing reliable deep birdsong syllable detectors with minimal expert labor. The performance of this data-efficient approach is demonstrated for the complex song of the canary in extreme label-scarcity scenarios.
arXiv Detail & Related papers (2025-11-15T11:04:01Z) - The iNaturalist Sounds Dataset [60.157076990024606]
iNatSounds is a collection of 230,000 audio files capturing sounds from over 5,500 species, contributed by more than 27,000 recordists worldwide. The dataset encompasses sounds from birds, mammals, insects, reptiles, and amphibians, with audio and species labels derived from observations submitted to iNaturalist. We envision models trained on this data powering next-generation public engagement applications, and assisting biologists, ecologists, and land use managers in processing large audio collections.
arXiv Detail & Related papers (2025-05-31T02:07:37Z) - Unsupervised outlier detection to improve bird audio dataset labels [0.0]
Non-target bird species sounds can result in dataset labeling discrepancies referred to as label noise. We present a cleaning process consisting of audio preprocessing followed by dimensionality reduction and unsupervised outlier detection.
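The cleaning idea in this abstract can be sketched with a minimal stand-in: after each clip is reduced to a small feature vector upstream, flag clips far from the dataset mean as candidate label-noise outliers. The z-score rule below is an illustrative substitute for whichever unsupervised outlier detector the paper actually uses; all names and the cutoff are assumptions.

```python
# Hedged sketch: flag feature vectors whose distance from the dataset mean,
# measured in standard deviations of those distances, exceeds a cutoff.
import math

def zscore_outliers(features, cutoff=2.0):
    """features: list of equal-length numeric rows. Returns indices of
    rows flagged as outliers under the z-score rule."""
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    dists = [math.dist(row, mean) for row in features]
    mu = sum(dists) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in dists) / n) or 1.0
    return [i for i, x in enumerate(dists) if (x - mu) / sigma > cutoff]
```

In practice the features would come from the dimensionality-reduction step, and a density-based detector would likely replace the global z-score.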
arXiv Detail & Related papers (2025-04-25T19:04:40Z) - An Automated Pipeline for Few-Shot Bird Call Classification: A Case Study with the Tooth-Billed Pigeon [0.6282171844772422]
This paper presents an automated one-shot bird call classification pipeline designed for rare species absent from large publicly available classifiers like BirdNET and Perch. We leverage the embedding space of large bird classification networks and develop a classifier using cosine similarity, combined with filtering and denoising preprocessing techniques. The final model achieved 1.0 recall and 0.95 accuracy in detecting tooth-billed pigeon calls, making it practical for use in the field.
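The cosine-similarity classifier described above can be sketched in a few lines. This is a toy illustration of the general idea, not the paper's implementation: one reference embedding per class (hypothetical labels), with an assumed acceptance threshold.

```python
# Hedged sketch of one-shot classification in an embedding space:
# score a query against one reference embedding per class by cosine
# similarity, and accept the best class only above a threshold.
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def one_shot_classify(query, references, threshold=0.8):
    """references: {label: embedding}. Returns (label, score), with
    label None when no class clears the threshold."""
    label, score = max(
        ((lbl, cosine(query, ref)) for lbl, ref in references.items()),
        key=lambda p: p[1],
    )
    return (label, score) if score >= threshold else (None, score)
```

In the paper's setting, the embeddings would come from a pretrained bird classification network rather than raw vectors, and the preprocessing (filtering, denoising) would run before embedding.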
arXiv Detail & Related papers (2025-04-22T21:21:41Z) - A Bird Song Detector for improving bird identification through Deep Learning: a case study from Doñana [2.7924253850013416]
A key challenge in bird species identification is that many recordings lack the target species or contain overlapping vocalizations. We developed a multi-stage pipeline for automatic bird vocalization identification in Doñana National Park (SW Spain). We first applied a Bird Song Detector to isolate bird vocalizations using spectrogram-based image processing. Then, species were classified using custom models trained at the local scale.
arXiv Detail & Related papers (2025-03-19T13:19:06Z) - CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes [93.71909293023663]
Cross-modality Aligned Prototypes (CAPro) is a unified contrastive learning framework to learn visual representations with correct semantics.
CAPro achieves new state-of-the-art performance and exhibits robustness to open-set recognition.
arXiv Detail & Related papers (2023-10-15T07:20:22Z) - Unsupervised 3D registration through optimization-guided cyclical self-training [71.75057371518093]
State-of-the-art deep learning-based registration methods employ three different learning strategies.
We propose a novel self-supervised learning paradigm for unsupervised registration, relying on self-training.
We evaluate the method for abdomen and lung registration, consistently surpassing metric-based supervision and outperforming diverse state-of-the-art competitors.
arXiv Detail & Related papers (2023-06-29T14:54:10Z) - Unsupervised classification to improve the quality of a bird song recording dataset [0.0]
We introduce a data-centric novel labelling function composed of three successive steps: time-frequency sound unit segmentation, feature computation for each sound unit, and classification of each sound unit as bird song or noise.
Our labelling function was able to significantly reduce the initial label noise present in the dataset by up to a factor of three.
arXiv Detail & Related papers (2023-02-15T10:01:58Z) - Melody transcription via generative pre-training [86.08508957229348]
A key challenge in melody transcription is building methods that can handle broad audio containing any number of instrument ensembles and musical styles.
To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio.
We derive a new dataset containing $50$ hours of melody transcriptions from crowdsourced annotations of broad music.
arXiv Detail & Related papers (2022-12-04T18:09:23Z) - Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z) - Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.