Can Masked Autoencoders Also Listen to Birds?
- URL: http://arxiv.org/abs/2504.12880v1
- Date: Thu, 17 Apr 2025 12:13:25 GMT
- Title: Can Masked Autoencoders Also Listen to Birds?
- Authors: Lukas Rauch, Ilyass Moummad, René Heinrich, Alexis Joly, Bernhard Sick, Christoph Scholz
- Abstract summary: Masked Autoencoders (MAEs) pretrained on AudioSet fail to capture the fine-grained acoustic characteristics of specialized domains. We introduce Bird-MAE, a domain-specialized MAE pretrained on the large-scale BirdSet dataset.
- Score: 2.430300340530418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked Autoencoders (MAEs) pretrained on AudioSet fail to capture the fine-grained acoustic characteristics of specialized domains such as bioacoustic monitoring. Bird sound classification is critical for assessing environmental health, yet general-purpose models inadequately address its unique acoustic challenges. To address this, we introduce Bird-MAE, a domain-specialized MAE pretrained on the large-scale BirdSet dataset. We explore adjustments to pretraining, fine-tuning and utilizing frozen representations. Bird-MAE achieves state-of-the-art results across all BirdSet downstream tasks, substantially improving multi-label classification performance compared to the general-purpose Audio-MAE baseline. Additionally, we propose prototypical probing, a parameter-efficient method for leveraging MAEs' frozen representations. Bird-MAE's prototypical probes outperform linear probing by up to 37% in MAP and narrow the gap to fine-tuning to approximately 3% on average on BirdSet.
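The abstract describes prototypical probing only at a high level. A minimal sketch of one plausible reading follows: learnable per-class prototype vectors are scored against frozen patch embeddings by cosine similarity, with max-pooling over patches. The function name `prototypical_probe` and the pooling choice are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def prototypical_probe(patch_embeddings, prototypes):
    """Score each class by comparing frozen MAE patch embeddings
    against learnable class prototypes (hypothetical sketch).

    patch_embeddings: (num_patches, dim) frozen features
    prototypes:       (num_classes, dim) learnable vectors
    Returns per-class logits of shape (num_classes,).
    """
    # L2-normalize both sides so the dot product is cosine similarity
    e = patch_embeddings / np.linalg.norm(patch_embeddings, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = e @ p.T  # (num_patches, num_classes)
    # Max-pool over patches: a class fires if any patch matches its prototype
    return sim.max(axis=0)

rng = np.random.default_rng(0)
logits = prototypical_probe(rng.standard_normal((64, 768)),
                            rng.standard_normal((10, 768)))
```

Only the prototypes would be trained, which is what makes such a probe parameter-efficient relative to fine-tuning the whole encoder.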
Related papers
- An Automated Pipeline for Few-Shot Bird Call Classification: A Case Study with the Tooth-Billed Pigeon [0.6282171844772422]
This paper presents an automated one-shot bird call classification pipeline designed for rare species absent from large publicly available classifiers like BirdNET and Perch.
We leverage the embedding space of large bird classification networks and develop a classifier using cosine similarity, combined with filtering and denoising preprocessing techniques.
The final model achieved 1.0 recall and 0.95 accuracy in detecting tooth-billed pigeon calls, making it practical for use in the field.
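The cosine-similarity classifier described above can be sketched as follows. This is a generic illustration under stated assumptions: embeddings are stand-in vectors (real ones would come from a pretrained network such as BirdNET or Perch), and the threshold value is arbitrary.

```python
import numpy as np

def cosine_one_shot(query_emb, reference_embs, threshold=0.8):
    """Return (match, score): is the query embedding close enough,
    by cosine similarity, to any reference call embedding?"""
    q = query_emb / np.linalg.norm(query_emb)
    r = reference_embs / np.linalg.norm(reference_embs, axis=1, keepdims=True)
    score = float(np.max(r @ q))  # best similarity over all references
    return score >= threshold, score

# Stand-in 3-D vectors for illustration only
ref = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
match, score = cosine_one_shot(np.array([0.9, 0.1, 0.0]), ref)
```

Because only a similarity threshold is tuned, such a classifier needs as little as one reference recording per species, which is what makes it viable for rare species absent from large classifiers.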
arXiv Detail & Related papers (2025-04-22T21:21:41Z) - A Bird Song Detector for improving bird identification through Deep Learning: a case study from Doñana [2.7924253850013416]
We develop a pipeline for automatic bird vocalization identification in Doñana National Park (SW Spain). We manually annotated 461 minutes of audio from three habitats across nine locations, yielding 3,749 annotations for 34 classes. Applying the Bird Song Detector before classification improved species identification, as all classification models performed better when analyzing only the segments where birds were detected.
arXiv Detail & Related papers (2025-03-19T13:19:06Z) - NBM: an Open Dataset for the Acoustic Monitoring of Nocturnal Migratory Birds in Europe [0.0]
This work presents the Nocturnal Bird Migration dataset, a collection of 13,359 annotated vocalizations from 117 species of the Western Palearctic. The dataset includes precise time and frequency annotations, gathered by dozens of bird enthusiasts across France. In particular, we prove the utility of this database by training an original two-stage deep object detection model tailored for the processing of audio data.
arXiv Detail & Related papers (2024-12-04T18:55:45Z) - Advanced Framework for Animal Sound Classification With Features Optimization [35.2832738406242]
We propose an automated classification framework applicable to general animal sound classification.
Our approach consistently outperforms baseline methods by over 25% in precision, recall, and accuracy.
arXiv Detail & Related papers (2024-07-03T18:33:47Z) - BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics [2.2399415927517414]
BirdSet is a large-scale benchmark dataset for audio classification focusing on avian bioacoustics. BirdSet surpasses AudioSet with over 6,800 recording hours (↑17%) from nearly 10,000 classes (↑18×) for training, and more than 400 hours (↑7×) across eight strongly labeled evaluation datasets.
arXiv Detail & Related papers (2024-03-15T15:10:40Z) - WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database [49.1574468325115]
We introduce WhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations.
We achieve an improvement in classification accuracy of 8-10% over existing architectures, reaching a classification accuracy of 97.61%.
arXiv Detail & Related papers (2024-02-20T11:36:23Z) - Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z) - Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with Transformers [2.404305970432934]
We propose a shift towards end-to-end learning in bird sound monitoring by combining self-supervised learning (SSL) and deep active learning (DAL).
We aim to bypass traditional spectrogram conversions, enabling direct raw audio processing.
arXiv Detail & Related papers (2023-08-14T13:06:10Z) - ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization [65.58562481279023]
We propose ZooD, a paradigm for PTMs ranking and ensemble with feature selection.
We evaluate our paradigm on a diverse model zoo consisting of 35 models for various Out-of-Distribution (OoD) tasks.
arXiv Detail & Related papers (2022-10-17T16:31:57Z) - Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC).
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
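The last step of the four-step pipeline above, late fusion of predicted probabilities, can be sketched as a weighted average of per-model class-probability vectors. This is a generic illustration of late fusion, not the paper's specific fusion rule; the helper name `late_fusion` and the default uniform weights are assumptions.

```python
import numpy as np

def late_fusion(prob_list, weights=None):
    """Fuse per-model class-probability vectors by weighted averaging,
    renormalizing so the fused vector sums to 1."""
    probs = np.stack(prob_list)  # (num_models, num_classes)
    if weights is None:
        # Uniform weights unless the caller supplies model confidences
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    fused = weights @ probs
    return fused / fused.sum()

# Two models disagreeing on a 2-class problem
fused = late_fusion([np.array([0.6, 0.4]),
                     np.array([0.2, 0.8])])
```

Fusing at the probability level lets heterogeneous front-ends (different spectrogram settings or augmentations) be combined without retraining any single model.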
arXiv Detail & Related papers (2022-06-13T11:41:39Z) - Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z) - Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to efficiently extract features useful for analysis and classification.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.