AudioProtoPNet: An interpretable deep learning model for bird sound classification
- URL: http://arxiv.org/abs/2404.10420v2
- Date: Wed, 29 May 2024 14:09:17 GMT
- Title: AudioProtoPNet: An interpretable deep learning model for bird sound classification
- Authors: René Heinrich, Bernhard Sick, Christoph Scholz
- Abstract summary: We present an adaptation of the Prototypical Part Network (ProtoPNet) for audio classification that provides inherent interpretability through its model architecture.
Our approach is based on a ConvNeXt backbone architecture for feature extraction and learns prototypical patterns for each bird species using spectrograms of the training data.
- Score: 1.6298921134113031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, scientists have proposed several deep learning models to monitor the diversity of bird species. These models can detect bird species with high accuracy by analyzing acoustic signals. However, traditional deep learning algorithms are black-box models that provide no insight into their decision-making process. For domain experts, such as ornithologists, it is crucial that these models are not only efficient, but also interpretable in order to be used as assistive tools. In this study, we present an adaptation of the Prototypical Part Network (ProtoPNet) for audio classification that provides inherent interpretability through its model architecture. Our approach is based on a ConvNeXt backbone architecture for feature extraction and learns prototypical patterns for each bird species using spectrograms of the training data. Classification of new data is done by comparison with these prototypes in latent space, which simultaneously serve as easily understandable explanations for the model's decisions. We evaluated the performance of our model on seven different datasets representing bird species from different geographical regions. In our experiments, the model showed excellent results, achieving an average AUROC of 0.82 and an average cmAP of 0.37 across the seven datasets, making it comparable to state-of-the-art black-box models for bird sound classification. Thus, this work demonstrates that even for the challenging task of bioacoustic bird classification, powerful yet interpretable deep learning models can be developed to provide valuable insights to domain experts.
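To make the prototype-based decision process concrete, below is a minimal sketch of a ProtoPNet-style classification head in PyTorch. This is an illustration under stated assumptions, not the authors' implementation: the class name `PrototypeHead`, the per-class prototype layout, and the cosine-similarity scoring are choices made here for clarity (the paper's exact similarity function and training losses may differ). The head is assumed to sit on top of a backbone such as ConvNeXt applied to spectrograms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeHead(nn.Module):
    """Illustrative ProtoPNet-style head: scores latent spectrogram patches
    against learned class prototypes and classifies from the best matches."""

    def __init__(self, num_classes: int, prototypes_per_class: int, channels: int):
        super().__init__()
        self.num_prototypes = num_classes * prototypes_per_class
        # One learnable prototype vector per (class, slot) pair in latent space.
        self.prototypes = nn.Parameter(torch.randn(self.num_prototypes, channels))
        # Fixed class assignment: prototype j belongs to class j // prototypes_per_class.
        assignment = torch.zeros(self.num_prototypes, num_classes)
        for j in range(self.num_prototypes):
            assignment[j, j // prototypes_per_class] = 1.0
        self.register_buffer("assignment", assignment)

    def forward(self, features: torch.Tensor):
        # features: (batch, channels, freq, time) from the backbone, e.g. ConvNeXt.
        b, c, h, w = features.shape
        patches = features.permute(0, 2, 3, 1).reshape(b, h * w, c)
        # Cosine similarity of every latent patch to every prototype.
        x_norm = F.normalize(patches, dim=-1)                 # (b, hw, c)
        p_norm = F.normalize(self.prototypes, dim=-1)         # (P, c)
        sims = x_norm @ p_norm.t()                            # (b, hw, P)
        # Each prototype's evidence is its best match anywhere in the spectrogram.
        best, _ = sims.max(dim=1)                             # (b, P)
        logits = best @ self.assignment                       # (b, num_classes)
        return logits, best
```

In this sketch, `best` doubles as the explanation: for each predicted class, the prototypes with the highest similarity can be traced back to the spectrogram regions (and the training examples) they resemble, which is the source of the model's interpretability.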
Related papers
- Semi-supervised classification of bird vocalizations [0.0]
Changes in bird populations can indicate broader changes in ecosystems.
We propose a semi-supervised acoustic bird detector to allow the detection of time-overlapping calls.
It achieves a mean F0.5 score of 0.701 across 315 classes from 110 bird species on a hold-out test set.
arXiv Detail & Related papers (2025-02-19T05:31:13Z) - BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics [2.2399415927517414]
BirdSet is a large-scale benchmark dataset for audio classification focusing on avian bioacoustics.
BirdSet surpasses AudioSet with over 6,800 recording hours (↑17%) from nearly 10,000 classes (↑18×) for training and more than 400 hours (↑7×) across eight strongly labeled evaluation datasets.
arXiv Detail & Related papers (2024-03-15T15:10:40Z) - Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer [59.57249127943914]
We present a multilingual Audio-Visual Speech Recognition model incorporating several enhancements to improve performance and audio noise robustness.
We increase the amount of audio-visual training data for six distinct languages, generating automatic transcriptions of unlabelled multilingual datasets.
Our proposed model achieves new state-of-the-art performance on the LRS3 dataset, reaching a WER of 0.8%.
arXiv Detail & Related papers (2024-03-14T01:16:32Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Self-Supervised Learning for Few-Shot Bird Sound Classification [10.395255631261458]
Self-supervised learning (SSL) in audio holds significant potential across various domains.
In this study, we demonstrate that SSL is capable of acquiring meaningful representations of bird sounds from audio recordings without the need for annotations.
arXiv Detail & Related papers (2023-12-25T22:33:45Z) - Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z) - How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources [117.6496550359768]
This work explores recent advances in instruction-tuning language models on a range of open instruction-following datasets.
We provide a large set of instruction-tuned models from 6.7B to 65B parameters in size, trained on 12 instruction datasets.
We evaluate them on their factual knowledge, reasoning, multilinguality, coding, and open-ended instruction following abilities.
arXiv Detail & Related papers (2023-06-07T19:59:23Z) - Machine Learning-based Classification of Birds through Birdsong [0.3908842679355254]
We apply Mel Frequency Cepstral Coefficients (MFCC) in combination with a range of machine learning models to identify Australian birds.
We achieve an overall accuracy of 91% for the top-5 birds from the 30 selected as the case study.
Applying the models to more challenging and diverse audio files comprising 152 bird species, we achieve an accuracy of 58%. (A minimal sketch of an MFCC-based pipeline follows this list.)
arXiv Detail & Related papers (2022-12-09T06:20:50Z) - Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z) - Few-shot Long-Tailed Bird Audio Recognition [3.8073142980733]
We propose a sound detection and classification pipeline to analyze soundscape recordings.
Our solution achieved 18th place out of 807 teams at the BirdCLEF 2022 Challenge hosted on Kaggle.
arXiv Detail & Related papers (2022-06-22T04:14:25Z) - Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
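As a companion to the MFCC-based entry above, here is a minimal, hedged sketch of how such a pipeline is commonly assembled with librosa and scikit-learn. It is not the cited paper's code: the per-clip feature summary (mean and standard deviation of MFCCs), the `RandomForestClassifier`, and all parameter values are illustrative assumptions.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


def mfcc_features(path: str, n_mfcc: int = 20) -> np.ndarray:
    """Load a recording and summarize it as mean and std of its MFCCs."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def train_bird_classifier(paths, labels):
    """Fit a simple classifier on per-clip MFCC summaries (illustrative only)."""
    X = np.stack([mfcc_features(p) for p in paths])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
    return clf
```

In practice, frame-level MFCCs are often augmented with delta features and other spectral statistics, and performance depends more on the quality and balance of the labeled recordings than on the particular classifier chosen.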
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.