AudioProtoPNet: An interpretable deep learning model for bird sound classification
- URL: http://arxiv.org/abs/2404.10420v2
- Date: Wed, 29 May 2024 14:09:17 GMT
- Title: AudioProtoPNet: An interpretable deep learning model for bird sound classification
- Authors: René Heinrich, Bernhard Sick, Christoph Scholz
- Abstract summary: We present an adaptation of the Prototypical Part Network (ProtoPNet) for audio classification that provides inherent interpretability through its model architecture.
Our approach is based on a ConvNeXt backbone architecture for feature extraction and learns prototypical patterns for each bird species using spectrograms of the training data.
- Score: 1.6298921134113031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, scientists have proposed several deep learning models to monitor the diversity of bird species. These models can detect bird species with high accuracy by analyzing acoustic signals. However, traditional deep learning algorithms are black-box models that provide no insight into their decision-making process. For domain experts, such as ornithologists, it is crucial that these models are not only efficient, but also interpretable in order to be used as assistive tools. In this study, we present an adaptation of the Prototypical Part Network (ProtoPNet) for audio classification that provides inherent interpretability through its model architecture. Our approach is based on a ConvNeXt backbone architecture for feature extraction and learns prototypical patterns for each bird species using spectrograms of the training data. Classification of new data is done by comparison with these prototypes in latent space, which simultaneously serve as easily understandable explanations for the model's decisions. We evaluated the performance of our model on seven different datasets representing bird species from different geographical regions. In our experiments, the model showed excellent results, achieving an average AUROC of 0.82 and an average cmAP of 0.37 across the seven datasets, making it comparable to state-of-the-art black-box models for bird sound classification. Thus, this work demonstrates that even for the challenging task of bioacoustic bird classification, powerful yet interpretable deep learning models can be developed to provide valuable insights to domain experts.
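To make the prototype-based decision process concrete, below is a minimal sketch of a ProtoPNet-style classification head in PyTorch. This is an illustration under stated assumptions, not the authors' implementation: the class name `PrototypeHead`, the per-class prototype layout, and the cosine-similarity scoring are choices made here for clarity (the paper's exact similarity function and training losses may differ). The head is assumed to sit on top of a backbone such as ConvNeXt applied to spectrograms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeHead(nn.Module):
    """Illustrative ProtoPNet-style head: scores latent spectrogram patches
    against learned class prototypes and classifies from the best matches."""

    def __init__(self, num_classes: int, prototypes_per_class: int, channels: int):
        super().__init__()
        self.num_prototypes = num_classes * prototypes_per_class
        # One learnable prototype vector per (class, slot) pair in latent space.
        self.prototypes = nn.Parameter(torch.randn(self.num_prototypes, channels))
        # Fixed class assignment: prototype j belongs to class j // prototypes_per_class.
        assignment = torch.zeros(self.num_prototypes, num_classes)
        for j in range(self.num_prototypes):
            assignment[j, j // prototypes_per_class] = 1.0
        self.register_buffer("assignment", assignment)

    def forward(self, features: torch.Tensor):
        # features: (batch, channels, freq, time) from the backbone, e.g. ConvNeXt.
        b, c, h, w = features.shape
        patches = features.permute(0, 2, 3, 1).reshape(b, h * w, c)
        # Cosine similarity of every latent patch to every prototype.
        x_norm = F.normalize(patches, dim=-1)                 # (b, hw, c)
        p_norm = F.normalize(self.prototypes, dim=-1)         # (P, c)
        sims = x_norm @ p_norm.t()                            # (b, hw, P)
        # Each prototype's evidence is its best match anywhere in the spectrogram.
        best, _ = sims.max(dim=1)                             # (b, P)
        logits = best @ self.assignment                       # (b, num_classes)
        return logits, best
```

In this sketch, `best` doubles as the explanation: for each predicted class, the prototypes with the highest similarity can be traced back to the spectrogram regions (and the training examples) they resemble, which is the source of the model's interpretability.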
Related papers
- Semi-supervised classification of bird vocalizations [0.0]
Changes in bird populations can indicate broader changes in ecosystems.
We propose a semi-supervised acoustic bird detector to allow the detection of time-overlapping calls.
It achieves a mean F0.5 score of 0.701 across 315 classes from 110 bird species on a hold-out test set.
arXiv Detail & Related papers (2025-02-19T05:31:13Z) - BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics [2.2399415927517414]
BirdSet is a large-scale benchmark dataset for audio classification focusing on avian bioacoustics.
BirdSet surpasses AudioSet with over 6,800 recording hours (↑17%) from nearly 10,000 classes (↑18×) for training and more than 400 hours (↑7×) across eight strongly labeled evaluation datasets.
arXiv Detail & Related papers (2024-03-15T15:10:40Z) - Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer [59.57249127943914]
We present a multilingual Audio-Visual Speech Recognition model incorporating several enhancements to improve performance and audio noise robustness.
We increase the amount of audio-visual training data for six distinct languages, generating automatic transcriptions of unlabelled multilingual datasets.
Our proposed model achieves new state-of-the-art performance on the LRS3 dataset, reaching a WER of 0.8%.
arXiv Detail & Related papers (2024-03-14T01:16:32Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Self-Supervised Learning for Few-Shot Bird Sound Classification [10.395255631261458]
Self-supervised learning (SSL) in audio holds significant potential across various domains.
In this study, we demonstrate that SSL is capable of acquiring meaningful representations of bird sounds from audio recordings without the need for annotations.
arXiv Detail & Related papers (2023-12-25T22:33:45Z) - Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z) - How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources [117.6496550359768]
This work explores recent advances in instruction-tuning language models on a range of open instruction-following datasets.
We provide a large set of instruction-tuned models from 6.7B to 65B parameters in size, trained on 12 instruction datasets.
We evaluate them on their factual knowledge, reasoning, multilinguality, coding, and open-ended instruction following abilities.
arXiv Detail & Related papers (2023-06-07T19:59:23Z) - Machine Learning-based Classification of Birds through Birdsong [0.3908842679355254]
We apply Mel Frequency Cepstral Coefficients (MFCC) in combination with a range of machine learning models to identify Australian birds.
We achieve an overall accuracy of 91% for the top-5 birds from the 30 selected as the case study.
Applying the models to more challenging and diverse audio files comprising 152 bird species, we achieve an accuracy of 58%. (A minimal sketch of an MFCC-based pipeline follows this list.)
arXiv Detail & Related papers (2022-12-09T06:20:50Z) - Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z) - Few-shot Long-Tailed Bird Audio Recognition [3.8073142980733]
We propose a sound detection and classification pipeline to analyze soundscape recordings.
Our solution achieved 18th place out of 807 teams at the BirdCLEF 2022 Challenge hosted on Kaggle.
arXiv Detail & Related papers (2022-06-22T04:14:25Z) - Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
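As a companion to the MFCC-based entry above, here is a minimal, hedged sketch of how such a pipeline is commonly assembled with librosa and scikit-learn. It is not the cited paper's code: the per-clip feature summary (mean and standard deviation of MFCCs), the `RandomForestClassifier`, and all parameter values are illustrative assumptions.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


def mfcc_features(path: str, n_mfcc: int = 20) -> np.ndarray:
    """Load a recording and summarize it as mean and std of its MFCCs."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def train_bird_classifier(paths, labels):
    """Fit a simple classifier on per-clip MFCC summaries (illustrative only)."""
    X = np.stack([mfcc_features(p) for p in paths])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
    return clf
```

In practice, frame-level MFCCs are often augmented with delta features and other spectral statistics, and performance depends more on the quality and balance of the labeled recordings than on the particular classifier chosen.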
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.