Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics
- URL: http://arxiv.org/abs/2409.15383v1
- Date: Sat, 21 Sep 2024 11:33:12 GMT
- Title: Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics
- Authors: Burooj Ghani, Vincent J. Kalkman, Bob Planqué, Willem-Pier Vellinga, Lisa Gill, Dan Stowell
- Abstract summary: We explore the effectiveness of transfer learning in large-scale bird sound classification.
Our experiments demonstrate that both fine-tuning and knowledge distillation yield strong performance.
We advocate for more comprehensive labeling practices within the animal sound community.
- Score: 2.6740633963478095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Animal sounds can be recognised automatically by machine learning, and this has an important role to play in biodiversity monitoring. Yet despite increasingly impressive capabilities, bioacoustic species classifiers still exhibit imbalanced performance across species and habitats, especially in complex soundscapes. In this study, we explore the effectiveness of transfer learning in large-scale bird sound classification across various conditions, including single- and multi-label scenarios, and across different model architectures such as CNNs and Transformers. Our experiments demonstrate that both fine-tuning and knowledge distillation yield strong performance, with cross-distillation proving particularly effective in improving in-domain performance on Xeno-canto data. However, when generalizing to soundscapes, shallow fine-tuning exhibits superior performance compared to knowledge distillation, highlighting its robustness and constrained nature. Our study further investigates how to use multi-species labels, in cases where these are present but incomplete. We advocate for more comprehensive labeling practices within the animal sound community, including annotating background species and providing temporal details, to enhance the training of robust bird sound classifiers. These findings provide insights into the optimal reuse of pretrained models for advancing automatic bioacoustic recognition.
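To make the contrast between the two transfer strategies named in the abstract more concrete, here is a minimal sketch of a knowledge distillation loss. It is not the authors' implementation: the abstract does not give their loss, so this uses the common temperature-softened formulation for a single-label case. The function names and the `temperature` and `alpha` parameters are assumptions chosen for illustration. Shallow fine-tuning would instead keep the pretrained network frozen and train only a new output layer on the hard labels.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(student_logits, teacher_logits, true_index,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Soft term: KL(teacher || student) on temperature-softened outputs,
    scaled by temperature**2 so its gradient magnitude stays comparable
    across temperatures.
    Hard term: cross-entropy of the student against the true class.
    Both hyperparameter defaults are illustrative assumptions.
    """
    log_teacher = log_softmax(teacher_logits, temperature)
    log_student_soft = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_teacher, log_student_soft))
    cross_entropy = -log_softmax(student_logits)[true_index]
    return alpha * temperature ** 2 * kl + (1 - alpha) * cross_entropy


# Hypothetical logits for three species classes.
matched = distillation_loss([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], true_index=0)
mismatched = distillation_loss([0.0, 0.0, 0.0], [5.0, 0.0, 0.0], true_index=0)
```

When the student already reproduces the teacher's outputs, the soft term vanishes and only the hard-label term remains, so `mismatched` is larger than `matched`. A multi-label variant would typically replace the softmax with per-class sigmoids, which this sketch does not cover.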
Related papers
- Advanced Framework for Animal Sound Classification With Features Optimization [35.2832738406242]
We propose an automated classification framework applicable to general animal sound classification.
Our approach consistently outperforms baseline methods by over 25% in precision, recall, and accuracy.
arXiv Detail & Related papers (2024-07-03T18:33:47Z)
- WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database [49.1574468325115]
We introduce WhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations.
We achieve an improvement in classification accuracy of 8-10% over existing architectures, corresponding to a classification accuracy of 97.61%.
arXiv Detail & Related papers (2024-02-20T11:36:23Z)
- Self-Supervised Learning for Few-Shot Bird Sound Classification [10.395255631261458]
Self-supervised learning (SSL) in audio holds significant potential across various domains.
In this study, we demonstrate that SSL is capable of acquiring meaningful representations of bird sounds from audio recordings without the need for annotations.
arXiv Detail & Related papers (2023-12-25T22:33:45Z)
- A Saliency-based Clustering Framework for Identifying Aberrant Predictions [49.1574468325115]
We introduce the concept of aberrant predictions, emphasizing that the nature of classification errors is as critical as their frequency.
We propose a novel, efficient training methodology aimed at both reducing the misclassification rate and discerning aberrant predictions.
We apply this methodology to the less-explored domain of veterinary radiology, where the stakes are high but have not been as extensively studied compared to human medicine.
arXiv Detail & Related papers (2023-11-11T01:53:59Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Long-tail Recognition via Compositional Knowledge Transfer [60.03764547406601]
We introduce a novel strategy for long-tail recognition that addresses the tail classes' few-shot problem.
Our objective is to transfer knowledge acquired from information-rich common classes to semantically similar, and yet data-hungry, rare classes.
Experiments show that our approach can achieve significant performance boosts on rare classes while maintaining robust common class performance.
arXiv Detail & Related papers (2021-12-13T15:48:59Z)
- Parsing Birdsong with Deep Audio Embeddings [0.5599792629509227]
We present a semi-supervised approach to identify characteristic calls and environmental noise.
We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks.
arXiv Detail & Related papers (2021-08-20T14:45:44Z)
- Recognizing bird species in diverse soundscapes under weak supervision [0.2148535041822524]
We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF 2021 challenge.
We illustrate how to make full use of pre-trained convolutional neural networks, by using an efficient modeling and training routine supplemented by novel augmentation methods.
arXiv Detail & Related papers (2021-07-16T06:54:38Z)
- Class-Balanced Distillation for Long-Tailed Visual Recognition [100.10293372607222]
Real-world imagery is often characterized by a significant imbalance of the number of images per class, leading to long-tailed distributions.
In this work, we introduce a new framework, by making the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting.
Our main contribution is a new training method that leverages knowledge distillation to enhance feature representations.
arXiv Detail & Related papers (2021-04-12T08:21:03Z)
- Modelling Animal Biodiversity Using Acoustic Monitoring and Deep Learning [0.0]
This paper outlines an approach for achieving this using state-of-the-art machine learning to automatically extract features from time-series audio signals.
The acquired bird songs are processed using mel-frequency cepstrum (MFC) to extract features, which are later classified using a multilayer perceptron (MLP).
Our proposed method achieved promising results with 0.74 sensitivity, 0.92 specificity and an accuracy of 0.74.
arXiv Detail & Related papers (2021-03-12T13:50:31Z)
- Transferring Dense Pose to Proximal Animal Classes [83.84439508978126]
We show that it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes.
We do this by establishing a DensePose model for the new animal which is also geometrically aligned to humans.
We also introduce two benchmark datasets labelled in the manner of DensePose for the class chimpanzee and use them to evaluate our approach.
arXiv Detail & Related papers (2020-02-28T21:43:53Z)