Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with
Transformers
- URL: http://arxiv.org/abs/2308.07121v2
- Date: Tue, 21 Nov 2023 13:55:04 GMT
- Title: Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with
Transformers
- Authors: Lukas Rauch, Raphael Schwinger, Moritz Wirth, Bernhard Sick, Sven
Tomforde, Christoph Scholz
- Abstract summary: We propose a shift towards end-to-end learning in bird sound monitoring by combining self-supervised learning (SSL) and deep active learning (DAL).
We aim to bypass traditional spectrogram conversions, enabling direct raw audio processing.
- Score: 2.404305970432934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a shift towards end-to-end learning in bird sound monitoring by
combining self-supervised learning (SSL) and deep active learning (DAL). Leveraging
transformer models, we aim to bypass traditional spectrogram conversions,
enabling direct raw audio processing. ActiveBird2Vec is set to generate
high-quality bird sound representations through SSL, potentially accelerating
the assessment of environmental changes and decision-making processes for wind
farms. Additionally, we seek to utilize the wide variety of bird vocalizations
through DAL, reducing the reliance on extensively labeled datasets by human
experts. We plan to curate a comprehensive set of tasks through Huggingface
Datasets, enhancing future comparability and reproducibility of bioacoustic
research. A comparative analysis between various transformer models will be
conducted to evaluate their proficiency in bird sound recognition tasks. We aim
to accelerate the progression of avian bioacoustic research and contribute to
more effective conservation strategies.
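
As a rough illustration of the pipeline the abstract describes, raw waveforms loaded through Hugging Face Datasets can be fed directly to a wav2vec 2.0-style transformer, and the resulting embeddings can drive uncertainty-based sample selection. This is a minimal sketch under assumed names: the checkpoint and dataset identifiers below are placeholders, not artifacts released by the ActiveBird2Vec authors.

```python
# Minimal sketch (assumed names, not the authors' release): embed raw audio
# with a wav2vec 2.0-style transformer, then rank unlabeled clips by
# predictive entropy for an active-learning labeling round.
import numpy as np
import torch
from datasets import Audio, load_dataset
from transformers import AutoFeatureExtractor, AutoModel

CHECKPOINT = "facebook/wav2vec2-base"        # stand-in for a bird-sound SSL model
DATASET = "your-org/bird-soundscapes"        # hypothetical Hugging Face dataset

extractor = AutoFeatureExtractor.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT).eval()

# Decode recordings straight to 16 kHz waveforms -- no spectrogram conversion.
ds = load_dataset(DATASET, split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

@torch.no_grad()
def embed(batch):
    inputs = extractor(
        [a["array"] for a in batch["audio"]],
        sampling_rate=16_000,
        return_tensors="pt",
        padding=True,
    )
    hidden = model(**inputs).last_hidden_state        # (batch, frames, dim)
    batch["embedding"] = hidden.mean(dim=1).numpy()   # one vector per clip
    return batch

ds = ds.map(embed, batched=True, batch_size=8)

# Deep active learning step: given class probabilities from a provisional
# classifier head, query the most uncertain clips for human labeling.
def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# probs = classifier.predict_proba(np.stack(ds["embedding"]))
# query_idx = np.argsort(-predictive_entropy(probs))[:64]   # 64 clips per round
```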
Related papers
- Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation [55.752737615873464]
This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models.
We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
arXiv Detail & Related papers (2024-10-18T02:31:36Z)
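
For context on the noise-injection entry above, the following is a generic sketch (not that paper's implementation) of mixing Gaussian white noise into a waveform at a chosen signal-to-noise ratio; the SNR levels in the usage comment are illustrative.

```python
# Generic sketch: add white noise to a waveform at a target SNR (in dB)
# for training-data augmentation.
import numpy as np

def add_white_noise(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Return `signal` mixed with Gaussian white noise at `snr_db` dB SNR."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))   # SNR = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Example: augment one clip at several SNR levels.
# for snr in (0, 5, 10, 20):
#     noisy = add_white_noise(clip, snr_db=snr)
```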
- Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics [2.6740633963478095]
We explore the effectiveness of transfer learning in large-scale bird sound classification.
Our experiments demonstrate that both fine-tuning and knowledge distillation yield strong performance.
We advocate for more comprehensive labeling practices within the animal sound community.
arXiv Detail & Related papers (2024-09-21T11:33:12Z)
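
The birdsong-generalization entry above reports that both fine-tuning and knowledge distillation transfer well. Below is a generic knowledge-distillation loss sketch (not that paper's training code); the temperature and mixing weight are illustrative defaults.

```python
# Generic knowledge-distillation loss: a student is trained on a blend of
# hard labels and the temperature-softened predictions of a pretrained teacher.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard-label term: ordinary cross-entropy on the ground-truth species.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```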
- Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion [93.32354378820648]
We introduce MVSD, a mutual learning framework based on diffusion models.
MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks.
Our framework can improve the performance of the reverberator and dereverberator.
arXiv Detail & Related papers (2024-07-15T00:47:56Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Few-shot Long-Tailed Bird Audio Recognition [3.8073142980733]
We propose a sound detection and classification pipeline to analyze soundscape recordings.
Our solution achieved 18th place out of 807 teams at the BirdCLEF 2022 Challenge hosted on Kaggle.
arXiv Detail & Related papers (2022-06-22T04:14:25Z)
- Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate the binary information of "existence of noise" as a treatment into image classification tasks to improve prediction accuracy.
Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification.
We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z)
- Parsing Birdsong with Deep Audio Embeddings [0.5599792629509227]
We present a semi-supervised approach to identify characteristic calls and environmental noise.
We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks.
arXiv Detail & Related papers (2021-08-20T14:45:44Z)
- Recognizing bird species in diverse soundscapes under weak supervision [0.2148535041822524]
We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF 2021 challenge.
We illustrate how to make full use of pre-trained convolutional neural networks by using an efficient modeling and training routine supplemented by novel augmentation methods.
arXiv Detail & Related papers (2021-07-16T06:54:38Z)
- Zoo-Tuning: Adaptive Transfer from a Zoo of Models [82.9120546160422]
Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
arXiv Detail & Related papers (2021-06-29T14:09:45Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- Modelling Animal Biodiversity Using Acoustic Monitoring and Deep Learning [0.0]
This paper outlines an approach for achieving this using state-of-the-art machine learning to automatically extract features from time-series audio signals.
The acquired bird songs are processed using the mel-frequency cepstrum (MFC) to extract features, which are later classified using a multilayer perceptron (MLP).
Our proposed method achieved promising results with 0.74 sensitivity, 0.92 specificity and an accuracy of 0.74.
arXiv Detail & Related papers (2021-03-12T13:50:31Z)
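
The last entry describes a mel-frequency cepstrum front end followed by a multilayer perceptron. Below is a minimal generic reconstruction of such a pipeline; the librosa and scikit-learn APIs are real, but the file paths, labels, and hyperparameters are placeholders rather than the authors' setup.

```python
# Minimal sketch of an MFCC -> MLP bird-song classification pipeline
# (generic reconstruction, not the paper's code; data sources are placeholders).
import librosa
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def mfcc_features(path: str, n_mfcc: int = 20) -> np.ndarray:
    """Load a recording and summarize it as its mean MFCC vector."""
    y, sr = librosa.load(path, sr=22_050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                  # (n_mfcc,)

# `paths` and `labels` would come from an annotated recording collection.
# X = np.stack([mfcc_features(p) for p in paths])
# X_train, X_test, y_train, y_test = train_test_split(X, labels, stratify=labels)
# clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X_train, y_train)
# print("accuracy:", clf.score(X_test, y_test))
```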