Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with
Transformers
- URL: http://arxiv.org/abs/2308.07121v2
- Date: Tue, 21 Nov 2023 13:55:04 GMT
- Title: Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with
Transformers
- Authors: Lukas Rauch, Raphael Schwinger, Moritz Wirth, Bernhard Sick, Sven
Tomforde, Christoph Scholz
- Abstract summary: We propose a shift towards end-to-end learning in bird sound monitoring by combining self-supervised learning (SSL) and deep active learning (DAL).
We aim to bypass traditional spectrogram conversions, enabling direct raw audio processing.
- Score: 2.404305970432934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a shift towards end-to-end learning in bird sound monitoring by
combining self-supervised learning (SSL) and deep active learning (DAL). Leveraging
transformer models, we aim to bypass traditional spectrogram conversions,
enabling direct raw audio processing. ActiveBird2Vec is set to generate
high-quality bird sound representations through SSL, potentially accelerating
the assessment of environmental changes and decision-making processes for wind
farms. Additionally, we seek to utilize the wide variety of bird vocalizations
through DAL, reducing the reliance on extensively labeled datasets by human
experts. We plan to curate a comprehensive set of tasks through Huggingface
Datasets, enhancing future comparability and reproducibility of bioacoustic
research. A comparative analysis between various transformer models will be
conducted to evaluate their proficiency in bird sound recognition tasks. We aim
to accelerate the progression of avian bioacoustic research and contribute to
more effective conservation strategies.
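
As a rough illustration of the pipeline the abstract describes, raw waveforms loaded through Hugging Face Datasets can be fed directly to a wav2vec 2.0-style transformer, and the resulting embeddings can drive uncertainty-based sample selection. This is a minimal sketch under assumed names: the checkpoint and dataset identifiers below are placeholders, not artifacts released by the ActiveBird2Vec authors.

```python
# Minimal sketch (assumed names, not the authors' release): embed raw audio
# with a wav2vec 2.0-style transformer, then rank unlabeled clips by
# predictive entropy for an active-learning labeling round.
import numpy as np
import torch
from datasets import Audio, load_dataset
from transformers import AutoFeatureExtractor, AutoModel

CHECKPOINT = "facebook/wav2vec2-base"        # stand-in for a bird-sound SSL model
DATASET = "your-org/bird-soundscapes"        # hypothetical Hugging Face dataset

extractor = AutoFeatureExtractor.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT).eval()

# Decode recordings straight to 16 kHz waveforms -- no spectrogram conversion.
ds = load_dataset(DATASET, split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

@torch.no_grad()
def embed(batch):
    inputs = extractor(
        [a["array"] for a in batch["audio"]],
        sampling_rate=16_000,
        return_tensors="pt",
        padding=True,
    )
    hidden = model(**inputs).last_hidden_state        # (batch, frames, dim)
    batch["embedding"] = hidden.mean(dim=1).numpy()   # one vector per clip
    return batch

ds = ds.map(embed, batched=True, batch_size=8)

# Deep active learning step: given class probabilities from a provisional
# classifier head, query the most uncertain clips for human labeling.
def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# probs = classifier.predict_proba(np.stack(ds["embedding"]))
# query_idx = np.argsort(-predictive_entropy(probs))[:64]   # 64 clips per round
```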
Related papers
- Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation [55.752737615873464]
This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models.
We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
arXiv Detail & Related papers (2024-10-18T02:31:36Z)
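
For context on the noise-injection entry above, the following is a generic sketch (not that paper's implementation) of mixing Gaussian white noise into a waveform at a chosen signal-to-noise ratio; the SNR levels in the usage comment are illustrative.

```python
# Generic sketch: add white noise to a waveform at a target SNR (in dB)
# for training-data augmentation.
import numpy as np

def add_white_noise(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Return `signal` mixed with Gaussian white noise at `snr_db` dB SNR."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))   # SNR = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Example: augment one clip at several SNR levels.
# for snr in (0, 5, 10, 20):
#     noisy = add_white_noise(clip, snr_db=snr)
```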
- Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics [2.6740633963478095]
We explore the effectiveness of transfer learning in large-scale bird sound classification.
Our experiments demonstrate that both fine-tuning and knowledge distillation yield strong performance.
We advocate for more comprehensive labeling practices within the animal sound community.
arXiv Detail & Related papers (2024-09-21T11:33:12Z)
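
The birdsong-generalization entry above reports that both fine-tuning and knowledge distillation transfer well. Below is a generic knowledge-distillation loss sketch (not that paper's training code); the temperature and mixing weight are illustrative defaults.

```python
# Generic knowledge-distillation loss: a student is trained on a blend of
# hard labels and the temperature-softened predictions of a pretrained teacher.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard-label term: ordinary cross-entropy on the ground-truth species.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```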
- Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion [93.32354378820648]
We introduce MVSD, a mutual learning framework based on diffusion models.
MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks.
Our framework can improve the performance of the reverberator and dereverberator.
arXiv Detail & Related papers (2024-07-15T00:47:56Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Few-shot Long-Tailed Bird Audio Recognition [3.8073142980733]
We propose a sound detection and classification pipeline to analyze soundscape recordings.
Our solution achieved 18th place out of 807 teams at the BirdCLEF 2022 Challenge hosted on Kaggle.
arXiv Detail & Related papers (2022-06-22T04:14:25Z)
- Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate the binary information of "existence of noise" as a treatment into image classification tasks to improve prediction accuracy.
Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification.
We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z)
- Parsing Birdsong with Deep Audio Embeddings [0.5599792629509227]
We present a semi-supervised approach to identify characteristic calls and environmental noise.
We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks.
arXiv Detail & Related papers (2021-08-20T14:45:44Z)
- Recognizing bird species in diverse soundscapes under weak supervision [0.2148535041822524]
We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF 2021 challenge.
We illustrate how to make full use of pre-trained convolutional neural networks by using an efficient modeling and training routine supplemented by novel augmentation methods.
arXiv Detail & Related papers (2021-07-16T06:54:38Z)
- Zoo-Tuning: Adaptive Transfer from a Zoo of Models [82.9120546160422]
Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
arXiv Detail & Related papers (2021-06-29T14:09:45Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- Modelling Animal Biodiversity Using Acoustic Monitoring and Deep Learning [0.0]
This paper outlines an approach for achieving this using state-of-the-art machine learning to automatically extract features from time-series audio signals.
The acquired bird songs are processed using the mel-frequency cepstrum (MFC) to extract features, which are later classified using a multilayer perceptron (MLP).
Our proposed method achieved promising results with 0.74 sensitivity, 0.92 specificity and an accuracy of 0.74.
arXiv Detail & Related papers (2021-03-12T13:50:31Z)
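
The last entry describes a mel-frequency cepstrum front end followed by a multilayer perceptron. Below is a minimal generic reconstruction of such a pipeline; the librosa and scikit-learn APIs are real, but the file paths, labels, and hyperparameters are placeholders rather than the authors' setup.

```python
# Minimal sketch of an MFCC -> MLP bird-song classification pipeline
# (generic reconstruction, not the paper's code; data sources are placeholders).
import librosa
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def mfcc_features(path: str, n_mfcc: int = 20) -> np.ndarray:
    """Load a recording and summarize it as its mean MFCC vector."""
    y, sr = librosa.load(path, sr=22_050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                  # (n_mfcc,)

# `paths` and `labels` would come from an annotated recording collection.
# X = np.stack([mfcc_features(p) for p in paths])
# X_train, X_test, y_train, y_test = train_test_split(X, labels, stratify=labels)
# clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X_train, y_train)
# print("accuracy:", clf.score(X_test, y_test))
```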