Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with
Transformers
- URL: http://arxiv.org/abs/2308.07121v2
- Date: Tue, 21 Nov 2023 13:55:04 GMT
- Title: Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with
Transformers
- Authors: Lukas Rauch, Raphael Schwinger, Moritz Wirth, Bernhard Sick, Sven
Tomforde, Christoph Scholz
- Abstract summary: We propose a shift towards end-to-end learning in bird sound monitoring by combining self-supervised learning (SSL) and deep active learning (DAL).
We aim to bypass traditional spectrogram conversions, enabling direct raw audio processing.
- Score: 2.404305970432934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a shift towards end-to-end learning in bird sound monitoring by
combining self-supervised learning (SSL) and deep active learning (DAL). Leveraging
transformer models, we aim to bypass traditional spectrogram conversions,
enabling direct raw audio processing. ActiveBird2Vec is set to generate
high-quality bird sound representations through SSL, potentially accelerating
the assessment of environmental changes and decision-making processes for wind
farms. Additionally, we seek to utilize the wide variety of bird vocalizations
through DAL, reducing the reliance on extensively labeled datasets by human
experts. We plan to curate a comprehensive set of tasks through Huggingface
Datasets, enhancing future comparability and reproducibility of bioacoustic
research. A comparative analysis between various transformer models will be
conducted to evaluate their proficiency in bird sound recognition tasks. We aim
to accelerate the progression of avian bioacoustic research and contribute to
more effective conservation strategies.
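The abstract's core loop — a frozen SSL encoder producing embeddings from raw audio, with deep active learning selecting which clips a human expert should label — can be sketched minimally. This is an illustrative sketch only, not the authors' implementation: the encoder is a stand-in random projection, the classifier head and class count are hypothetical, and least-confidence sampling is just one common DAL acquisition strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(audio_batch):
    # Stand-in for a frozen SSL encoder (e.g. a pretrained transformer
    # over raw waveforms); here a fixed random projection to 16 dims.
    proj = rng.standard_normal((audio_batch.shape[1], 16))
    return audio_batch @ proj

def predict_proba(embeddings, weights):
    # Hypothetical linear classifier head over the SSL embeddings.
    logits = embeddings @ weights
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def select_most_uncertain(proba, k):
    # Least-confidence acquisition: query the clips whose top class
    # probability is lowest, i.e. where the model is least sure.
    uncertainty = 1.0 - proba.max(axis=1)
    return np.argsort(uncertainty)[-k:]

# Toy unlabeled pool: 100 "raw audio" clips of 1000 samples each.
pool = rng.standard_normal((100, 1000))
weights = rng.standard_normal((16, 5))  # 5 hypothetical bird classes

emb = embed(pool)
proba = predict_proba(emb, weights)
query = select_most_uncertain(proba, k=8)
print(query)  # indices of clips to send to a human expert for labeling
```

In a real pipeline the queried clips would be labeled, added to the training set, and the head retrained — iterating until the labeling budget is spent.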
Related papers
- Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion [93.32354378820648]
We introduce MVSD, a mutual learning framework based on diffusion models.
MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks.
Our framework can improve the performance of the reverberator and dereverberator.
arXiv Detail & Related papers (2024-07-15T00:47:56Z) - animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics [2.1019401515721583]
We present the animal2vec framework, a fully interpretable transformer model and self-supervised training scheme tailored for sparse and unbalanced bioacoustic data.
We openly publish MeerKAT: Meerkat Kalahari Audio Transcripts, a large-scale dataset of over 1,068 hours of audio collected via biologgers on free-ranging meerkats.
We report new state-of-the-art results on both datasets and evaluate the few-shot capabilities of animal2vec with limited labeled training data.
arXiv Detail & Related papers (2024-06-03T12:11:01Z) - BirdSet: A Dataset and Benchmark for Classification in Avian Bioacoustics [2.3066093243272188]
We introduce the BirdSet dataset, comprising approximately 520,000 global bird recordings for training and over 400 hours of PAM recordings for testing.
Our benchmark offers baselines for several DL models to enhance comparability and consolidate research across studies, along with code implementations that include comprehensive training and evaluation protocols.
arXiv Detail & Related papers (2024-03-15T15:10:40Z) - Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z) - Few-shot Long-Tailed Bird Audio Recognition [3.8073142980733]
We propose a sound detection and classification pipeline to analyze soundscape recordings.
Our solution achieved 18th place out of 807 teams at the BirdCLEF 2022 Challenge hosted on Kaggle.
arXiv Detail & Related papers (2022-06-22T04:14:25Z) - Robust Meta-learning with Sampling Noise and Label Noise via
Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since only a few samples are available.
When handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z) - Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate the binary information of "existence of noise" as a treatment in image classification tasks to improve prediction accuracy.
Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification.
We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z) - Recognizing bird species in diverse soundscapes under weak supervision [0.2148535041822524]
We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF 2021 challenge.
We illustrate how to make full use of pre-trained convolutional neural networks, by using an efficient modeling and training routine supplemented by novel augmentation methods.
arXiv Detail & Related papers (2021-07-16T06:54:38Z) - Zoo-Tuning: Adaptive Transfer from a Zoo of Models [82.9120546160422]
Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
arXiv Detail & Related papers (2021-06-29T14:09:45Z) - Discriminative Singular Spectrum Classifier with Applications on
Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to efficiently extract useful features for analysis and classification.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z) - Modelling Animal Biodiversity Using Acoustic Monitoring and Deep
Learning [0.0]
This paper outlines an approach that uses state-of-the-art machine learning to automatically extract features from time-series audio signals.
The acquired bird songs are processed using the mel-frequency cepstrum (MFC) to extract features, which are later classified using a multilayer perceptron (MLP).
Our proposed method achieved promising results with 0.74 sensitivity, 0.92 specificity and an accuracy of 0.74.
arXiv Detail & Related papers (2021-03-12T13:50:31Z)
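The MFC-plus-MLP pipeline described in the last entry above can be sketched as follows. This is a minimal illustration of the classification stage only, assuming MFC features (e.g. 13 MFCC means per clip) have already been extracted by a separate front end; the layer sizes, class count, and random weights are all hypothetical stand-ins, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_forward(x, w1, b1, w2, b2):
    # One hidden layer with ReLU, softmax output over bird classes.
    h = np.maximum(0.0, x @ w1 + b1)
    logits = h @ w2 + b2
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

# Toy stand-in: 32 clips, each summarized by 13 MFCC-style coefficients.
features = rng.standard_normal((32, 13))
w1 = rng.standard_normal((13, 32)) * 0.1  # hidden layer weights
b1 = np.zeros(32)
w2 = rng.standard_normal((32, 4)) * 0.1   # 4 hypothetical species
b2 = np.zeros(4)

proba = mlp_forward(features, w1, b1, w2, b2)
pred = proba.argmax(axis=1)  # predicted species index per clip
```

Training such an MLP (e.g. by gradient descent on cross-entropy loss) and reporting sensitivity/specificity, as the paper does, would sit on top of this forward pass.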
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.