BirdSet: A Dataset and Benchmark for Classification in Avian Bioacoustics
- URL: http://arxiv.org/abs/2403.10380v3
- Date: Mon, 17 Jun 2024 15:25:11 GMT
- Title: BirdSet: A Dataset and Benchmark for Classification in Avian Bioacoustics
- Authors: Lukas Rauch, Raphael Schwinger, Moritz Wirth, René Heinrich, Denis Huseljic, Jonas Lange, Stefan Kahl, Bernhard Sick, Sven Tomforde, Christoph Scholz,
- Abstract summary: We introduce the BirdSet dataset, comprising approximately 520,000 global bird recordings for training and over 400 hours of PAM recordings for testing.
Our benchmark offers baselines for several DL models to enhance comparability and consolidate research across studies, along with code implementations that include comprehensive training and evaluation protocols.
- Score: 2.3066093243272188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) models have emerged as a powerful tool in avian bioacoustics to assess environmental health. To maximize the potential of cost-effective and minimal-invasive passive acoustic monitoring (PAM), DL models must analyze bird vocalizations across a wide range of species and environmental conditions. However, data fragmentation challenges a comprehensive evaluation of generalization performance. Therefore, we introduce the BirdSet dataset, comprising approximately 520,000 global bird recordings for training and over 400 hours of PAM recordings for testing. Our benchmark offers baselines for several DL models to enhance comparability and consolidate research across studies, along with code implementations that include comprehensive training and evaluation protocols.
Related papers
- Towards Deep Active Learning in Avian Bioacoustics [1.7522552085069194]
Active learning (AL) reduces annotation cost and speed up adaptions to diverse scenarios by querying the most informative instances for labeling.
This paper outlines a deep AL approach, introduces key challenges, and conducts a small-scale pilot study.
arXiv Detail & Related papers (2024-06-26T08:43:05Z) - Automated Bioacoustic Monitoring for South African Bird Species on Unlabeled Data [1.3506669466260703]
The framework automatically extracted labeled data from available platforms for selected avian species.
The labeled data were embedded into recordings, including environmental sounds and noise, and were used to train convolutional recurrent neural network (CRNN) models.
The Adapted SED-CRNN model reached a F1 score of 0.73, demonstrating its efficiency under noisy, real-world conditions.
arXiv Detail & Related papers (2024-06-19T14:14:24Z) - animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics [2.1019401515721583]
We present the animal2vec framework, a fully interpretable transformer model and self-supervised training scheme tailored for sparse and unbalanced bioacoustic data.
We openly publish MeerKAT: Meerkat Kalahari Audio Transcripts, a large-scale dataset containing audio collected via biologgers on free-ranging meerkats with a length of over 1068h.
We report new state-of-the-art results on both datasets and evaluate the few-shot capabilities of animal2vec of labeled training data.
arXiv Detail & Related papers (2024-06-03T12:11:01Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Silkie: Preference Distillation for Large Visual Language Models [56.10697821410489]
This paper explores preference distillation for large vision language models (LVLMs)
We first build a vision-language feedback dataset utilizing AI annotation.
We adopt GPT-4V to assess the generated outputs regarding helpfulness, visual faithfulness, and ethical considerations.
The resulting model Silkie, achieves 6.9% and 9.5% relative improvement on the MME benchmark regarding the perception and cognition capabilities.
arXiv Detail & Related papers (2023-12-17T09:44:27Z) - Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z) - Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with
Transformers [2.404305970432934]
We propose a shift towards end-to-end learning in bird sound monitoring by combining self-supervised (SSL) and deep active learning (DAL)
We aim to bypass traditional spectrogram conversions, enabling direct raw audio processing.
arXiv Detail & Related papers (2023-08-14T13:06:10Z) - Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
Mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
arXiv Detail & Related papers (2022-11-22T11:35:14Z) - Recognizing bird species in diverse soundscapes under weak supervision [0.2148535041822524]
We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF 2021 challenge.
We illustrate how to make full use of pre-trained convolutional neural networks, by using an efficient modeling and training routine supplemented by novel augmentation methods.
arXiv Detail & Related papers (2021-07-16T06:54:38Z) - Discriminative Singular Spectrum Classifier with Applications on
Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.