animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics
- URL: http://arxiv.org/abs/2406.01253v2
- Date: Fri, 26 Jul 2024 07:39:30 GMT
- Title: animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics
- Authors: Julian C. Schäfer-Zimmermann, Vlad Demartsev, Baptiste Averly, Kiran Dhanjal-Adams, Mathieu Duteil, Gabriella Gall, Marius Faiß, Lily Johnson-Ulrich, Dan Stowell, Marta B. Manser, Marie A. Roch, Ariana Strandburg-Peshkin,
- Abstract summary: animal2vec is an interpretable large transformer model that learns from unlabeled audio and refines its understanding with labeled data.
Meerkat Audio Transcripts is the largest labeled dataset on non-human terrestrial mammals.
Our model outperforms existing methods on MeerKAT and the publicly available NIPS4Bplus birdsong dataset.
- Score: 2.1019401515721583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bioacoustic research, vital for understanding animal behavior, conservation, and ecology, faces a monumental challenge: analyzing vast datasets where animal vocalizations are rare. While deep learning techniques are becoming standard, adapting them to bioacoustics remains difficult. We address this with animal2vec, an interpretable large transformer model, and a self-supervised training scheme tailored for sparse and unbalanced bioacoustic data. It learns from unlabeled audio and then refines its understanding with labeled data. Furthermore, we introduce and publicly release MeerKAT: Meerkat Kalahari Audio Transcripts, a dataset of meerkat (Suricata suricatta) vocalizations with millisecond-resolution annotations, the largest labeled dataset on non-human terrestrial mammals currently available. Our model outperforms existing methods on MeerKAT and the publicly available NIPS4Bplus birdsong dataset. Moreover, animal2vec performs well even with limited labeled data (few-shot learning). animal2vec and MeerKAT provide a new reference point for bioacoustic research, enabling scientists to analyze large amounts of data even with scarce ground truth information.
Related papers
- Multi Modal Information Fusion of Acoustic and Linguistic Data for Decoding Dairy Cow Vocalizations in Animal Welfare Assessment [0.0]
This study aims to decode dairy cow contact calls by employing multi-modal data fusion techniques.
We utilize the Natural Language Processing model to transcribe audio recordings of cow vocalizations into written form.
We categorized vocalizations into high frequency calls associated with distress or arousal, and low frequency calls linked to contentment or calmness.
arXiv Detail & Related papers (2024-11-01T09:48:30Z) - Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data [69.7174072745851]
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data.
To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using preference optimization.
To address the second challenge, we propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models.
arXiv Detail & Related papers (2024-10-02T22:05:36Z) - WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database [49.1574468325115]
We introduce textbfWhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations.
We achieve an improvement in classification accuracy by $8-10%$ over existing architectures, corresponding to a classification accuracy of $97.61%$.
arXiv Detail & Related papers (2024-02-20T11:36:23Z) - OmniMotionGPT: Animal Motion Generation with Limited Data [70.35662376853163]
We introduce AnimalML3D, the first text-animal motion dataset with 1240 animation sequences spanning 36 different animal identities.
We are able to generate animal motions with high diversity and fidelity, quantitatively and qualitatively outperforming the results of training human motion generation baselines on animal data.
arXiv Detail & Related papers (2023-11-30T07:14:00Z) - Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z) - Transferable Models for Bioacoustics with Human Language Supervision [0.0]
BioLingual is a new model for bioacoustics based on contrastive language-audio pretraining.
It can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries.
arXiv Detail & Related papers (2023-08-09T14:22:18Z) - Classification of animal sounds in a hyperdiverse rainforest using
Convolutional Neural Networks [0.0]
Automated species detection from passively recorded soundscapes via machine-learning approaches is a promising technique.
We use soundscapes from a tropical forest in Borneo and a Convolutional Neural Network model (CNN) created with transfer learning.
Our results suggest that transfer learning and data augmentation can make the use of CNNs to classify species' vocalizations feasible even for small soundscape-based projects with many rare species.
arXiv Detail & Related papers (2021-11-29T21:34:57Z) - Cetacean Translation Initiative: a roadmap to deciphering the
communication of sperm whales [97.41394631426678]
Recent research showed the promise of machine learning tools for analyzing acoustic communication in nonhuman species.
We outline the key elements required for the collection and processing of massive bioacoustic data of sperm whales.
The technological capabilities developed are likely to yield cross-applications and advancements in broader communities investigating non-human communication and animal behavioral research.
arXiv Detail & Related papers (2021-04-17T18:39:22Z) - AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs
in the Wild [51.35013619649463]
We present an extensive dataset of free-running cheetahs in the wild, called AcinoSet.
The dataset contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames.
The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data is also provided.
arXiv Detail & Related papers (2021-03-24T15:54:11Z) - Modelling Animal Biodiversity Using Acoustic Monitoring and Deep
Learning [0.0]
This paper outlines an approach for achieving this using state of the art in machine learning to automatically extract features from time-series audio signals.
The acquired bird songs are processed using mel-frequency cepstrum (MFC) to extract features which are later classified using a multilayer perceptron (MLP)
Our proposed method achieved promising results with 0.74 sensitivity, 0.92 specificity and an accuracy of 0.74.
arXiv Detail & Related papers (2021-03-12T13:50:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.