Transferable Models for Bioacoustics with Human Language Supervision
- URL: http://arxiv.org/abs/2308.04978v1
- Date: Wed, 9 Aug 2023 14:22:18 GMT
- Title: Transferable Models for Bioacoustics with Human Language Supervision
- Authors: David Robinson, Adelaide Robinson, Lily Akrapongpisak
- Abstract summary: BioLingual is a new model for bioacoustics based on contrastive language-audio pretraining.
It can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Passive acoustic monitoring offers a scalable, non-invasive method for
tracking global biodiversity and anthropogenic impacts on species. Although
deep learning has become a vital tool for processing this data, current models
are inflexible, typically cover only a handful of species, and are limited by
data scarcity. In this work, we propose BioLingual, a new model for
bioacoustics based on contrastive language-audio pretraining. We first
aggregate bioacoustic archives into a language-audio dataset, called
AnimalSpeak, with over a million audio-caption pairs holding information on
species, vocalization context, and animal behavior. After training on this
dataset to connect language and audio representations, our model can identify
over a thousand species' calls across taxa, complete bioacoustic tasks
zero-shot, and retrieve animal vocalization recordings from natural text
queries. When fine-tuned, BioLingual sets a new state-of-the-art on nine tasks
in the Benchmark of Animal Sounds. Given its broad taxa coverage and ability to
be flexibly queried in human language, we believe this model opens new
paradigms in ecological monitoring and research, including free-text search on
the world's acoustic monitoring archives. We open-source our models, dataset,
and code.
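Because BioLingual is trained with contrastive language-audio pretraining (CLAP), zero-shot identification reduces to scoring an audio embedding against the embeddings of candidate text captions. The sketch below shows that usage pattern with the Hugging Face transformers CLAP classes; the checkpoint id, audio file, and captions are illustrative assumptions rather than details taken from the abstract.

```python
import torch
import librosa
from transformers import ClapModel, ClapProcessor

# Assumed checkpoint id; substitute the checkpoint actually released by the authors.
CKPT = "davidrrobinson/BioLingual"

model = ClapModel.from_pretrained(CKPT)
processor = ClapProcessor.from_pretrained(CKPT)

# Candidate captions act as zero-shot class labels; any free text works.
captions = [
    "The song of a common nightingale",
    "The call of a humpback whale",
    "The croak of an American bullfrog",
]

# Illustrative file name; CLAP-style models typically expect 48 kHz audio.
audio, sr = librosa.load("recording.wav", sr=48_000)

inputs = processor(text=captions, audios=[audio], sampling_rate=sr,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# One similarity per (audio, caption) pair; softmax turns them into probabilities.
probs = out.logits_per_audio.softmax(dim=-1).squeeze(0)
for caption, p in zip(captions, probs):
    print(f"{p:.3f}  {caption}")
```

Free-text retrieval is the same comparison run in reverse: embed a query sentence once, then rank archived clips by the similarity of their precomputed audio embeddings.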
Related papers
- Multi Modal Information Fusion of Acoustic and Linguistic Data for Decoding Dairy Cow Vocalizations in Animal Welfare Assessment [0.0]
This study aims to decode dairy cow contact calls by employing multi-modal data fusion techniques.
We use a natural language processing model to transcribe audio recordings of cow vocalizations into written form.
We categorize vocalizations into high-frequency calls associated with distress or arousal and low-frequency calls linked to contentment or calmness.
arXiv Detail & Related papers (2024-11-01T09:48:30Z)
- animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics [2.1019401515721583]
animal2vec is an interpretable large transformer model that learns from unlabeled audio and refines its understanding with labeled data.
MeerKAT is the largest labeled audio dataset on non-human terrestrial mammals.
Our model outperforms existing methods on MeerKAT and the publicly available NIPS4Bplus birdsong dataset.
arXiv Detail & Related papers (2024-06-03T12:11:01Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- Self-Supervised Learning for Few-Shot Bird Sound Classification [10.395255631261458]
Self-supervised learning (SSL) in audio holds significant potential across various domains.
In this study, we demonstrate that SSL can acquire meaningful representations of bird sounds from audio recordings without the need for annotations.
arXiv Detail & Related papers (2023-12-25T22:33:45Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in the human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis [50.236929707024245]
The SOMOS dataset is the first large-scale mean opinion score (MOS) dataset consisting solely of neural text-to-speech (TTS) samples.
It consists of 20K synthetic utterances of the LJ Speech voice, a public domain speech dataset.
arXiv Detail & Related papers (2022-04-06T18:45:20Z)
- Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales [97.41394631426678]
Recent research has shown the promise of machine learning tools for analyzing acoustic communication in non-human species.
We outline the key elements required for the collection and processing of massive bioacoustic data of sperm whales.
The technological capabilities developed are likely to yield cross-applications and advancements for the broader communities investigating non-human communication and animal behavior.
arXiv Detail & Related papers (2021-04-17T18:39:22Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism that efficiently extracts features useful for analysis and classification.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
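The subspace idea in the preceding entry can be illustrated with generic singular spectrum analysis: embed a signal as a Hankel trajectory matrix, take its SVD, and keep the leading left singular vectors as the signal's subspace. The sketch below is a minimal illustration under those assumptions, not the authors' exact discriminative mechanism; the signals and class names are toy placeholders.

```python
import numpy as np

def trajectory_matrix(x, window):
    """Hankel matrix whose columns are lagged windows of x."""
    K = len(x) - window + 1
    return np.stack([x[i:i + window] for i in range(K)], axis=1)

def signal_subspace(x, window=64, rank=8):
    """Orthonormal basis of the dominant singular-spectrum subspace of x."""
    U, _, _ = np.linalg.svd(trajectory_matrix(x, window), full_matrices=False)
    return U[:, :rank]

def subspace_similarity(A, B):
    """Mean squared cosine of the principal angles between two subspaces."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(np.mean(s ** 2))

# Toy usage: classify a query signal by its nearest class subspace.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
bee = np.sin(2 * np.pi * 15 * t) + 0.1 * rng.standard_normal(t.size)
mosquito = np.sin(2 * np.pi * 40 * t) + 0.1 * rng.standard_normal(t.size)
query = np.sin(2 * np.pi * 15 * t) + 0.2 * rng.standard_normal(t.size)

scores = {name: subspace_similarity(signal_subspace(query), signal_subspace(sig))
          for name, sig in [("bee", bee), ("mosquito", mosquito)]}
print(max(scores, key=scores.get))  # expected to match the 15 Hz "bee" signal
```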
- Modelling Animal Biodiversity Using Acoustic Monitoring and Deep Learning [0.0]
This paper outlines an approach to acoustic biodiversity monitoring that uses state-of-the-art machine learning to automatically extract features from time-series audio signals.
The acquired bird songs are processed with the mel-frequency cepstrum (MFC) to extract features, which are then classified using a multilayer perceptron (MLP).
Our proposed method achieved promising results: a sensitivity of 0.74, a specificity of 0.92, and an accuracy of 0.74.
arXiv Detail & Related papers (2021-03-12T13:50:31Z)
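The last entry above describes a classic pipeline: summarize each clip with mel-frequency cepstral features, then classify with a small MLP. Below is a minimal, hypothetical sketch of that pipeline using librosa and scikit-learn; the file paths and labels are placeholders, not data from the paper.

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

def mfc_features(path, n_mfcc=20):
    """Summarize a clip as its mean mel-frequency cepstral coefficient vector."""
    audio, sr = librosa.load(path, sr=22_050)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Placeholder (file path, species label) pairs; substitute a real labeled set.
clips = [
    ("bird_song_01.wav", "species_a"),
    ("bird_song_02.wav", "species_b"),
    # ...
]

X = np.stack([mfc_features(path) for path, _ in clips])
y = [label for _, label in clips]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```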
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.