Do Orcas Have Semantic Language? Machine Learning to Predict Orca
Behaviors Using Partially Labeled Vocalization Data
- URL: http://arxiv.org/abs/2302.10983v1
- Date: Sat, 28 Jan 2023 06:04:22 GMT
- Title: Do Orcas Have Semantic Language? Machine Learning to Predict Orca
Behaviors Using Partially Labeled Vocalization Data
- Authors: Sophia Sandholm
- Abstract summary: We study whether machine learning can predict behavior from vocalizations.
We work with recent recordings of McMurdo Sound orcas.
- With a careful combination of recent machine learning techniques, we achieve 96.4% classification accuracy.
- Score: 50.02992288349178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Orcinus orca (killer whales) produce complex calls, each lasting
about a second. In a call, an orca typically uses multiple frequencies
simultaneously, varies the frequencies, and varies their volumes. Behavior data
is hard to obtain because orcas live underwater and travel quickly. Sound data
is relatively
easy to capture. As a science goal, we would like to know whether orca
vocalizations constitute a semantic language. We do this by studying whether
machine learning can predict behavior from vocalizations. Such prediction would
also help scientific research and safety applications because one would like to
predict behavior while only having to capture sound. A significant challenge in
this process is the lack of labeled data. We work with recent recordings of
McMurdo Sound orcas [Wellard et al. 2020] where each recording is labeled with
the behaviors observed during the recording. This yields a dataset where sound
segments (continuous vocalizations that can be thought of as call sequences or
more general structures) within the recordings are labeled with superfluous
behaviors: each segment inherits every behavior observed anywhere in its
recording, whether or not it pertains to that segment. Despite these weak
labels, with a careful combination of recent machine learning
techniques, we achieve 96.4% classification accuracy. This suggests that orcas
do use a semantic language. It is also promising for research and applications.
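The setup just described is weakly supervised: every sound segment inherits the full set of behaviors observed during its recording, and a classifier is trained against those recording-level labels. Below is a minimal sketch of that labeling scheme with a small spectrogram CNN; the behavior names, feature shapes, and the model itself are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal, hypothetical sketch of the weak-labeling setup described above.
# Assumptions (NOT from the paper): log-mel spectrogram inputs, an invented
# behavior label set, and a small CNN classifier.
import torch
import torch.nn as nn

BEHAVIORS = ["travel", "forage", "socialize", "rest"]  # invented label set

def weak_labels(recording_behaviors: set) -> torch.Tensor:
    """Every segment inherits ALL behaviors observed during its recording,
    which is what makes the labels superfluous / weak."""
    return torch.tensor([1.0 if b in recording_behaviors else 0.0
                         for b in BEHAVIORS])

class BehaviorClassifier(nn.Module):
    """Tiny CNN over (batch, 1, n_mels, n_frames) spectrogram segments."""
    def __init__(self, n_behaviors=len(BEHAVIORS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_behaviors),  # one logit per behavior
        )

    def forward(self, x):
        return self.net(x)

model = BehaviorClassifier()
loss_fn = nn.BCEWithLogitsLoss()       # multi-label objective
segments = torch.randn(8, 1, 64, 128)  # dummy batch of spectrogram segments
targets = weak_labels({"travel", "socialize"}).repeat(8, 1)
loss = loss_fn(model(segments), targets)
loss.backward()
```

Treating the task as multi-label classification with a per-behavior sigmoid matches a setting where several behaviors can co-occur in one recording; the paper's actual architecture and training recipe are not specified here.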
Related papers
- animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics [2.1019401515721583]
animal2vec is an interpretable large transformer model that learns from unlabeled audio and refines its understanding with labeled data.
MeerKAT (Meerkat Kalahari Audio Transcripts) is the largest labeled dataset on non-human terrestrial mammals.
Our model outperforms existing methods on MeerKAT and the publicly available NIPS4Bplus birdsong dataset.
arXiv Detail & Related papers (2024-06-03T12:11:01Z)
- Towards Lexical Analysis of Dog Vocalizations via Online Videos [19.422796780268605]
This study presents a data-driven investigation into the semantics of dog vocalizations by correlating different sound types with consistent semantics.
We first present a new dataset of Shiba Inu sounds, along with contextual information such as location and activity, collected from YouTube.
Based on an analysis of the conditional probability between dog vocalizations and the corresponding location and activity, we find supporting evidence for previous research on the semantic meaning of various dog sounds (a minimal sketch of this kind of analysis appears after this list).
arXiv Detail & Related papers (2023-09-21T23:53:14Z)
- Transferable Models for Bioacoustics with Human Language Supervision [0.0]
BioLingual is a new model for bioacoustics based on contrastive language-audio pretraining.
It can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries.
arXiv Detail & Related papers (2023-08-09T14:22:18Z)
- WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research [82.42802570171096]
We introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.
Online-harvested raw descriptions are highly noisy and unsuitable for direct use in tasks such as automated audio captioning.
We propose a three-stage processing pipeline for filtering noisy data and generating high-quality captions, where ChatGPT, a large language model, is leveraged to filter and transform raw descriptions automatically.
arXiv Detail & Related papers (2023-03-30T14:07:47Z)
- DeepFry: Identifying Vocal Fry Using Deep Neural Networks [16.489251286870704]
Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch.
Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems.
This paper proposes a deep learning model to detect creaky voice in fluent speech.
arXiv Detail & Related papers (2022-03-31T13:23:24Z)
- Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales [97.41394631426678]
Recent research has shown the promise of machine learning tools for analyzing acoustic communication in nonhuman species.
We outline the key elements required for the collection and processing of massive bioacoustic data from sperm whales.
The technological capabilities developed are likely to yield cross-applications and advancements in the broader communities investigating non-human communication and animal behavior.
arXiv Detail & Related papers (2021-04-17T18:39:22Z)
- Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis [59.623780036359655]
Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators.
This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury.
We propose a solution to this problem based on the theory of multi-view learning.
arXiv Detail & Related papers (2020-12-30T15:09:02Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
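As an illustration of the conditional-probability analysis mentioned in the dog-vocalization entry above, here is a minimal sketch; the sound types, activities, and counts are invented for illustration, and the estimator is just a co-occurrence ratio.

```python
# Hypothetical sketch of a conditional-probability analysis in the spirit of
# the dog-vocalization study above; sound types, activities, and counts are
# invented for illustration.
from collections import Counter

# (sound_type, activity) observations, e.g. parsed from video annotations
observations = [
    ("growl", "play"), ("growl", "guard"), ("growl", "guard"),
    ("whine", "greet"), ("whine", "greet"), ("bark", "guard"),
]

pair_counts = Counter(observations)
sound_counts = Counter(sound for sound, _ in observations)

def p_activity_given_sound(activity, sound):
    """Estimate P(activity | sound type) from co-occurrence counts."""
    if sound_counts[sound] == 0:
        return 0.0
    return pair_counts[(sound, activity)] / sound_counts[sound]

print(p_activity_given_sound("guard", "growl"))  # 2/3: growls mostly co-occur with guarding
```

A conditional distribution that stays peaked on the same activity across many recordings is the kind of evidence such a study would read as a stable sound-to-context association.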