Phonetic and Lexical Discovery of a Canine Language using HuBERT
- URL: http://arxiv.org/abs/2402.15985v1
- Date: Sun, 25 Feb 2024 04:35:45 GMT
- Title: Phonetic and Lexical Discovery of a Canine Language using HuBERT
- Authors: Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu
- Abstract summary: This paper explores potential communication patterns within dog vocalizations, moving beyond the barriers of traditional linguistic analysis.
We present a self-supervised approach with HuBERT, enabling the accurate classification of phoneme labels.
We develop a web-based dog vocalization labeling system that highlights phoneme n-grams from the discovered vocabulary in dog audio uploaded by users.
- Score: 40.578021131708155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper delves into a pioneering exploration of potential communication
patterns within dog vocalizations, transcending the barriers of traditional
linguistic analysis, which relies heavily on human prior knowledge and limited
datasets to find sound units in dog vocalization. We present a self-supervised
approach with HuBERT, enabling the accurate classification of phoneme labels
and the identification of vocal patterns that suggest a rudimentary vocabulary
within dog vocalizations. Our findings indicate significant acoustic
consistency in this identified canine vocabulary, which covers the entirety of
observed dog vocalization sequences. We further develop a web-based dog
vocalization labeling system. This system can highlight phoneme n-grams,
present in the vocabulary, in dog audio uploaded by users.
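The pipeline described in the abstract — discretizing self-supervised frame embeddings into phoneme-like labels and mining recurring n-grams as vocabulary candidates — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: random vectors stand in for real HuBERT hidden states, and the cluster count, merge rule, and n-gram length are all hypothetical choices.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Stand-in for HuBERT frame embeddings of one vocalization
# (T frames x D dims); real features would come from a
# pretrained HuBERT model's hidden states.
T, D, K = 200, 16, 5          # frames, feature dim, phoneme clusters
features = rng.normal(size=(T, D))

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns one discrete label per frame."""
    r = np.random.default_rng(seed)
    centroids = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest centroid
        d = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned frames
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(features, K)

# Collapse consecutive repeats: frame labels -> phoneme sequence
phonemes = [labels[0]] + [l for p, l in zip(labels, labels[1:]) if l != p]

# Mine recurring phoneme bigrams as vocabulary candidates
bigrams = Counter(zip(phonemes, phonemes[1:]))
vocabulary = [ng for ng, count in bigrams.items() if count >= 3]
```

In this sketch, an n-gram that recurs often enough across sequences is treated as a vocabulary entry; a labeling system like the one described above would then highlight spans of uploaded audio whose phoneme sequence matches such entries.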
Related papers
- Silent Signals, Loud Impact: LLMs for Word-Sense Disambiguation of Coded Dog Whistles [47.61526125774749]
A dog whistle is a form of coded communication that carries a secondary meaning to specific audiences and is often weaponized for racial and socioeconomic discrimination.
We present an approach for word-sense disambiguation of dog whistles from standard speech using Large Language Models (LLMs).
We leverage this technique to create a dataset of 16,550 high-confidence coded examples of dog whistles used in formal and informal communication.
arXiv Detail & Related papers (2024-06-10T23:09:19Z)
- Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification [23.974783158267428]
We explore the use of self-supervised speech representation models pre-trained on human speech to address dog bark classification tasks.
We show that using speech embedding representations significantly improves over simpler classification baselines.
We also find that models pre-trained on large human speech acoustics can provide additional performance boosts on several tasks.
arXiv Detail & Related papers (2024-04-29T14:41:59Z)
- ISPA: Inter-Species Phonetic Alphabet for Transcribing Animal Sounds [6.751004034983776]
We introduce ISPA (Inter-Species Phonetic Alphabet), a precise, concise, and interpretable system for transcribing animal sounds into text.
We show that established human language ML paradigms and models, such as language models, can be successfully applied to improve performance.
arXiv Detail & Related papers (2024-02-05T18:27:27Z)
- Towards Lexical Analysis of Dog Vocalizations via Online Videos [19.422796780268605]
This study presents a data-driven investigation into the semantics of dog vocalizations via correlating different sound types with consistent semantics.
We first present a new dataset of Shiba Inu sounds, along with contextual information such as location and activity, collected from YouTube.
Based on the analysis of conditioned probability between dog vocalizations and corresponding location and activity, we discover supporting evidence for previous research on the semantic meaning of various dog sounds.
arXiv Detail & Related papers (2023-09-21T23:53:14Z)
- Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners [19.422796780268605]
This paper presents a preliminary investigation into the possible correlation between domestic dog vocal expressions and their human host's language environment.
We first present a new dataset of Shiba Inu dog vocals from YouTube, which provides 7500 clean sound clips.
With a classification task and prominent factor analysis, we discover significant acoustic differences in the dog vocals from the two language environments.
arXiv Detail & Related papers (2023-09-21T23:49:21Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Differentiable Allophone Graphs for Language-Universal Speech Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
arXiv Detail & Related papers (2021-07-24T15:09:32Z)
- Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties [77.2347265289855]
We focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation.
To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda.
We find that fine-tuning of Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
arXiv Detail & Related papers (2021-04-04T15:07:55Z)
- JukeBox: A Multilingual Singer Recognition Dataset [17.33151600403503]
JukeBox is a speaker recognition dataset with multilingual singing voice audio annotated with singer identity, gender, and language labels.
We use the current state-of-the-art methods to demonstrate the difficulty of performing speaker recognition on singing voice using models trained on spoken voice alone.
arXiv Detail & Related papers (2020-08-08T12:22:51Z)
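The conditioned-probability analysis mentioned in the "Towards Lexical Analysis of Dog Vocalizations" entry above — correlating sound types with contextual labels such as location or activity — amounts to estimating P(context | sound type) from co-occurrence counts. A minimal sketch with invented counts (the observation pairs below are hypothetical, not from the study's dataset):

```python
from collections import Counter

# Hypothetical (sound type, activity) observations; the real study
# pairs Shiba Inu vocalization types with context labels from YouTube.
observations = [
    ("bark", "play"), ("bark", "play"), ("bark", "guard"),
    ("whine", "alone"), ("whine", "alone"), ("growl", "guard"),
]

pair_counts = Counter(observations)
sound_counts = Counter(sound for sound, _ in observations)

def p_context_given_sound(context, sound):
    """Estimate P(context | sound) from co-occurrence counts."""
    return pair_counts[(sound, context)] / sound_counts[sound]
```

A sound type whose conditional distribution concentrates on one context (here, "whine" always co-occurring with "alone") is the kind of supporting evidence such an analysis looks for.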
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.