Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs
and Their Human Owners
- URL: http://arxiv.org/abs/2309.13085v1
- Date: Thu, 21 Sep 2023 23:49:21 GMT
- Title: Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs
and Their Human Owners
- Authors: Jieyi Huang, Chunhao Zhang, Yufei Wang, Mengyue Wu, Kenny Zhu
- Abstract summary: This paper presents a preliminary investigation into the possible correlation between domestic dog vocal expressions and their human host's language environment.
We first present a new dataset of Shiba Inu dog vocals from YouTube, which provides 7500 clean sound clips.
With a classification task and prominent factor analysis, we discover significant acoustic differences in the dog vocals from the two language environments.
- Score: 19.422796780268605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How a host's language influences their pet's vocalizations is an
interesting yet underexplored problem. This paper presents a preliminary
investigation into the possible correlation between domestic dog vocal
expressions and their human host's language environment. We first present a
new dataset of Shiba Inu dog vocals from YouTube, which provides 7500 clean
sound clips, together with contextual information for these vocals and their
owners' speech clips, collected through a carefully designed data processing
pipeline. The contextual information includes the scene category in which the
vocal was recorded, as well as the dog's location and activity. With a
classification task and prominent factor analysis, we discover significant
acoustic differences in the dog vocals from the two language environments. We
further identify some acoustic features from dog vocalizations that are
potentially correlated with their hosts' language patterns.
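As a loose illustration of the classification setup described in the abstract, the sketch below separates dog vocal clips by the owner's language environment using simple acoustic features; the directory layout, environment labels, feature set, and classifier are assumptions for illustration, not the paper's actual pipeline.

```python
# Hypothetical sketch: classify dog vocal clips by the owner's language
# environment using simple acoustic features. Directory names, labels,
# features, and classifier are illustrative assumptions only.
import glob
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def acoustic_features(path):
    """Summarize a clip with MFCC means/stds plus pitch statistics."""
    wav, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13)
    f0 = librosa.yin(wav, fmin=80, fmax=1000, sr=sr)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [np.nanmean(f0), np.nanstd(f0)]])

X, y = [], []
for label in ["english", "japanese"]:           # assumed environment labels
    for path in glob.glob(f"clips/{label}/*.wav"):
        X.append(acoustic_features(path))
        y.append(label)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, np.array(X), np.array(y), cv=5)
print("cross-validated accuracy:", scores.mean())
```

Inspecting which features drive the separation (e.g. the classifier's feature importances) is a rough stand-in for the prominent factor analysis mentioned in the abstract.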
Related papers
- Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification [23.974783158267428]
We explore the use of self-supervised speech representation models pre-trained on human speech to address dog bark classification tasks.
We show that using speech embedding representations significantly improves over simpler classification baselines.
We also find that models pre-trained on large amounts of human speech can provide additional performance boosts on several tasks.
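To make the approach above concrete, here is a minimal sketch of bark classification on top of frozen self-supervised speech embeddings, assuming a wav2vec 2.0 checkpoint from Hugging Face; the checkpoint, mean-pooling, placeholder paths/labels, and logistic-regression head are illustrative assumptions rather than the paper's exact recipe.

```python
# Hypothetical sketch: bark classification on top of frozen self-supervised
# speech embeddings (wav2vec 2.0) with a linear classifier. Checkpoint,
# pooling, paths, and labels are placeholder assumptions.
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def embed(path):
    """Mean-pool the last hidden layer over time to get one vector per clip."""
    wav, _ = librosa.load(path, sr=16000)
    inputs = extractor(wav, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state    # (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Placeholder clips and bark-context labels; replace with a real dataset.
train_paths = ["bark_001.wav", "bark_002.wav", "bark_003.wav"]
train_labels = ["play", "alert", "play"]

X_train = [embed(p) for p in train_paths]
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
```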
arXiv Detail & Related papers (2024-04-29T14:41:59Z)
- Phonetic and Lexical Discovery of a Canine Language using HuBERT [40.578021131708155]
This paper explores potential communication patterns within dog vocalizations and transcends traditional linguistic analysis barriers.
We present a self-supervised approach with HuBERT, enabling the accurate classification of phoneme labels.
We develop a web-based dog vocalization labeling system that highlights phoneme n-grams from this vocabulary in dog audio uploaded by users.
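One common way to obtain phoneme-like labels from HuBERT features, as described above, is to cluster frame-level representations and count recurring label n-grams; the sketch below illustrates this under assumed choices (checkpoint, 50 clusters, bigrams, placeholder file paths) that are not taken from the paper.

```python
# Hypothetical sketch: cluster HuBERT frame features into pseudo-phoneme
# labels, then count label bigrams as candidate recurring units.
from collections import Counter
import numpy as np
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, HubertModel
from sklearn.cluster import KMeans

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

def frame_features(path):
    """Return HuBERT hidden states as a (frames, 768) array for one clip."""
    wav, _ = librosa.load(path, sr=16000)
    inputs = extractor(wav, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        return hubert(**inputs).last_hidden_state.squeeze(0).numpy()

clips = ["dog_001.wav", "dog_002.wav"]          # placeholder clip paths
feats = [frame_features(p) for p in clips]
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(np.vstack(feats))

def bigrams(labels):
    """Collapse consecutive repeats, then pair neighbouring labels."""
    deduped = [l for i, l in enumerate(labels) if i == 0 or l != labels[i - 1]]
    return list(zip(deduped, deduped[1:]))

counts = Counter()
for f in feats:
    counts.update(bigrams(kmeans.predict(f).tolist()))
print(counts.most_common(10))
```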
arXiv Detail & Related papers (2024-02-25T04:35:45Z)
- Towards Lexical Analysis of Dog Vocalizations via Online Videos [19.422796780268605]
This study presents a data-driven investigation into the semantics of dog vocalizations via correlating different sound types with consistent semantics.
We first present a new dataset of Shiba Inu sounds, along with contextual information such as location and activity, collected from YouTube.
Based on the analysis of conditional probability between dog vocalizations and the corresponding location and activity, we discover supporting evidence for previous research on the semantic meaning of various dog sounds.
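A minimal sketch of the conditional-probability analysis described above, assuming each clip is annotated with a sound type and an activity label (the column names and example labels are placeholders):

```python
# Hypothetical sketch: estimate P(activity | sound_type) from an annotated
# table of dog vocal clips. Column names and labels are placeholders.
import pandas as pd

clips = pd.DataFrame({
    "sound_type": ["growl", "whine", "growl", "bark", "whine", "bark"],
    "activity":   ["play",  "wait",  "play",  "alert", "wait", "play"],
})

joint = pd.crosstab(clips["sound_type"], clips["activity"])   # joint counts
cond = joint.div(joint.sum(axis=1), axis=0)                   # row-normalize
print(cond.round(2))
```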
arXiv Detail & Related papers (2023-09-21T23:53:14Z)
- Can Language Models Learn to Listen? [96.01685069483025]
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts a listener's response: a sequence of facial gestures, quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
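As a rough illustration of the autoregressive piece of the approach above, the sketch below trains a next-token predictor over discrete gesture codes such as those produced by a VQ-VAE codebook; it omits the conditioning on the speaker's words, and the model size, codebook size, and data are placeholder assumptions.

```python
# Hypothetical sketch: autoregressive next-token prediction over discrete
# gesture codes (e.g. indices from a VQ-VAE codebook). The conditioning on
# the speaker's words is omitted; sizes and data are placeholders.
import torch
import torch.nn as nn

class GestureLM(nn.Module):
    def __init__(self, codebook_size=256, dim=128):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, codebook_size)

    def forward(self, tokens):                  # tokens: (batch, time)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)                # logits for the next code

model = GestureLM()
tokens = torch.randint(0, 256, (2, 32))         # placeholder code sequences
logits = model(tokens[:, :-1])                  # predict token t+1 from <= t
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 256), tokens[:, 1:].reshape(-1))
loss.backward()
```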
arXiv Detail & Related papers (2023-08-21T17:59:02Z)
- Language-Guided Audio-Visual Source Separation via Trimodal Consistency [64.0580750128049]
A key challenge in this task is learning to associate the linguistic description of a sound-emitting object to its visual features and the corresponding components of the audio waveform.
We adapt off-the-shelf vision-language foundation models to provide pseudo-target supervision via two novel loss functions.
We demonstrate the effectiveness of our self-supervised approach on three audio-visual separation datasets.
arXiv Detail & Related papers (2023-03-28T22:45:40Z)
- Do Orcas Have Semantic Language? Machine Learning to Predict Orca Behaviors Using Partially Labeled Vocalization Data [50.02992288349178]
We study whether machine learning can predict behavior from vocalizations.
We work with recent recordings of McMurdo Sound orcas.
With careful combination of recent machine learning techniques, we achieve 96.4% classification accuracy.
arXiv Detail & Related papers (2023-01-28T06:04:22Z)
- Speak Like a Dog: Human to Non-human creature Voice Conversion [19.703397078178]
Human-to-non-human-creature voice conversion (H2NH-VC) aims to convert human speech into non-human creature-like speech.
To clarify the possibilities and characteristics of the "speak like a dog" task, we conducted a comparative experiment.
The converted voices were evaluated using mean opinion scores for dog-likeness, sound quality, and intelligibility, as well as character error rate (CER).
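For reference, the character error rate (CER) mentioned above is the character-level edit distance between a reference transcript and a hypothesis, normalized by the reference length; a minimal self-contained sketch follows (the example strings are placeholders).

```python
# Hypothetical sketch: character error rate (CER) as character-level edit
# distance divided by reference length. Example strings are placeholders.
def cer(reference: str, hypothesis: str) -> float:
    r, h = list(reference), list(hypothesis)
    # Dynamic-programming edit distance (insertions, deletions, substitutions).
    dist = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dist[i][0] = i
    for j in range(len(h) + 1):
        dist[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[len(r)][len(h)] / max(len(r), 1)

print(cer("speak like a dog", "speak like a hog"))  # one substitution -> 0.0625
```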
arXiv Detail & Related papers (2022-06-09T22:10:43Z)
- Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition [13.373579620368046]
We have created a VocalSound dataset consisting of over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs.
Experiments show that the vocal sound recognition performance of a model can be significantly improved, by 41.9%, by adding the VocalSound dataset to an existing dataset as training material.
arXiv Detail & Related papers (2022-05-06T18:08:18Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between theories from cognitive psychology and our modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- DeepSinger: Singing Voice Synthesis with Data Mined From the Web [194.10598657846145]
DeepSinger is a multi-lingual singing voice synthesis system built from scratch using singing training data mined from music websites.
We evaluate DeepSinger on our mined singing dataset, which consists of about 92 hours of data from 89 singers in three languages.
arXiv Detail & Related papers (2020-07-09T07:00:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.