Exploring Sound Change Over Time: A Review of Computational and Human Perception
- URL: http://arxiv.org/abs/2407.05092v1
- Date: Sat, 6 Jul 2024 14:44:59 GMT
- Title: Exploring Sound Change Over Time: A Review of Computational and Human Perception
- Authors: Siqi He, Wei Zhao
- Abstract summary: We provide a pioneering review contrasting computational with human perception from the perspectives of methods and tasks.
Overall, computational approaches rely on computer-driven models to perceive historical sound changes on etymological datasets.
Human approaches use listener-driven models to perceive ongoing sound changes on recording corpora.
- Score: 2.8908326904081334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computational and human perception are often considered separate approaches for studying sound changes over time; few works have touched on the intersection of both. To fill this research gap, we provide a pioneering review contrasting computational with human perception from the perspectives of methods and tasks. Overall, computational approaches rely on computer-driven models to perceive historical sound changes on etymological datasets, while human approaches use listener-driven models to perceive ongoing sound changes on recording corpora. Despite their differences, both approaches complement each other on phonetic and acoustic levels, showing the potential to achieve a more comprehensive perception of sound change. Moreover, we call for a comparative study on the datasets used by both approaches to investigate the influence of historical sound changes on ongoing changes. Lastly, we discuss the applications of sound change in computational linguistics, and point out that perceiving sound change alone is insufficient, as many processes of language change are complex, with entangled changes at syntactic, semantic, and phonetic levels.
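To make the computational side concrete, here is a minimal sketch (our illustration, not a method from the review) of how regular sound correspondences might be surfaced from etymological data: align cognate pairs and count recurring character substitutions. The Latin-to-Italian pairs are genuine cognates reflecting the pl > pi change; the alignment via difflib is an illustrative shortcut.

```python
from collections import Counter
from difflib import SequenceMatcher

# Toy etymological dataset: Latin -> Italian cognates in which initial
# /pl/ regularly became /pi/ (plenus -> pieno, planus -> piano, plus -> piu).
cognates = [("plenus", "pieno"), ("planus", "piano"), ("plus", "piu")]

correspondences = Counter()
for old, new in cognates:
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new).get_opcodes():
        if tag == "replace":
            # Record which old-language segment maps to which new-language one.
            correspondences[(old[i1:i2], new[j1:j2])] += 1

# A substitution recurring across many pairs hints at a regular sound change.
for (src, dst), n in correspondences.most_common(3):
    print(f"{src!r} -> {dst!r}: seen {n} time(s)")
```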
Related papers
- Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation [8.170174172545831]
This paper addresses open issues in text-to-audio generation through the Sound Scene Synthesis challenge held as part of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024.
We present an evaluation protocol combining an objective metric, namely the Fréchet Audio Distance, with perceptual assessments, utilizing a structured prompt format to enable diverse captions and effective evaluation.
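For reference, the Fréchet Audio Distance reduces to the Fréchet distance between two Gaussians fit to embedding sets. A minimal sketch, assuming the embeddings (conventionally VGGish) have already been extracted:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_audio_distance(ref_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """Fréchet distance between Gaussians fit to two (n_clips, dim) sets."""
    mu_r, mu_g = ref_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(ref_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)
    # Matrix square root of the covariance product; drop tiny imaginary
    # parts introduced by numerical error.
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# Sanity check on synthetic embeddings: identical distributions -> near 0.
rng = np.random.default_rng(0)
a = rng.normal(size=(500, 8))
b = rng.normal(size=(500, 8))
print(frechet_audio_distance(a, b))          # small
print(frechet_audio_distance(a, b + 2.0))    # larger (mean shifted)
```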
arXiv Detail & Related papers (2024-10-23T06:35:41Z) - Perception of Phonological Assimilation by Neural Speech Recognition Models [3.4173734484549625]
This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds.
Using psycholinguistic stimuli, we analyze how various linguistic context cues influence compensation patterns in the model's output.
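A minimal probing sketch in this spirit: transcribe a stimulus with Wav2Vec2 and check whether the model compensates for assimilation in its output. The stimulus file is hypothetical; "greem boat" for "green boat" is a standard psycholinguistic case of /n/ assimilating to [m] before a bilabial.

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Hypothetical 16 kHz recording of "greem boat" (assimilated /n/).
audio, sr = sf.read("greem_boat_stimulus.wav")
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, time, vocab)

ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(ids)[0])
# "GREEN BOAT" would indicate compensation for assimilation;
# "GREEM BOAT" a faithful acoustic transcription.
```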
arXiv Detail & Related papers (2024-06-21T15:58:22Z) - Evaluating Speaker Identity Coding in Self-supervised Models and Humans [0.42303492200814446]
Speaker identity plays a significant role in human communication and is increasingly used in societal applications.
We show that self-supervised representations from different families are significantly better for speaker identification than acoustic representations.
We also show that such a speaker identification task can be used to better understand the nature of acoustic information representation in different layers of these powerful networks.
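One common way to run such comparisons is a linear probe per layer. The sketch below is our assumption about the setup and uses synthetic stand-in features; in practice the features would be frame-averaged MFCCs and per-layer SSL embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_accuracy(features: np.ndarray, speakers: np.ndarray) -> float:
    """Mean cross-validated speaker-ID accuracy of a linear probe."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, features, speakers, cv=5).mean()

# Synthetic stand-ins: 200 utterances, 10 speakers. `mfcc_feats` plays the
# acoustic baseline; `ssl_layer_feats[l]` plays layer l of an SSL model.
rng = np.random.default_rng(0)
speakers = rng.integers(0, 10, size=200)
mfcc_feats = rng.normal(size=(200, 39)) + 0.5 * speakers[:, None]
ssl_layer_feats = {
    l: rng.normal(size=(200, 64)) + (0.2 + 0.1 * l) * speakers[:, None]
    for l in range(4)
}

print("MFCC baseline:", probe_accuracy(mfcc_feats, speakers))
for l, feats in ssl_layer_feats.items():
    print(f"SSL layer {l}:", probe_accuracy(feats, speakers))
```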
arXiv Detail & Related papers (2024-06-14T20:07:21Z) - Learning to Communicate Functional States with Nonverbal Expressions for Improved Human-Robot Collaboration [3.5408317027307055]
Collaborative robots must effectively communicate their internal state to humans to enable a smooth interaction.
We propose a reinforcement learning algorithm based on noisy human feedback to produce accurately interpreted nonverbal auditory expressions.
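Stripped to its core, learning from noisy binary feedback can be illustrated with a bandit-style loop. The expression set, noise rate, and update rule below are illustrative assumptions, not the paper's algorithm.

```python
import random

states = ["low_battery", "task_done", "need_help"]
expressions = ["beep_low", "chirp_up", "buzz_long"]
# Hidden ground truth: which expression humans actually read as which state.
true_map = {"low_battery": "buzz_long", "task_done": "chirp_up",
            "need_help": "beep_low"}

q = {(s, e): 0.0 for s in states for e in expressions}
counts = {(s, e): 0 for s in states for e in expressions}
epsilon, noise = 0.1, 0.2  # exploration rate; feedback flip probability

for step in range(5000):
    s = random.choice(states)
    if random.random() < epsilon:
        e = random.choice(expressions)
    else:
        e = max(expressions, key=lambda x: q[(s, x)])
    correct = (true_map[s] == e)
    if random.random() < noise:          # noisy human feedback
        correct = not correct
    counts[(s, e)] += 1
    # Incremental mean of the feedback received for this (state, expression).
    q[(s, e)] += (float(correct) - q[(s, e)]) / counts[(s, e)]

for s in states:
    print(s, "->", max(expressions, key=lambda x: q[(s, x)]))
```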
arXiv Detail & Related papers (2024-04-30T04:18:21Z) - Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances.
Our method can easily be generalized to related fields, like speech emotion recognition.
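The exact update rule is specific to the paper; as a loose illustration of letting the genuine-to-fake ratio steer fine-tuning, one simple variant reweights each batch's loss by per-class frequency:

```python
import torch
import torch.nn as nn

# Toy detector; in practice this would be a pretrained detection network.
model = nn.Sequential(nn.Linear(40, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def finetune_step(feats: torch.Tensor, labels: torch.Tensor) -> float:
    """labels: 0 = genuine, 1 = fake. Class weights from the batch ratio."""
    n_genuine = (labels == 0).sum().clamp(min=1).float()
    n_fake = (labels == 1).sum().clamp(min=1).float()
    n = float(len(labels))
    weights = torch.stack([n / (2 * n_genuine), n / (2 * n_fake)])
    loss = nn.functional.cross_entropy(model(feats), labels, weight=weights)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

batch = torch.randn(16, 40)
labels = torch.randint(0, 2, (16,))
print(finetune_step(batch, labels))
```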
arXiv Detail & Related papers (2023-08-07T05:05:49Z) - Sources of Noise in Dialogue and How to Deal with Them [63.02707014103651]
Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs.
Despite their prevalence, an accurate survey of dialogue noise is currently lacking.
This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems.
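As a generic example of the kind of noise such systems must tolerate (the paper's actual taxonomy is richer than this), a small helper that injects character-level perturbations into clean training utterances:

```python
import random

def inject_noise(utterance: str, p: float = 0.1) -> str:
    """Randomly drop, swap, or duplicate characters with probability p."""
    chars = list(utterance)
    out = []
    i = 0
    while i < len(chars):
        r = random.random()
        if r < p / 3:                                 # drop this character
            i += 1
        elif r < 2 * p / 3 and i + 1 < len(chars):    # swap with the next
            out.extend([chars[i + 1], chars[i]])
            i += 2
        elif r < p:                                   # duplicate
            out.extend([chars[i], chars[i]])
            i += 1
        else:                                         # keep unchanged
            out.append(chars[i])
            i += 1
    return "".join(out)

random.seed(1)
print(inject_noise("book me a table for two tonight"))
```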
arXiv Detail & Related papers (2022-12-06T04:36:32Z) - Evaluating generative audio systems and their metrics [80.97828572629093]
This paper evaluates state-of-the-art approaches side by side, using (i) a set of previously proposed objective metrics for audio reconstruction and (ii) a listening study.
Results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
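For context, one of the simpler objective reconstruction metrics (a generic choice on our part, not necessarily one the paper tested) is the log-spectral distance:

```python
import numpy as np
from scipy.signal import stft

def log_spectral_distance(ref: np.ndarray, gen: np.ndarray,
                          fs: int = 16000) -> float:
    """RMS difference of the log power spectra of two signals (dB)."""
    _, _, S_ref = stft(ref, fs=fs, nperseg=512)
    _, _, S_gen = stft(gen, fs=fs, nperseg=512)
    eps = 1e-10
    log_ref = 10 * np.log10(np.abs(S_ref) ** 2 + eps)
    log_gen = 10 * np.log10(np.abs(S_gen) ** 2 + eps)
    return float(np.sqrt(np.mean((log_ref - log_gen) ** 2)))

rng = np.random.default_rng(0)
clean = rng.normal(size=16000)
print(log_spectral_distance(clean, clean))                       # 0.0
print(log_spectral_distance(clean, clean + 0.1 * rng.normal(size=16000)))
```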
arXiv Detail & Related papers (2022-08-31T21:48:34Z) - End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
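The binaural cues such a system must reproduce can be illustrated with a toy forward model using only interaural time and level differences; this is a sketch far simpler than the paper's learned decoder, with no room acoustics at all.

```python
import numpy as np

def crude_binaural(mono: np.ndarray, azimuth_deg: float,
                   fs: int = 48000) -> np.ndarray:
    """Toy binaural pan: delay and attenuate the ear away from the source."""
    az = np.deg2rad(azimuth_deg)                        # positive = source on right
    delay = int(round(abs(0.66e-3 * fs * np.sin(az))))  # ITD in samples
    gain = 10 ** (-6.0 * abs(np.sin(az)) / 20)          # up to ~6 dB ILD
    near = mono
    far = gain * np.pad(mono, (delay, 0))[: len(mono)]  # delayed and quieter
    left, right = (far, near) if az > 0 else (near, far)
    return np.stack([left, right], axis=1)              # (samples, 2)

tone = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
stereo = crude_binaural(tone, azimuth_deg=45.0)
print(stereo.shape)  # (48000, 2)
```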
arXiv Detail & Related papers (2022-07-08T05:18:36Z) - Letters From the Past: Modeling Historical Sound Change Through Diachronic Character Embeddings [0.0]
We address the detection of sound change through historical spelling.
We propose that a sound change can be captured by comparing, across time, the relative distance between the distributions of the characters involved, using PPMI character embeddings.
We show that the models are able to identify several of the changes under consideration and to uncover meaningful contexts in which they appeared.
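A minimal sketch of the embedding machinery (the toy corpora and window size are our illustration; the paper's setup is richer): build character-level PPMI vectors per time period and measure how far a character's context vector drifts between periods.

```python
import numpy as np

def ppmi_vectors(words, window=1):
    """Character-level PPMI matrix from co-occurrence within `window`."""
    chars = sorted({c for w in words for c in w})
    idx = {c: i for i, c in enumerate(chars)}
    counts = np.zeros((len(chars), len(chars)))
    for w in words:
        for i, c in enumerate(w):
            for j in range(max(0, i - window), min(len(w), i + window + 1)):
                if j != i:
                    counts[idx[c], idx[w[j]]] += 1
    total = counts.sum()
    pc = counts.sum(axis=1, keepdims=True) / total      # character marginals
    pmi = np.log((counts / total + 1e-12) / (pc @ pc.T + 1e-12))
    return chars, np.maximum(pmi, 0.0)                  # positive PMI only

# Toy "historical spelling" corpora for two periods (purely illustrative:
# pretend <th> collapsed to <t> between the periods).
period_a = ["thorn", "thing", "thou", "north"]
period_b = ["torn", "ting", "tou", "nort"]

chars_a, V_a = ppmi_vectors(period_a)
chars_b, V_b = ppmi_vectors(period_b)

# Compare a character's context vector across periods over shared characters.
shared = [c for c in chars_a if c in chars_b]
ia = [chars_a.index(c) for c in shared]
ib = [chars_b.index(c) for c in shared]
t_a = V_a[chars_a.index("t"), ia]
t_b = V_b[chars_b.index("t"), ib]
cos = t_a @ t_b / (np.linalg.norm(t_a) * np.linalg.norm(t_b) + 1e-12)
print(f"cosine similarity of 't' across periods: {cos:.3f}")
```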
arXiv Detail & Related papers (2022-05-17T11:57:17Z) - Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We identify and compare cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between theories of critical learning periods in cognitive psychology and our modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z) - Learning Audio-Visual Dereverberation [87.52880019747435]
Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition.
Our idea is to learn to dereverberate speech from audio-visual observations.
We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene.
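The forward model being inverted here is easy to state: reverberant speech is clean speech convolved with a room impulse response, and dereverberation tries to undo that convolution. A sketch with a synthetic RIR (the paper additionally conditions on visual input from the scene):

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 16000
rng = np.random.default_rng(0)
clean = rng.normal(size=fs)  # stand-in for one second of clean speech

# Synthetic exponentially decaying RIR with a ~0.3 s reverberation tail.
t = np.arange(int(0.3 * fs)) / fs
rir = rng.normal(size=t.size) * np.exp(-t / 0.05)
rir[0] = 1.0  # direct path

reverberant = fftconvolve(clean, rir)[: clean.size]
# A model like VIDA learns the inverse mapping reverberant -> clean,
# conditioned on both the observed audio and an image of the room.
print(reverberant.shape,
      f"RMS clean={clean.std():.2f} reverb={reverberant.std():.2f}")
```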
arXiv Detail & Related papers (2021-06-14T20:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.