Modelling change in neural dynamics during phonetic accommodation
- URL: http://arxiv.org/abs/2502.01210v1
- Date: Mon, 03 Feb 2025 10:00:29 GMT
- Title: Modelling change in neural dynamics during phonetic accommodation
- Authors: Sam Kirkham, Patrycja Strycharczuk, Rob Davies, Danielle Welburn
- Abstract summary: We advance a computational model of change in phonetic representations during phonetic accommodation.
We show vowel-specific degrees of convergence during shadowing, followed by return to baseline post-shadowing.
We discuss the implications for the relation between short-term phonetic accommodation and longer-term patterns of sound change.
- Abstract: Short-term phonetic accommodation is a fundamental driver behind accent change, but how does real-time input from another speaker's voice shape the speech planning representations of an interlocutor? We advance a computational model of change in phonetic representations during phonetic accommodation, grounded in dynamic neural field equations for movement planning and memory dynamics. We test the model's ability to capture empirical patterns from an experimental study where speakers shadowed a model talker with a different accent from their own. The experimental data shows vowel-specific degrees of convergence during shadowing, followed by return to baseline (or minor divergence) post-shadowing. The model can reproduce these phenomena by modulating the magnitude of inhibitory memory dynamics, which may reflect resistance to accommodation due to phonological and/or sociolinguistic pressures. We discuss the implications of these results for the relation between short-term phonetic accommodation and longer-term patterns of sound change.
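To make the modelling approach concrete, the following is a minimal, self-contained sketch (Python/NumPy) of an Amari-type dynamic neural field with a slow memory trace fed back through an inhibitory gain, in the spirit of the planning and memory dynamics described above. The one-dimensional field, the parameter values, the input schedule, and names such as `k_mem`, `own_target`, and `model_target` are illustrative assumptions; the sketch does not reproduce the paper's actual equations or fitted parameters.

```python
# Illustrative sketch of an Amari-type dynamic neural field over a single
# phonetic dimension (e.g. a formant axis). Hypothetical parameters; not
# the paper's actual model specification.
import numpy as np

def mexican_hat(d, a_exc=1.0, s_exc=3.0, a_inh=0.5, s_inh=6.0):
    """Local-excitation / lateral-inhibition interaction kernel."""
    return (a_exc * np.exp(-d**2 / (2 * s_exc**2))
            - a_inh * np.exp(-d**2 / (2 * s_inh**2)))

def simulate(n=181, dt=1.0, steps=600, tau=20.0, h=-5.0, beta=1.5, k_mem=0.3):
    x = np.linspace(-90, 90, n)                # field dimension (arbitrary units)
    dx = x[1] - x[0]
    w = mexican_hat(x[:, None] - x[None, :])   # interaction matrix
    u = np.full(n, h)                          # planning field activation
    m = np.zeros(n)                            # slow memory trace
    own_target = -30.0                         # speaker's baseline vowel target
    model_target = 20.0                        # model talker's vowel target
    for t in range(steps):
        s = 6.0 * np.exp(-(x - own_target)**2 / (2 * 8.0**2))
        if 150 <= t < 450:                     # "shadowing" phase: external input
            s += 4.0 * np.exp(-(x - model_target)**2 / (2 * 8.0**2))
        f = 1.0 / (1.0 + np.exp(-beta * u))    # sigmoidal output nonlinearity
        # Euler step of the field equation, with the memory trace fed back
        # through an inhibitory gain k_mem
        du = (-u + h + s + (w @ f) * dx - k_mem * m) / tau
        dm = (f - m) / (10 * tau)              # memory evolves on a slower timescale
        u, m = u + dt * du, m + dt * dm
    return x, u

if __name__ == "__main__":
    x, u = simulate()
    print("peak of planning field at x =", x[np.argmax(u)])
```

In this sketch, the gain `k_mem` on the memory term is the kind of knob one would vary to modulate the magnitude of inhibitory memory dynamics, though the sign and coupling of the memory field in the paper's actual model may differ.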
Related papers
- A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework [10.354955365036181]
Despite the crucial role relational thinking plays in human understanding of speech, it has yet to be leveraged in any artificial speech recognition systems.
This paper presents a novel spectro-temporal relational thinking based acoustic modeling framework.
Models built upon this framework outperform state-of-the-art systems, with a 7.82% improvement in phoneme recognition on the TIMIT dataset.
arXiv Detail & Related papers (2024-09-17T05:45:33Z)
- Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0 [0.11510009152620666]
We study how Wav2Vec2 resolves phonotactic constraints.
We synthesize sounds on an acoustic continuum between /l/ and /r/ and embed them in controlled contexts.
Like humans, Wav2Vec2 models show a bias towards the phonotactically admissible category when processing such ambiguous sounds.
arXiv Detail & Related papers (2024-07-03T11:04:31Z)
- Perception of Phonological Assimilation by Neural Speech Recognition Models [3.4173734484549625]
This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds.
Using psycholinguistic stimuli, we analyze how various linguistic context cues influence compensation patterns in the model's output.
arXiv Detail & Related papers (2024-06-21T15:58:22Z)
- A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech [11.707968216076075]
Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech.
In this study, we simulated similar analyses with representations extracted from a computational model that was trained on unlabelled speech.
Our simulations revealed temporal dynamics similar to those in brain signals, implying that these properties can arise without linguistic knowledge.
arXiv Detail & Related papers (2024-05-13T23:36:19Z)
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture that is compatible with, and scalable within, deep learning frameworks.
We show that end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
- A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units [94.64927912924087]
Existing systems ignore the correlation between prosody and language content, leading to degradation of naturalness in converted speech.
We devise a cascaded modular system leveraging self-supervised discrete speech units as language representation.
Experiments show that our system outperforms previous approaches in naturalness, intelligibility, speaker transferability, and prosody transferability.
arXiv Detail & Related papers (2022-11-12T00:54:09Z) - Self-supervised models of audio effectively explain human cortical
responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
We show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)
- Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation [9.416401293559112]
We propose a computational model of speech production built around a pre-trained neural articulatory synthesizer that can reproduce complex speech stimuli from a limited set of interpretable articulatory parameters.
Both forward and inverse models are jointly trained in a self-supervised way from raw acoustic-only speech data from different speakers.
The imitation simulations are evaluated objectively and subjectively and show encouraging performance.
arXiv Detail & Related papers (2022-04-05T15:02:49Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments additionally show that the gestural scores successfully encode phonological information.
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
- Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis [68.76620947298595]
Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text.
We propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody.
arXiv Detail & Related papers (2021-06-15T18:03:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.