Modelling change in neural dynamics during phonetic accommodation
- URL: http://arxiv.org/abs/2502.01210v1
- Date: Mon, 03 Feb 2025 10:00:29 GMT
- Title: Modelling change in neural dynamics during phonetic accommodation
- Authors: Sam Kirkham, Patrycja Strycharczuk, Rob Davies, Danielle Welburn,
- Abstract summary: We advance a computational model of change in phonetic representations during phonetic accommodation.<n>We show vowel-specific degrees of convergence during shadowing, followed by return to baseline post-shadowing.<n>We discuss the implications for the relation between short-term phonetic accommodation and longer-term patterns of sound change.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Short-term phonetic accommodation is a fundamental driver behind accent change, but how does real-time input from another speaker's voice shape the speech planning representations of an interlocutor? We advance a computational model of change in phonetic representations during phonetic accommodation, grounded in dynamic neural field equations for movement planning and memory dynamics. We test the model's ability to capture empirical patterns from an experimental study where speakers shadowed a model talker with a different accent from their own. The experimental data shows vowel-specific degrees of convergence during shadowing, followed by return to baseline (or minor divergence) post-shadowing. The model can reproduce these phenomena by modulating the magnitude of inhibitory memory dynamics, which may reflect resistance to accommodation due to phonological and/or sociolinguistic pressures. We discuss the implications of these results for the relation between short-term phonetic accommodation and longer-term patterns of sound change.
Related papers
- Discovering dynamical laws for speech gestures [0.0]
We discover models in the form of symbolic equations that govern articulatory gestures during speech.
A sparse symbolic regression algorithm is used to discover models from kinematic data on the tongue and lips.
arXiv Detail & Related papers (2025-04-07T09:03:32Z) - Allostatic Control of Persistent States in Spiking Neural Networks for perception and computation [79.16635054977068]
We introduce a novel model for updating perceptual beliefs about the environment by extending the concept of Allostasis to the control of internal representations.<n>In this paper, we focus on an application in numerical cognition, where a bump of activity in an attractor network is used as a spatial numerical representation.
arXiv Detail & Related papers (2025-03-20T12:28:08Z) - A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework [10.354955365036181]
Despite the crucial role relational thinking plays in human understanding of speech, it has yet to be leveraged in any artificial speech recognition systems.
This paper presents a novel spectro-temporal relational thinking based acoustic modeling framework.
Models built upon this framework outperform stateof-the-art systems with a 7.82% improvement in phoneme recognition tasks over the TIMIT dataset.
arXiv Detail & Related papers (2024-09-17T05:45:33Z) - Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0 [0.11510009152620666]
We study how Wav2Vec2 resolves phonotactic constraints.
We synthesize sounds on an acoustic continuum between /l/ and /r/ and embed them in controlled contexts.
Like humans, Wav2Vec2 models show a bias towards the phonotactically admissable category in processing such ambiguous sounds.
arXiv Detail & Related papers (2024-07-03T11:04:31Z) - Perception of Phonological Assimilation by Neural Speech Recognition Models [3.4173734484549625]
This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds.
Using psycholinguistic stimuli, we analyze how various linguistic context cues influence compensation patterns in the model's output.
arXiv Detail & Related papers (2024-06-21T15:58:22Z) - A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech [11.707968216076075]
Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech.
In this study, we simulated similar analyses with representations extracted from a computational model that was trained on unlabelled speech.
Our simulations revealed temporal dynamics similar to those in brain signals, implying that these properties can arise without linguistic knowledge.
arXiv Detail & Related papers (2024-05-13T23:36:19Z) - Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z) - MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation [62.44907105496227]
MindDial is a novel conversational framework that can generate situated free-form responses with theory-of-mind modeling.
We introduce an explicit mind module that can track the speaker's belief and the speaker's prediction of the listener's belief.
Our framework is applied to both prompting and fine-tuning-based models, and is evaluated across scenarios involving both common ground alignment and negotiation.
arXiv Detail & Related papers (2023-06-27T07:24:32Z) - Inverse Dynamics Pretraining Learns Good Representations for Multitask
Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z) - The neural dynamics of auditory word recognition and integration [21.582292050622456]
We present a computational model of word recognition which formalizes this perceptual process in Bayesian decision theory.
We fit this model to explain scalp EEG signals recorded as subjects passively listened to a fictional story.
The model reveals distinct neural processing of words depending on whether or not they can be quickly recognized.
arXiv Detail & Related papers (2023-05-22T18:06:32Z) - A unified one-shot prosody and speaker conversion system with
self-supervised discrete speech units [94.64927912924087]
Existing systems ignore the correlation between prosody and language content, leading to degradation of naturalness in converted speech.
We devise a cascaded modular system leveraging self-supervised discrete speech units as language representation.
Experiments show that our system outperforms previous approaches in naturalness, intelligibility, speaker transferability, and prosody transferability.
arXiv Detail & Related papers (2022-11-12T00:54:09Z) - Neural inhibition during speech planning contributes to contrastive
hyperarticulation [0.17767466724342065]
We present a dynamic neural field (DNF) model of voice onset time (VOT) planning.
We test some predictions of the model with a novel experiment investigating CH of voiceless stop consonant VOT in pseudowords.
The results demonstrate a CH effect in pseudowords, consistent with a basis for the effect in the real-time planning and production of speech.
arXiv Detail & Related papers (2022-09-25T17:54:59Z) - Self-supervised models of audio effectively explain human cortical
responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
We show that these results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z) - Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z) - Repeat after me: Self-supervised learning of acoustic-to-articulatory
mapping by vocal imitation [9.416401293559112]
We propose a computational model of speech production combining a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters.
Both forward and inverse models are jointly trained in a self-supervised way from raw acoustic-only speech data from different speakers.
The imitation simulations are evaluated objectively and subjectively and display quite encouraging performances.
arXiv Detail & Related papers (2022-04-05T15:02:49Z) - Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances balances to obtain accurate beliefs using minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z) - Deep Neural Convolutive Matrix Factorization for Articulatory
Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z) - Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis [68.76620947298595]
Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text.
We propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody.
arXiv Detail & Related papers (2021-06-15T18:03:48Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.