Neural inhibition during speech planning contributes to contrastive
hyperarticulation
- URL: http://arxiv.org/abs/2209.12278v1
- Date: Sun, 25 Sep 2022 17:54:59 GMT
- Title: Neural inhibition during speech planning contributes to contrastive
hyperarticulation
- Authors: Michael C. Stern and Jason A. Shaw
- Abstract summary: We present a dynamic neural field (DNF) model of voice onset time (VOT) planning.
We test some predictions of the model with a novel experiment investigating CH of voiceless stop consonant VOT in pseudowords.
The results demonstrate a CH effect in pseudowords, consistent with a basis for the effect in the real-time planning and production of speech.
- Score: 0.17767466724342065
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Previous work has demonstrated that words are hyperarticulated on dimensions
of speech that differentiate them from a minimal pair competitor. This
phenomenon has been termed contrastive hyperarticulation (CH). We present a
dynamic neural field (DNF) model of voice onset time (VOT) planning that
derives CH from an inhibitory influence of the minimal pair competitor during
planning. We test some predictions of the model with a novel experiment
investigating CH of voiceless stop consonant VOT in pseudowords. The results
demonstrate a CH effect in pseudowords, consistent with a basis for the effect
in the real-time planning and production of speech. The scope and magnitude of
CH in pseudowords was reduced compared to CH in real words, consistent with a
role for interactive activation between lexical and phonological levels of
planning. We discuss the potential of our model to unify an apparently
disparate set of phenomena, from CH to phonological neighborhood effects to
phonetic trace effects in speech errors.
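The abstract's core mechanism, inhibition from a minimal pair competitor shifting the planned VOT while a dynamic neural field settles, can be illustrated with a toy Amari-style field simulation. This is a minimal sketch of the general DNF technique, not the paper's model: all amplitudes, widths, and time constants below are illustrative assumptions.

```python
import numpy as np

# Toy 1-D dynamic neural field over candidate VOT values (ms).
# All parameter values are illustrative assumptions, not the paper's.
vot = np.linspace(0.0, 120.0, 241)
dx = vot[1] - vot[0]

def gaussian(center, width):
    return np.exp(-(vot - center) ** 2 / (2.0 * width ** 2))

def f(u):
    """Sigmoidal output nonlinearity."""
    return 1.0 / (1.0 + np.exp(-u))

tau, h = 10.0, -5.0                      # time constant, resting level
target = 6.0 * gaussian(60.0, 15.0)      # excitatory input: planned voiceless stop
competitor = 4.0 * gaussian(25.0, 25.0)  # inhibition from the voiced competitor

# Lateral interaction kernel: local excitation, broader inhibition.
span = np.linspace(-60.0, 60.0, len(vot))
kernel = (4.0 * np.exp(-span ** 2 / (2.0 * 8.0 ** 2))
          - 1.5 * np.exp(-span ** 2 / (2.0 * 30.0 ** 2)))

def settle(with_competitor):
    """Relax the field to a stable state; return the peak VOT (ms)."""
    s = target - (competitor if with_competitor else 0.0)
    u = np.full_like(vot, h)
    for _ in range(1000):
        lateral = np.convolve(f(u), kernel, mode="same") * dx
        u += (-u + h + s + lateral) / tau
    return vot[np.argmax(u)]

vot_alone = settle(with_competitor=False)
vot_ch = settle(with_competitor=True)
print(f"planned VOT alone: {vot_alone:.1f} ms, with competitor: {vot_ch:.1f} ms")
```

Inhibition centered on the competitor's short VOT region suppresses the short-VOT flank of the planned activation, so the settled field's peak shifts toward longer VOTs — the hyperarticulation direction the abstract reports for voiceless stops.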
Related papers
- Assessing the Impact of Anisotropy in Neural Representations of Speech: A Case Study on Keyword Spotting [4.342241136871849]
This work evaluates anisotropy in keyword spotting for computational documentary linguistics.
We show that despite anisotropy, wav2vec2 similarity measures effectively identify words without transcription.
arXiv Detail & Related papers (2025-06-06T08:52:56Z)
- NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction [59.44357187878676]
We introduce a novel generative modeling paradigm, Next-Token-Pair Prediction (NTPP), to enable speaker-independent dual-channel spoken dialogue learning.
We evaluate our approach on standard benchmarks, and empirical results show that NTPP significantly improves the conversational abilities of SLMs in terms of turn-taking prediction, response coherence, and naturalness.
arXiv Detail & Related papers (2025-06-01T12:01:40Z)
- Modelling change in neural dynamics during phonetic accommodation [0.0]
We advance a computational model of change in phonetic representations during phonetic accommodation.
We show vowel-specific degrees of convergence during shadowing, followed by a return to baseline post-shadowing.
We discuss the implications for the relation between short-term phonetic accommodation and longer-term patterns of sound change.
arXiv Detail & Related papers (2025-02-03T10:00:29Z)
- Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models [55.898594710420326]
We propose a novel spontaneous speech synthesis system based on language models.
Fine-grained prosody modeling is introduced to enhance the model's ability to capture subtle prosody variations in spontaneous speech.
arXiv Detail & Related papers (2024-07-18T13:42:38Z)
- Investigating the Timescales of Language Processing with EEG and Language Models [0.0]
This study explores the temporal dynamics of language processing by examining the alignment between word representations from a pre-trained language model and EEG data.
Using a Temporal Response Function (TRF) model, we investigate how neural activity corresponds to model representations across different layers.
Our analysis reveals patterns in TRFs from distinct layers, highlighting varying contributions to lexical and compositional processing.
arXiv Detail & Related papers (2024-06-28T12:49:27Z)
- Perception of Phonological Assimilation by Neural Speech Recognition Models [3.4173734484549625]
This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds.
Using psycholinguistic stimuli, we analyze how various linguistic context cues influence compensation patterns in the model's output.
arXiv Detail & Related papers (2024-06-21T15:58:22Z)
- Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation [6.225927189801006]
We propose a novel framework that comprehensively models both the syntactic and acoustic cues associated with pausing patterns.
Remarkably, our framework can consistently generate natural speech even for considerably longer and more intricate out-of-domain (OOD) sentences.
arXiv Detail & Related papers (2024-04-03T09:17:38Z)
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
A non-autoregressive framework enhances controllability, and a duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z)
- CausalDialogue: Modeling Utterance-level Causality in Conversations [83.03604651485327]
We have compiled and expanded a new dataset, CausalDialogue, through crowd-sourcing.
This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure.
We propose a causality-enhanced method called Exponential Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models.
arXiv Detail & Related papers (2022-12-20T18:31:50Z)
- Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis [7.609330016848916]
End-to-end text-to-speech (TTS) generates speech sounds directly from strings of text or phonemes.
This study investigates whether it can reproduce linguistic rhythm governed by phonological constraints.
The proposed model efficiently synthesizes phonological phenomena in the test data that were not explicitly included in the training data.
arXiv Detail & Related papers (2022-03-29T06:45:28Z)
- Conversational speech recognition leveraging effective fusion methods for cross-utterance language modeling [12.153618111267514]
We put forward disparate conversation-history fusion methods for language modeling in automatic speech recognition.
A novel audio-fusion mechanism is introduced, which fuses and utilizes the acoustic embeddings of a current utterance and the semantic content of its corresponding conversation history.
To flesh out our ideas, we frame the ASR N-best hypothesis rescoring task as a prediction problem, leveraging BERT, an iconic pre-trained LM.
arXiv Detail & Related papers (2021-11-05T09:07:23Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Towards Modelling Coherence in Spoken Discourse [48.80477600384429]
Coherence in spoken discourse depends on the prosodic and acoustic patterns in speech.
We model coherence in spoken discourse with audio-based coherence models.
arXiv Detail & Related papers (2020-12-31T20:18:29Z)
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
arXiv Detail & Related papers (2020-10-10T07:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.