The neural dynamics of auditory word recognition and integration
- URL: http://arxiv.org/abs/2305.13388v2
- Date: Tue, 5 Dec 2023 21:20:16 GMT
- Title: The neural dynamics of auditory word recognition and integration
- Authors: Jon Gauthier and Roger Levy
- Abstract summary: We present a computational model of word recognition which formalizes this perceptual process in Bayesian decision theory.
We fit this model to explain scalp EEG signals recorded as subjects passively listened to a fictional story.
The model reveals distinct neural processing of words depending on whether or not they can be quickly recognized.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Listeners recognize and integrate words in rapid and noisy everyday speech by
combining expectations about upcoming content with incremental sensory
evidence. We present a computational model of word recognition which formalizes
this perceptual process in Bayesian decision theory. We fit this model to
explain scalp EEG signals recorded as subjects passively listened to a
fictional story, revealing both the dynamics of the online auditory word
recognition process and the neural correlates of the recognition and
integration of words.
The model reveals distinct neural processing of words depending on whether or
not they can be quickly recognized. While all words trigger a neural response
characteristic of probabilistic integration -- voltage modulations predicted by
a word's surprisal in context -- these modulations are amplified for words
which require more than roughly 150 ms of input to be recognized. We observe no
difference in the latency of these neural responses according to words'
recognition times. Our results are consistent with a two-part model of speech
comprehension, combining an eager and rapid process of word recognition with a
temporally independent process of word integration. However, we also developed
alternative models of the scalp EEG signal not incorporating word recognition
dynamics which showed similar performance improvements. We discuss potential
future modeling steps which may help to separate these hypotheses.
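The abstract's core mechanism — combining a contextual prior over candidate words with incremental sensory evidence until one candidate's posterior crosses a confidence threshold — can be sketched in a few lines. This is a minimal illustration, not the authors' fitted EEG model: the candidate words, priors, per-frame likelihoods, threshold, and frame duration below are all hypothetical placeholders.

```python
import math

def recognize_word(candidates, prior, frame_likelihoods,
                   threshold=0.9, frame_ms=50):
    """Incrementally update a posterior over candidate words as acoustic
    evidence arrives, frame by frame. Returns the winning word and the
    amount of input (in ms) consumed before its posterior exceeded the
    threshold -- a toy analogue of the paper's 'recognition time'."""
    log_post = {w: math.log(prior[w]) for w in candidates}
    for t, frame in enumerate(frame_likelihoods, start=1):
        # Bayesian update: multiply in this frame's likelihood per word.
        for w in candidates:
            log_post[w] += math.log(frame[w])
        # Renormalize so the posterior sums to one.
        z = math.log(sum(math.exp(lp) for lp in log_post.values()))
        log_post = {w: lp - z for w, lp in log_post.items()}
        best = max(log_post, key=log_post.get)
        if math.exp(log_post[best]) >= threshold:
            return best, t * frame_ms
    best = max(log_post, key=log_post.get)
    return best, len(frame_likelihoods) * frame_ms

def surprisal(prior, word):
    """Surprisal in context, -log2 P(word | context) -- the quantity the
    paper links to EEG voltage modulations at integration."""
    return -math.log2(prior[word])

# Toy example: two candidates whose first frame is ambiguous and whose
# second frame favors "cat".
word, rt = recognize_word(
    candidates=["cat", "cap"],
    prior={"cat": 0.6, "cap": 0.4},
    frame_likelihoods=[{"cat": 0.5, "cap": 0.5},
                       {"cat": 0.9, "cap": 0.1}],
)
```

In the paper's terms, a word whose recognition time under such a model exceeds roughly 150 ms of input would be predicted to show an amplified surprisal-linked voltage modulation, while the latency of that response stays fixed.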
Related papers
- A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech [11.707968216076075]
Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech.
In this study, we simulated similar analyses with representations extracted from a computational model that was trained on unlabelled speech.
Our simulations revealed temporal dynamics similar to those in brain signals, implying that these properties can arise without linguistic knowledge.
arXiv Detail & Related papers (2024-05-13T23:36:19Z)
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
- Self-consistent context aware conformer transducer for speech recognition [0.06008132390640294]
We introduce a novel neural network module that adeptly handles recursive data flow in neural network architectures.
Our method notably improves the accuracy of recognizing rare words without adversely affecting the word error rate for common vocabulary.
Our findings reveal that the combination of both approaches can improve the accuracy of detecting rare words by as much as 4.5 times.
arXiv Detail & Related papers (2024-02-09T18:12:11Z)
- Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution combines Hidden Markov Model-Gaussian Mixture Model (HMM-GMM) pipelines with deep neural network (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
arXiv Detail & Related papers (2023-10-14T23:16:05Z)
- Audio-visual multi-channel speech separation, dereverberation and recognition [70.34433820322323]
This paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach.
The advantage of the additional visual modality over using audio only is demonstrated on two neural dereverberation approaches.
Experiments conducted on the LRS2 dataset suggest that the proposed audio-visual multi-channel speech separation, dereverberation and recognition system outperforms the baseline.
arXiv Detail & Related papers (2022-04-05T04:16:03Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
- Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
- Modelling word learning and recognition using visually grounded speech [18.136170489933082]
Computational models of speech recognition often assume that the set of target words is already given.
This implies that these models do not learn to recognise speech from scratch without prior knowledge and explicit supervision.
Visually grounded speech models learn to recognise speech without prior knowledge by exploiting statistical dependencies between spoken and visual input.
arXiv Detail & Related papers (2022-03-14T08:59:37Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
- Deep Graph Random Process for Relational-Thinking-Based Speech Recognition [12.09786458466155]
Relational thinking is characterized by relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge.
We present a Bayesian nonparametric deep learning method called deep graph random process (DGP) that can generate an infinite number of probabilistic graphs representing percepts.
Our approach is able to successfully infer relations among utterances without using any relational data during training.
arXiv Detail & Related papers (2020-07-04T15:27:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.