Perception of Phonological Assimilation by Neural Speech Recognition Models
- URL: http://arxiv.org/abs/2406.15265v1
- Date: Fri, 21 Jun 2024 15:58:22 GMT
- Title: Perception of Phonological Assimilation by Neural Speech Recognition Models
- Authors: Charlotte Pouw, Marianne de Heer Kloots, Afra Alishahi, Willem Zuidema
- Abstract summary: This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds.
Using psycholinguistic stimuli, we analyze how various linguistic context cues influence compensation patterns in the model's output.
- Score: 3.4173734484549625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human listeners effortlessly compensate for phonological changes during speech perception, often unconsciously inferring the intended sounds. For example, listeners infer the underlying /n/ when hearing an utterance such as "clea[m] pan", where [m] arises from place assimilation to the following labial [p]. This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds, and identifies the linguistic knowledge that is implemented by the model to compensate for assimilation during Automatic Speech Recognition (ASR). Using psycholinguistic stimuli, we systematically analyze how various linguistic context cues influence compensation patterns in the model's output. Complementing these behavioral experiments, our probing experiments indicate that the model shifts its interpretation of assimilated sounds from their acoustic form to their underlying form in its final layers. Finally, our causal intervention experiments suggest that the model relies on minimal phonological context cues to accomplish this shift. These findings represent a step towards better understanding the similarities and differences in phonological processing between neural ASR models and humans.
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
Date: 2024-09-17T02:36:10Z
- Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0 [0.11510009152620666]
We study how Wav2Vec2 resolves phonotactic constraints.
We synthesize sounds on an acoustic continuum between /l/ and /r/ and embed them in controlled contexts.
Like humans, Wav2Vec2 models show a bias towards the phonotactically admissible category when processing such ambiguous sounds.
Date: 2024-07-03T11:04:31Z
- Investigating the Timescales of Language Processing with EEG and Language Models [0.0]
This study explores the temporal dynamics of language processing by examining the alignment between word representations from a pre-trained language model and EEG data.
Using a Temporal Response Function (TRF) model, we investigate how neural activity corresponds to model representations across different layers.
Our analysis reveals patterns in TRFs from distinct layers, highlighting varying contributions to lexical and compositional processing.
Date: 2024-06-28T12:49:27Z
- A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech [11.707968216076075]
Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech.
In this study, we simulated similar analyses with representations extracted from a computational model that was trained on unlabelled speech.
Our simulations revealed temporal dynamics similar to those in brain signals, implying that these properties can arise without linguistic knowledge.
Date: 2024-05-13T23:36:19Z
- Agentività e telicità in GilBERTo: implicazioni cognitive (Agentivity and Telicity in GilBERTo: Cognitive Implications) [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
Date: 2023-07-06T10:52:22Z
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
Date: 2022-05-27T22:04:02Z
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our modeling approach.
Date: 2021-10-13T05:30:50Z
- Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decomposing the neural bases of language consists of correlating, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
Date: 2021-10-12T15:30:21Z
- How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings [12.788276426899312]
We present a novel design based on representational similarity analysis (RSA) to analyze acoustic word embeddings (AWEs).
First, we train monolingual AWE models on seven Indo-European languages with various degrees of typological similarity.
We then employ RSA to quantify the cross-lingual similarity by simulating native and non-native spoken-word processing using AWEs.
Date: 2021-09-21T13:51:39Z
- Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that those models that make use of abstract representations of preceding linguistic context best predict the changes made by people in the course of serial reproduction.
Date: 2021-01-24T20:16:12Z
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
Date: 2020-06-19T12:00:05Z