A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing
- URL: http://arxiv.org/abs/2310.16142v1
- Date: Tue, 24 Oct 2023 19:33:27 GMT
- Title: A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing
- Authors: William Timkey, Tal Linzen
- Abstract summary: We develop a recurrent neural language model with a single self-attention head.
We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
- Score: 25.916625483405802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two of the central factors believed to underpin human sentence processing
difficulty are expectations and retrieval from working memory. A recent attempt
to create a unified cognitive model integrating these two factors relied on the
parallels between the self-attention mechanism of transformer language models
and cue-based retrieval theories of working memory in human sentence processing
(Ryu and Lewis 2021). While Ryu and Lewis show that attention patterns in
specialized attention heads of GPT-2 are consistent with similarity-based
interference, a key prediction of cue-based retrieval models, their method
requires identifying syntactically specialized attention heads, and makes the
cognitively implausible assumption that hundreds of memory retrieval operations
take place in parallel. In the present work, we develop a recurrent neural
language model with a single self-attention head, which more closely parallels
the memory system assumed by cognitive theories. We show that our model's
single attention head captures semantic and syntactic interference effects
observed in human experiments.
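The architecture described in the abstract, a recurrent network whose only retrieval mechanism is one attention head over its own past states, can be pictured with a short sketch. The PyTorch code below is an illustrative reading of that idea, not the authors' released implementation; the class name, dimensions, and the crude recency-based capacity cap are placeholder assumptions.

```python
# Illustrative recurrent LM with a single attention head over past hidden
# states (a rough analogue of one cue-based retrieval per word); not the
# authors' released implementation.
import torch
import torch.nn as nn

class SingleHeadAttentionLM(nn.Module):
    def __init__(self, vocab_size, d_model=256, max_mem=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.LSTMCell(d_model, d_model)
        # One query/key/value projection each: a single retrieval cue per step.
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.out = nn.Linear(2 * d_model, vocab_size)
        self.d_model, self.max_mem = d_model, max_mem

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        batch, seq_len = tokens.shape
        h = torch.zeros(batch, self.d_model, device=tokens.device)
        c = torch.zeros(batch, self.d_model, device=tokens.device)
        memory, logits = [], []
        for t in range(seq_len):
            h, c = self.rnn(self.embed(tokens[:, t]), (h, c))
            memory.append(h)
            memory = memory[-self.max_mem:]          # crude stand-in for a capacity limit
            mem = torch.stack(memory, dim=1)         # (batch, <=max_mem, d)
            scores = (self.q(h).unsqueeze(1) * self.k(mem)).sum(-1)
            attn = torch.softmax(scores / self.d_model ** 0.5, dim=-1)
            retrieved = (attn.unsqueeze(-1) * self.v(mem)).sum(dim=1)
            logits.append(self.out(torch.cat([h, retrieved], dim=-1)))
        return torch.stack(logits, dim=1)            # (batch, seq_len, vocab)
```

A model like this would be trained with the usual next-word cross-entropy objective; the single head's attention weights at a verb or reflexive can then be inspected for interference from feature-similar distractor nouns.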
Related papers
- Counterfactual Generation from Language Models [64.55296662926919]
We show that counterfactual reasoning is conceptually distinct from interventions.
We propose a framework for generating true string counterfactuals.
Our experiments demonstrate that the approach produces meaningful counterfactuals.
arXiv Detail & Related papers (2024-11-11T17:57:30Z) - Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction [60.964512894143475]
We present Generative Spatial Transformer (GST), a novel auto-regressive framework that jointly addresses spatial localization and view prediction.
Our model simultaneously estimates the camera pose from a single image and predicts the view from a new camera pose, effectively bridging the gap between spatial awareness and visual prediction.
arXiv Detail & Related papers (2024-10-24T17:58:05Z) - CauSkelNet: Causal Representation Learning for Human Behaviour Analysis [6.880536510094897]
This study introduces a novel representation learning method based on causal inference to better understand human joint dynamics and complex behaviors.
Our approach advances human motion analysis and paves the way for more adaptive intelligent healthcare solutions.
arXiv Detail & Related papers (2024-09-23T21:38:49Z) - Brain-Cognition Fingerprinting via Graph-GCCA with Contrastive Learning [28.681229869236393]
Longitudinal neuroimaging studies aim to improve the understanding of brain aging and diseases by studying the dynamic interactions between brain function and cognition.
We propose an unsupervised learning model that encodes their relationship via Graph Attention Networks and generalized Correlational Analysis.
To create brain-cognition fingerprints reflecting the unique neural and cognitive phenotype of each person, the model also relies on individualized and multimodal contrastive learning.
arXiv Detail & Related papers (2024-09-20T20:36:20Z) - A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework [10.354955365036181]
Despite the crucial role relational thinking plays in human understanding of speech, it has yet to be leveraged in any artificial speech recognition systems.
This paper presents a novel spectro-temporal relational thinking based acoustic modeling framework.
Models built upon this framework outperform state-of-the-art systems, with a 7.82% improvement in phoneme recognition on the TIMIT dataset.
arXiv Detail & Related papers (2024-09-17T05:45:33Z) - Linking In-context Learning in Transformers to Human Episodic Memory [1.124958340749622]
We focus on induction heads, which contribute to in-context learning in Transformer-based large language models.
We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval model of human episodic memory.
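For readers unfamiliar with the term, an induction head is one whose attention follows a prefix-matching pattern: at a repeated token, it attends to the token that followed the previous occurrence. The snippet below scores one attention matrix for that pattern; it is a generic diagnostic in the spirit of this line of work, not the paper's exact analysis, and the function name and inputs are illustrative.

```python
# Illustrative prefix-matching ("induction head") score for one attention head.
import numpy as np

def induction_score(tokens, attention):
    """tokens: (T,) token ids; attention: (T, T) row-stochastic weights from
    one head. Returns the mean attention each position t pays to positions
    s + 1 where tokens[s] == tokens[t] and s + 1 < t, i.e. to the token that
    followed an earlier occurrence of the current token."""
    T = len(tokens)
    per_position = []
    for t in range(1, T):
        targets = [s + 1 for s in range(t - 1) if tokens[s] == tokens[t]]
        if targets:
            per_position.append(attention[t, targets].sum())
    return float(np.mean(per_position)) if per_position else 0.0
```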
arXiv Detail & Related papers (2024-05-23T18:51:47Z) - Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs at minimal computational expense.
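The combination of amortized and iterative inference can be made concrete with a toy sketch: an encoder proposes latent values in a single pass, then a handful of gradient steps refine them by reducing prediction error. Everything below (network sizes, step count, learning rate) is an illustrative assumption, not the paper's model.

```python
# Toy sketch of hybrid inference: amortized proposal plus iterative refinement.
import torch
import torch.nn as nn

decoder = nn.Linear(8, 16)   # toy generative mapping from latents to data
encoder = nn.Linear(16, 8)   # toy amortized inference network

def infer(x, n_steps=5, lr=0.1):
    z = encoder(x).detach()                      # fast amortized guess
    z.requires_grad_(True)
    for _ in range(n_steps):                     # slow iterative refinement
        prediction_error = ((x - decoder(z)) ** 2).sum()
        grad, = torch.autograd.grad(prediction_error, z)
        z = (z - lr * grad).detach().requires_grad_(True)
    return z.detach()

x = torch.randn(16)
z_refined = infer(x)
```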
arXiv Detail & Related papers (2022-04-05T12:52:45Z) - CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose a CogAlign approach to integrate cognitive language processing signals into natural language processing models.
We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z) - Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention [4.103438743479001]
We advance an explanation of similarity-based interference effects in subject-verb and reflexive pronoun agreement processing.
We show that surprisal of the verb or reflexive pronoun predicts facilitatory interference effects in ungrammatical sentences.
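Surprisal analyses of this kind read per-token log-probabilities out of a causal language model. The sketch below does so with GPT-2 through the Hugging Face transformers API; the model choice and example sentence are assumptions for illustration, not the paper's materials.

```python
# Illustrative per-token surprisal from a causal LM (GPT-2 via Hugging Face).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def surprisal(sentence):
    ids = tokenizer(sentence, return_tensors="pt").input_ids      # (1, T)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)  # (1, T, V)
    targets = ids[0, 1:]
    # Surprisal of token i is -log2 P(token_i | preceding tokens), in bits.
    nats = -log_probs[0, :-1].gather(1, targets.unsqueeze(1)).squeeze(1)
    bits = nats / math.log(2.0)
    return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()),
                    bits.tolist()))

# Hypothetical example sentence, not an item from the paper's stimuli.
print(surprisal("The key to the cabinets was rusty."))
```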
arXiv Detail & Related papers (2021-04-26T20:46:54Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z) - Towards a Neural Model for Serial Order in Frontal Cortex: a Brain Theory from Memory Development to Higher-Level Cognition [53.816853325427424]
We propose that the immature prefrontal cortex (PFC) uses its primary functionality of detecting hierarchical patterns in temporal signals.
Our hypothesis is that the PFC detects the hierarchical structure in temporal sequences in the form of ordinal patterns and uses them to index information hierarchically in different parts of the brain.
By doing so, it gives the language-ready brain the tools to manipulate abstract knowledge and to plan temporally ordered information.
arXiv Detail & Related papers (2020-05-22T14:29:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.