Related papers: A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing

URL: http://arxiv.org/abs/2310.16142v1
Date: Tue, 24 Oct 2023 19:33:27 GMT
Title: A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing
Authors: William Timkey, Tal Linzen
Abstract summary: We develop a recurrent neural language model with a single self-attention head. We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
Score: 25.916625483405802
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Two of the central factors believed to underpin human sentence processing difficulty are expectations and retrieval from working memory. A recent attempt to create a unified cognitive model integrating these two factors relied on the parallels between the self-attention mechanism of transformer language models and cue-based retrieval theories of working memory in human sentence processing (Ryu and Lewis 2021). While Ryu and Lewis show that attention patterns in specialized attention heads of GPT-2 are consistent with similarity-based interference, a key prediction of cue-based retrieval models, their method requires identifying syntactically specialized attention heads, and makes the cognitively implausible assumption that hundreds of memory retrieval operations take place in parallel. In the present work, we develop a recurrent neural language model with a single self-attention head, which more closely parallels the memory system assumed by cognitive theories. We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
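A minimal sketch of the parallel the abstract draws between a single self-attention head and cue-based retrieval, not the authors' model. The three-dimensional feature vectors, the `retrieve` helper, and the cue values are all hypothetical and chosen only for illustration. It shows how a distractor that shares a feature with the retrieval cue draws attention weight away from the target, which is the pattern usually described as similarity-based interference.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve(query, keys):
    # Single-head scaled dot-product attention weights over memory items.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical feature layout: [is_subject, is_plural, is_animate]
cue = [4.0, 4.0, 0.0]                  # assumed retrieval cue at the verb: +subject, +plural
target = [1.0, 1.0, 1.0]               # the true subject, matches both cues
mismatch_distractor = [0.0, 0.0, 1.0]  # matches neither cue
match_distractor = [0.0, 1.0, 1.0]     # shares the plural feature with the cue

w_low = retrieve(cue, [target, mismatch_distractor])   # low-interference condition
w_high = retrieve(cue, [target, match_distractor])     # high-interference condition

print("target weight, mismatching distractor:", round(w_low[0], 3))
print("target weight, matching distractor:  ", round(w_high[0], 3))
```

Because there is only one head, all retrieval competition is resolved within a single softmax, so any weight gained by the matching distractor is necessarily lost by the target.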

Related papers

Sequence-to-Sequence Models with Attention Mechanistically Map to the Architecture of Human Memory Search [13.961239165301315]
We show that foundational architectures in neural machine translation exhibit mechanisms that directly correspond to those specified in the Context Maintenance and Retrieval model of human memory. We implement a neural machine translation model as a cognitive model of human memory search that is both interpretable and capable of capturing complex dynamics of learning.
arXiv Detail & Related papers (2025-06-20T18:43:15Z)
Quantifying Cross-Modality Memorization in Vision-Language Models [86.82366725590508]
We study the unique characteristics of cross-modality memorization and conduct a systematic study centered on vision-language models. Our results reveal that facts learned in one modality transfer to the other, but a significant gap exists between recalling information in the source and target modalities.
arXiv Detail & Related papers (2025-06-05T16:10:47Z)
If Attention Serves as a Cognitive Model of Human Memory Retrieval, What is the Plausible Memory Representation? [3.757103053174534]
We investigate whether the attention mechanism of Transformer Grammar (TG) can serve as a cognitive model of human memory retrieval. Our experiments demonstrate that TG's attention achieves superior predictive power for self-paced reading times compared to vanilla Transformer's.
arXiv Detail & Related papers (2025-02-17T05:58:25Z)
Neuron-Level Differentiation of Memorization and Generalization in Large Language Models [9.504942958632384]
We investigate how Large Language Models distinguish between memorization and generalization at the neuron level. Experiments on both a GPT-2 model trained from scratch and a pretrained LLaMA-3.2 model fine-tuned with LoRA show consistent neuron-level specialization.
arXiv Detail & Related papers (2024-12-24T15:28:56Z)
Counterfactual Generation from Language Models [64.55296662926919]
We show that counterfactual reasoning is conceptually distinct from interventions. We propose a framework for generating true string counterfactuals. Our experiments demonstrate that the approach produces meaningful counterfactuals.
arXiv Detail & Related papers (2024-11-11T17:57:30Z)
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction [60.964512894143475]
We present Generative Spatial Transformer (GST), a novel auto-regressive framework that jointly addresses spatial localization and view prediction. Our model simultaneously estimates the camera pose from a single image and predicts the view from a new camera pose, effectively bridging the gap between spatial awareness and visual prediction.
arXiv Detail & Related papers (2024-10-24T17:58:05Z)
CauSkelNet: Causal Representation Learning for Human Behaviour Analysis [6.880536510094897]
This study introduces a novel representation learning method based on causal inference to better understand human joint dynamics and complex behaviors. Our approach advances human motion analysis and paves the way for more adaptive intelligent healthcare solutions.
arXiv Detail & Related papers (2024-09-23T21:38:49Z)
Brain-Cognition Fingerprinting via Graph-GCCA with Contrastive Learning [28.681229869236393]
Longitudinal neuroimaging studies aim to improve the understanding of brain aging and diseases by studying the dynamic interactions between brain function and cognition. We propose an unsupervised learning model that encodes their relationship via Graph Attention Networks and generalized Correlational Analysis. To create brain-cognition fingerprints reflecting the unique neural and cognitive phenotype of each person, the model also relies on individualized and multimodal contrastive learning.
arXiv Detail & Related papers (2024-09-20T20:36:20Z)
A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework [10.354955365036181]
Despite the crucial role relational thinking plays in human understanding of speech, it has yet to be leveraged in any artificial speech recognition systems. This paper presents a novel spectro-temporal relational thinking based acoustic modeling framework. Models built upon this framework outperform state-of-the-art systems with a 7.82% improvement in phoneme recognition tasks over the TIMIT dataset.
arXiv Detail & Related papers (2024-09-17T05:45:33Z)
Linking In-context Learning in Transformers to Human Episodic Memory [1.124958340749622]
We focus on induction heads, which contribute to in-context learning in Transformer-based large language models. We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval model of human episodic memory.
arXiv Detail & Related papers (2024-05-23T18:51:47Z)
Towards a Psychology of Machines: Large Language Models Predict Human Memory [0.0]
Large language models (LLMs) have shown remarkable abilities in natural language processing. This study explores whether LLMs can predict human memory performance in tasks involving garden-path sentences and contextual information.
arXiv Detail & Related papers (2024-03-08T08:41:14Z)
Predictive Churn with the Set of Good Models [61.00058053669447]
This paper explores connections between two seemingly unrelated concepts of predictive inconsistency. The first, known as predictive multiplicity, occurs when models that perform similarly produce conflicting predictions for individual samples. The second concept, predictive churn, examines the differences in individual predictions before and after model updates.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner. We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances the two to obtain accurate beliefs using minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z)
CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose a CogAlign approach to integrate cognitive language processing signals into natural language processing models. We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z)
Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention [4.103438743479001]
We advance an explanation of similarity-based interference effects in subject-verb and reflexive pronoun agreement processing. We show that surprisal of the verb or reflexive pronoun predicts facilitatory interference effects in ungrammatical sentences.
arXiv Detail & Related papers (2021-04-26T20:46:54Z)
Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing. Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement. We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
Towards a Neural Model for Serial Order in Frontal Cortex: a Brain Theory from Memory Development to Higher-Level Cognition [53.816853325427424]
We propose that the immature prefrontal cortex (PFC) uses its primary functionality of detecting hierarchical patterns in temporal signals. Our hypothesis is that the PFC detects the hierarchical structure in temporal sequences in the form of ordinal patterns and uses them to index information hierarchically in different parts of the brain. By doing so, it gives the language-ready brain the tools for manipulating abstract knowledge and planning temporally ordered information.
arXiv Detail & Related papers (2020-05-22T14:29:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.