A Language Model with Limited Memory Capacity Captures Interference in
Human Sentence Processing
- URL: http://arxiv.org/abs/2310.16142v1
- Date: Tue, 24 Oct 2023 19:33:27 GMT
- Title: A Language Model with Limited Memory Capacity Captures Interference in
Human Sentence Processing
- Authors: William Timkey, Tal Linzen
- Abstract summary: We develop a recurrent neural language model with a single self-attention head.
We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
- Score: 25.916625483405802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two of the central factors believed to underpin human sentence processing
difficulty are expectations and retrieval from working memory. A recent attempt
to create a unified cognitive model integrating these two factors relied on the
parallels between the self-attention mechanism of transformer language models
and cue-based retrieval theories of working memory in human sentence processing
(Ryu and Lewis 2021). While Ryu and Lewis show that attention patterns in
specialized attention heads of GPT-2 are consistent with similarity-based
interference, a key prediction of cue-based retrieval models, their method
requires identifying syntactically specialized attention heads, and makes the
cognitively implausible assumption that hundreds of memory retrieval operations
take place in parallel. In the present work, we develop a recurrent neural
language model with a single self-attention head, which more closely parallels
the memory system assumed by cognitive theories. We show that our model's
single attention head captures semantic and syntactic interference effects
observed in human experiments.
Related papers
- Linking In-context Learning in Transformers to Human Episodic Memory [1.124958340749622]
We focus on the induction heads, which contribute to the in-context learning capabilities of Transformer-based large language models.
We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval model of human episodic memory.
arXiv Detail & Related papers (2024-05-23T18:51:47Z)
- Memory, Space, and Planning: Multiscale Predictive Representations [5.572701755354684]
Flexible behavior in biological and artificial agents depends on the interplay of learning from the past and predicting the future in ever-changing environments.
This chapter reviews computational, behavioral, and neural evidence suggesting these processes rely on learning the structure of experiences, known as cognitive maps.
arXiv Detail & Related papers (2024-01-16T21:46:43Z)
- Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models [49.39276272693035]
Large-scale pre-trained language models have shown remarkable memorizing ability.
Vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem.
We find that 1) Vanilla language models are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence the memory formation.
arXiv Detail & Related papers (2023-05-16T03:50:38Z)
- Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs using minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- From internal models toward metacognitive AI [0.0]
In the prefrontal cortex, a distributed executive network called the "cognitive reality monitoring network" orchestrates conscious involvement of generative-inverse model pairs.
A high responsibility signal is given to the pairs that best capture the external world.
Consciousness is determined by the entropy of responsibility signals across all pairs.
arXiv Detail & Related papers (2021-09-27T05:00:56Z)
- CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose a CogAlign approach to integrate cognitive language processing signals into natural language processing models.
We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z)
- Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention [4.103438743479001]
We advance an explanation of similarity-based interference effects in subject-verb and reflexive pronoun agreement processing.
We show that surprisal of the verb or reflexive pronoun predicts facilitatory interference effects in ungrammatical sentences.
arXiv Detail & Related papers (2021-04-26T20:46:54Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- A Meta-Bayesian Model of Intentional Visual Search [0.0]
We propose a computational model of visual search that incorporates Bayesian interpretations of the neural mechanisms that underlie categorical perception and saccade planning.
To enable meaningful comparisons between simulated and human behaviours, we employ a gaze-contingent paradigm that required participants to classify occluded MNIST digits through a window that followed their gaze.
Our model is able to recapitulate human behavioural metrics such as classification accuracy while retaining a high degree of interpretability, which we demonstrate by recovering subject-specific parameters from observed human behaviour.
arXiv Detail & Related papers (2020-06-05T16:10:35Z)
- Towards a Neural Model for Serial Order in Frontal Cortex: a Brain Theory from Memory Development to Higher-Level Cognition [53.816853325427424]
We propose that the immature prefrontal cortex (PFC) uses its primary functionality of detecting hierarchical patterns in temporal signals.
Our hypothesis is that the PFC detects the hierarchical structure in temporal sequences in the form of ordinal patterns and uses them to index information hierarchically in different parts of the brain.
By doing so, it gives the tools to the language-ready brain for manipulating abstract knowledge and planning temporally ordered information.
arXiv Detail & Related papers (2020-05-22T14:29:51Z)