Different kinds of cognitive plausibility: why are transformers better
than RNNs at predicting N400 amplitude?
- URL: http://arxiv.org/abs/2107.09648v1
- Date: Tue, 20 Jul 2021 17:33:13 GMT
- Title: Different kinds of cognitive plausibility: why are transformers better
than RNNs at predicting N400 amplitude?
- Authors: James A. Michaelov, Megan D. Bardolph, Seana Coulson, Benjamin K.
Bergen
- Abstract summary: Transformer language models have been found to be better at predicting metrics used to assess human language comprehension than language models with other architectures.
We propose and provide evidence for one possible explanation - their predictions are affected by the preceding context in a way analogous to the effect of semantic facilitation in humans.
- Score: 0.5735035463793008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite being designed for performance rather than cognitive plausibility,
transformer language models have been found to be better at predicting metrics
used to assess human language comprehension than language models with other
architectures, such as recurrent neural networks. Based on how well they
predict the N400, a neural signal associated with processing difficulty, we
propose and provide evidence for one possible explanation - their predictions
are affected by the preceding context in a way analogous to the effect of
semantic facilitation in humans.
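The metric typically compared against N400 amplitude in this line of work is a word's surprisal under the language model: the negative log probability of the word given its preceding context. A semantically facilitating context raises the target word's probability and therefore lowers its surprisal, paralleling the reduced N400 observed in humans. A minimal sketch of the computation (the probabilities below are invented for illustration, not drawn from any model in the paper):

```python
import math

def surprisal(probability: float) -> float:
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(probability)

# Hypothetical next-word probabilities a language model might assign
# for the context "I spread the warm bread with ..." (values invented).
facilitated = 0.60   # "butter": semantically facilitated continuation
anomalous = 0.0005   # "socks": implausible continuation

print(surprisal(facilitated))  # low surprisal, predicting a small N400
print(surprisal(anomalous))    # high surprisal, predicting a large N400
```

Under this framing, an architecture that better tracks human comprehension difficulty is one whose surprisal values correlate more closely with measured N400 amplitudes across stimuli.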
Related papers
- Probabilistic Transformer: A Probabilistic Dependency Model for
Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively to transformers on small to medium sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Meta predictive learning model of languages in neural circuits [2.5690340428649328]
We propose a mean-field learning model within the predictive coding framework.
Our model reveals that most of the connections become deterministic after learning.
Our model provides a starting point to investigate the connection among brain computation, next-token prediction and general intelligence.
arXiv Detail & Related papers (2023-09-08T03:58:05Z)
- A Comprehensive Comparison of Neural Networks as Cognitive Models of Inflection [20.977461918631928]
We study the correlation between human judgments and neural network probabilities for unknown word inflections.
We find evidence that the Transformer may be a better account of human behavior than LSTMs.
arXiv Detail & Related papers (2022-10-22T00:59:40Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- So Cloze yet so Far: N400 Amplitude is Better Predicted by Distributional Information than Human Predictability Judgements [0.6445605125467573]
We investigate whether the linguistic predictions of computational language models or humans better reflect the way in which natural language stimuli modulate the amplitude of the N400.
We find that the predictions of three top-of-the-line contemporary language models match the N400 more closely than human predictions.
This suggests that the predictive processes underlying the N400 may be more sensitive to the surface-level statistics of language than previously thought.
arXiv Detail & Related papers (2021-09-02T22:00:10Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting [135.0863818867184]
Artificial neural variability (ANV) helps artificial neural networks learn some advantages from "natural" neural networks.
ANV acts as an implicit regularizer of the mutual information between the training data and the learned model.
It can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible costs.
arXiv Detail & Related papers (2020-11-12T06:06:33Z)
- Emergent Communication Pretraining for Few-Shot Machine Translation [66.48990742411033]
We pretrain neural networks via emergent communication from referential games.
Our key assumption is that grounding communication on images---as a crude approximation of real-world environments---inductively biases the model towards learning natural languages.
arXiv Detail & Related papers (2020-11-02T10:57:53Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Human Sentence Processing: Recurrence or Attention? [3.834032293147498]
The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks.
We compare Transformer- and RNN-based language models' ability to account for measures of human reading effort.
arXiv Detail & Related papers (2020-05-19T14:17:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.