Human Sentence Processing: Recurrence or Attention?
- URL: http://arxiv.org/abs/2005.09471v2
- Date: Tue, 4 May 2021 12:49:14 GMT
- Title: Human Sentence Processing: Recurrence or Attention?
- Authors: Danny Merkx and Stefan L. Frank
- Abstract summary: The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks.
We compare Transformer- and RNN-based language models' ability to account for measures of human reading effort.
- Score: 3.834032293147498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs) have long been an architecture of interest
for computational models of human sentence processing. The recently introduced
Transformer architecture outperforms RNNs on many natural language processing
tasks, but little is known about its ability to model human language processing.
We compare Transformer- and RNN-based language models' ability to account for
measures of human reading effort. Our analysis shows Transformers to outperform
RNNs in explaining self-paced reading times and neural activity while reading
English sentences. This challenges the widely held idea that human sentence
processing involves recurrent and immediate processing, and provides evidence
for cue-based retrieval.
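As a rough illustration of this kind of evaluation, the sketch below computes token-level surprisal from an off-the-shelf GPT-2 model (via the Hugging Face transformers library) and regresses placeholder reading times on it. The model choice, the garden-path example sentence, and the random reading times are assumptions for illustration, not the authors' materials, and the alignment of subword tokens to words is omitted.

```python
# Sketch: relate a language model's token-level surprisal to reading times.
# GPT-2 and the placeholder reading times are illustrative assumptions,
# not the models or data used in the paper.
import numpy as np
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_surprisals(sentence: str) -> list[tuple[str, float]]:
    """Return (token, surprisal in bits) for every token after the first."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids      # (1, seq_len)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)  # (1, seq_len, vocab)
    out = []
    for pos in range(1, ids.size(1)):
        tok_id = ids[0, pos].item()
        surprisal = -log_probs[0, pos - 1, tok_id].item() / np.log(2)
        out.append((tokenizer.decode([tok_id]), surprisal))
    return out

# Toy linking step: regress (placeholder) reading times on surprisal.
surprisal = np.array([s for _, s in token_surprisals("The horse raced past the barn fell")])
reading_times = np.random.default_rng(0).normal(300.0, 50.0, len(surprisal))  # fake data
slope, intercept = np.polyfit(surprisal, reading_times, 1)
print(f"reading_time ~ {slope:.1f} * surprisal + {intercept:.1f} ms")
```

In the paper itself, surprisal estimates from trained Transformer and RNN language models are entered into regression analyses of self-paced reading times and neural measures; the sketch only shows the general linking logic.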
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers, yet it is common to instead use proxy tasks that are similar only in an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
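To make the recognition setup concrete, here is a minimal, hypothetical sketch of training a recurrent network as a binary accept/reject classifier of strings, using the toy language of bit strings with an even number of ones; the paper's languages, architectures, and training protocol are not reproduced here.

```python
# Sketch: train a recurrent network directly as a binary recognizer of strings,
# here for the toy regular language "bit strings with an even number of 1s".
# The language, architecture, and training setup are illustrative only.
import random
import torch
import torch.nn as nn

def sample(length: int = 12) -> tuple[list[int], float]:
    bits = [random.randint(0, 1) for _ in range(length)]
    return bits, float(sum(bits) % 2 == 0)   # label 1 iff the string is in the language

class Recognizer(nn.Module):
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.emb = nn.Embedding(2, 8)
        self.rnn = nn.GRU(8, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: (batch, seq_len) symbol ids
        _, h = self.rnn(self.emb(x))          # h: (1, batch, hidden) final state
        return self.out(h[-1]).squeeze(-1)    # one accept/reject logit per string

model, loss_fn = Recognizer(), nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(500):
    batch = [sample() for _ in range(32)]
    x = torch.tensor([bits for bits, _ in batch])
    y = torch.tensor([label for _, label in batch])
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
```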
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z) - Advancing Regular Language Reasoning in Linear Recurrent Neural Networks [56.11830645258106]
We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences.
We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix.
Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
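The following is a loose sketch of the general idea of a linear recurrence whose transition matrix is block-diagonal and computed from the current input; the block sizes, the gating function, and all dimensions are placeholders rather than the proposed model's actual parameterisation.

```python
# Sketch of a linear recurrence with a block-diagonal, input-dependent
# transition matrix. Shapes and the tanh gating are illustrative assumptions.
import torch
import torch.nn as nn

class BlockDiagonalLRNN(nn.Module):
    def __init__(self, in_dim: int = 8, n_blocks: int = 4, block: int = 2):
        super().__init__()
        self.n_blocks, self.block = n_blocks, block
        hidden = n_blocks * block
        # Each input produces one (block x block) transition matrix per block.
        self.to_transitions = nn.Linear(in_dim, n_blocks * block * block)
        self.in_proj = nn.Linear(in_dim, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, in_dim); returns the final hidden state (n_blocks * block,)
        h = torch.zeros(self.n_blocks, self.block)
        for x_t in x:
            A_t = torch.tanh(self.to_transitions(x_t)).view(self.n_blocks, self.block, self.block)
            u_t = self.in_proj(x_t).view(self.n_blocks, self.block)
            # The recurrence in h stays linear: h_b <- A_b(x_t) h_b + u_b, block by block.
            h = torch.einsum("bij,bj->bi", A_t, h) + u_t
        return h.flatten()

print(BlockDiagonalLRNN()(torch.randn(5, 8)).shape)  # torch.Size([8])
```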
arXiv Detail & Related papers (2023-09-14T03:36:01Z) - Deep Learning Models to Study Sentence Comprehension in the Human Brain [0.1503974529275767]
Recent artificial neural networks that process natural language achieve unprecedented performance in tasks requiring sentence-level understanding.
We review works that compare these artificial language models with human brain activity and we assess the extent to which this approach has improved our understanding of the neural processes involved in natural language comprehension.
arXiv Detail & Related papers (2023-01-16T10:31:25Z) - Implicit N-grams Induced by Recurrence [10.053475465955794]
We present a study showing that some explainable components actually reside within the hidden states of RNNs.
We evaluated such extracted explainable features from trained RNNs on downstream sentiment analysis tasks and found they could be used to model interesting linguistic phenomena.
arXiv Detail & Related papers (2022-05-05T15:53:46Z) - Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
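A minimal sketch of the mixing step described above, assuming a precomputed dependency-based distribution and self-attention logits over the vocabulary; the mixture weight and both inputs are placeholders, not the paper's formulation.

```python
# Sketch: mix a dependency-derived next-token distribution with the
# distribution from a self-attention language model. The weight `lam`
# and both inputs are placeholders.
import torch

def mixed_next_token_probs(attn_logits: torch.Tensor,
                           dep_probs: torch.Tensor,
                           lam: float = 0.5) -> torch.Tensor:
    """attn_logits: (vocab,) logits from a self-attention LM.
    dep_probs:   (vocab,) probability distribution from a dependency model.
    Returns a convex combination of the two distributions."""
    attn_probs = torch.softmax(attn_logits, dim=-1)
    return lam * dep_probs + (1.0 - lam) * attn_probs

# Toy usage with a 5-word vocabulary.
vocab = 5
attn_logits = torch.randn(vocab)
dep_probs = torch.softmax(torch.randn(vocab), dim=-1)
p = mixed_next_token_probs(attn_logits, dep_probs, lam=0.3)
print(p, p.sum())  # the mixture still sums to 1
```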
arXiv Detail & Related papers (2022-03-19T06:28:30Z) - Dynamic Gesture Recognition [0.0]
It is possible to use machine learning to classify images and/or videos instead of traditional computer vision algorithms.
The aim of this project is to build a symbiosis between a convolutional neural network (CNN) and a recurrent neural network (RNN).
arXiv Detail & Related papers (2021-09-20T09:45:29Z) - Different kinds of cognitive plausibility: why are transformers better
than RNNs at predicting N400 amplitude? [0.5735035463793008]
Transformer language models have been found to be better than language models with other architectures at predicting metrics used to assess human language comprehension.
We propose and provide evidence for one possible explanation: their predictions are affected by the preceding context in a way analogous to the effect of semantic facilitation in humans.
arXiv Detail & Related papers (2021-07-20T17:33:13Z) - Neuroevolution of a Recurrent Neural Network for Spatial and Working
Memory in a Simulated Robotic Environment [57.91534223695695]
We evolved weights in a biologically plausible recurrent neural network (RNN) using an evolutionary algorithm to replicate the behavior and neural activity observed in rats.
Our method demonstrates how the dynamic activity in evolved RNNs can capture interesting and complex cognitive behavior.
arXiv Detail & Related papers (2021-02-25T02:13:52Z) - A Token-wise CNN-based Method for Sentence Compression [31.9210679048841]
Sentence compression is a Natural Language Processing (NLP) task aimed at shortening original sentences and preserving their key information.
Current methods are largely based on Recurrent Neural Network (RNN) models which suffer from poor processing speed.
We propose a token-wise Convolutional Neural Network (CNN) model that uses pre-trained Bidirectional Encoder Representations from Transformers (BERT) features for deletion-based sentence compression.
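A minimal sketch of token-wise, deletion-based compression under these assumptions: frozen contextual features (e.g. BERT's last hidden states) are fed to a 1-D convolution that emits a keep/delete decision per token. Dimensions and layers are illustrative, not the paper's configuration.

```python
# Sketch: token-wise deletion decisions over contextual token features.
# The feature dimension 768 mimics BERT-base; the network itself is a placeholder.
import torch
import torch.nn as nn

class TokenwiseCompressor(nn.Module):
    def __init__(self, feat_dim: int = 768, hidden: int = 128, kernel: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, hidden, kernel, padding=kernel // 2)
        self.classifier = nn.Linear(hidden, 2)   # 0 = delete, 1 = keep

    def forward(self, token_feats: torch.Tensor) -> torch.Tensor:
        # token_feats: (batch, seq_len, feat_dim), e.g. BERT's last hidden states
        h = torch.relu(self.conv(token_feats.transpose(1, 2)))  # (batch, hidden, seq_len)
        return self.classifier(h.transpose(1, 2))               # (batch, seq_len, 2)

# Toy usage: decide which of 10 tokens to keep.
feats = torch.randn(1, 10, 768)           # stand-in for BERT features
logits = TokenwiseCompressor()(feats)
keep_mask = logits.argmax(dim=-1).bool()  # True where the token is kept
print(keep_mask)
```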
arXiv Detail & Related papers (2020-09-23T17:12:06Z) - Recognizing Long Grammatical Sequences Using Recurrent Networks
Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
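As a hedged illustration of the external differentiable stack idea (not the paper's specific architecture or state updating mechanisms), a soft stack can be updated by a convex combination of the push, pop, and no-op results:

```python
# Minimal sketch of a differentiable stack an RNN controller could be coupled with:
# push/pop/no-op are soft actions, so the stack contents stay differentiable.
import torch

def stack_update(stack: torch.Tensor, action: torch.Tensor, new_top: torch.Tensor) -> torch.Tensor:
    """stack:   (depth, dim) current stack, row 0 is the top.
    action:  (3,)         softmax weights over [push, pop, no-op].
    new_top: (dim,)       value that a hard push would place on top."""
    push = torch.cat([new_top.unsqueeze(0), stack[:-1]])           # shift down, insert on top
    pop = torch.cat([stack[1:], torch.zeros_like(stack[:1])])      # shift up, pad the bottom
    return action[0] * push + action[1] * pop + action[2] * stack  # convex combination

# Toy usage: an RNN controller would emit `action` and `new_top` at each step.
stack = torch.zeros(8, 4)                       # depth 8, element size 4
action = torch.softmax(torch.randn(3), dim=0)   # soft choice among push/pop/no-op
stack = stack_update(stack, action, torch.randn(4))
print(stack[0])  # close to the pushed value when the push weight dominates
```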
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.