Language models and brain alignment: beyond word-level semantics and
prediction
- URL: http://arxiv.org/abs/2212.00596v1
- Date: Thu, 1 Dec 2022 15:48:51 GMT
- Title: Language models and brain alignment: beyond word-level semantics and
prediction
- Authors: Gabriele Merlin and Mariya Toneva
- Abstract summary: Recent works suggest that the prediction of the next word is a key mechanism contributing to the alignment between language models and brain recordings.
We take a first step towards a better understanding via two simple perturbations in a popular pretrained language model.
- Score: 5.678337324555035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models that have been trained to predict the next word
over billions of text documents have been shown to also significantly predict
brain recordings of people comprehending language. Understanding the reasons
behind the observed similarities between language in machines and language in
the brain can lead to more insight into both systems. Recent works suggest that
the prediction of the next word is a key mechanism that contributes to the
alignment between the two. What is not yet understood is whether prediction of
the next word is necessary for this observed alignment or simply sufficient,
and whether there are other shared mechanisms or kinds of information that are similarly
important. In this work, we take a first step towards a better understanding
via two simple perturbations in a popular pretrained language model. The first
perturbation is to improve the model's ability to predict the next word in the
specific naturalistic stimulus text that the brain recordings correspond to. We
show that this indeed improves the alignment with the brain recordings.
However, this improved alignment may also be due to any improved word-level or
multi-word level semantics for the specific world that is described by the
stimulus narrative. We aim to disentangle the contribution of next word
prediction and semantic knowledge via our second perturbation: scrambling the
word order at inference time, which reduces the ability to predict the next
word, but maintains any newly learned word-level semantics. By comparing the
alignment with brain recordings of these differently perturbed models, we show
that improvements in alignment with brain recordings are due to more than
improvements in next word prediction and word-level semantics.
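The second perturbation, scrambling word order at inference time, can be illustrated with a minimal sketch. Note that the paper's actual perturbation operates on the tokenized input to a pretrained language model; the function name and example stimulus below are hypothetical, chosen only to show the core idea: shuffling destroys most next-word-predictability cues while preserving the bag of words, and hence any word-level semantics the model has learned.

```python
import random

def scramble_word_order(text: str, seed: int = 0) -> str:
    """Shuffle the words of a stimulus passage.

    Shuffling removes most of the sequential structure a model uses
    for next-word prediction, while leaving the multiset of words
    (and thus word-level semantics) unchanged.
    """
    words = text.split()
    rng = random.Random(seed)  # fixed seed -> reproducible scrambling
    rng.shuffle(words)
    return " ".join(words)

stimulus = "the quick brown fox jumps over the lazy dog"
print(scramble_word_order(stimulus))
```

Because the seed is fixed, the same scrambled sequence can be presented consistently across model runs, which matters when comparing alignment scores between perturbed and unperturbed models.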
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z) - Causal Graph in Language Model Rediscovers Cortical Hierarchy in Human
Narrative Processing [0.0]
Previous studies have demonstrated that the features of language models can be mapped to fMRI brain activity.
This raises the question: is there a commonality between information processing in language models and the human brain?
To estimate information flow patterns in a language model, we examined the causal relationships between different layers.
arXiv Detail & Related papers (2023-11-17T10:09:12Z) - Code-Switching with Word Senses for Pretraining in Neural Machine
Translation [107.23743153715799]
We introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT).
WSP-NMT is an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases.
Our experiments show significant improvements in overall translation quality.
arXiv Detail & Related papers (2023-10-21T16:13:01Z) - Humans and language models diverge when predicting repeating text [52.03471802608112]
We present a scenario in which the performance of humans and LMs diverges.
Human and GPT-2 LM predictions are strongly aligned in the first presentation of a text span, but their performance quickly diverges when memory begins to play a role.
We hope that this scenario will spur future work in bringing LMs closer to human behavior.
arXiv Detail & Related papers (2023-10-10T08:24:28Z) - Why can neural language models solve next-word prediction? A
mathematical perspective [53.807657273043446]
We study a class of formal languages that can be used to model real-world examples of English sentences.
Our proof highlights the different roles of the embedding layer and the fully connected component within the neural language model.
arXiv Detail & Related papers (2023-06-20T10:41:23Z) - Word class representations spontaneously emerge in a deep neural network
trained on next word prediction [7.240611820374677]
How do humans learn language, and can the first language be learned at all?
These fundamental questions are still hotly debated.
To investigate, we train an artificial deep neural network to predict the next word.
We find that the internal representations of nine-word input sequences cluster according to the word class of the tenth word to be predicted as output.
arXiv Detail & Related papers (2023-02-15T11:02:50Z) - Collateral facilitation in humans and language models [0.6091702876917281]
We show that humans display a similar processing advantage for highly anomalous words.
We discuss the implications for our understanding of both human language comprehension and the predictions made by language models.
arXiv Detail & Related papers (2022-11-09T21:08:08Z) - Long-range and hierarchical language predictions in brains and
algorithms [82.81964713263483]
We show that while deep language algorithms are optimized to predict adjacent words, the human brain would be tuned to make long-range and hierarchical predictions.
This study strengthens predictive coding theory and suggests a critical role of long-range and hierarchical predictions in natural language processing.
arXiv Detail & Related papers (2021-11-28T20:26:07Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.