Language models and brain alignment: beyond word-level semantics and
prediction
- URL: http://arxiv.org/abs/2212.00596v1
- Date: Thu, 1 Dec 2022 15:48:51 GMT
- Title: Language models and brain alignment: beyond word-level semantics and
prediction
- Authors: Gabriele Merlin and Mariya Toneva
- Abstract summary: Recent works suggest that the prediction of the next word is a key mechanism contributing to the alignment between language models and brain recordings.
We take a first step towards a better understanding via two simple perturbations in a popular pretrained language model.
- Score: 5.678337324555035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models that have been trained to predict the next word
over billions of text documents have been shown to also significantly predict
brain recordings of people comprehending language. Understanding the reasons
behind the observed similarities between language in machines and language in
the brain can lead to more insight into both systems. Recent works suggest that
the prediction of the next word is a key mechanism that contributes to the
alignment between the two. What is not yet understood is whether prediction of
the next word is necessary for this observed alignment or simply sufficient,
and whether there are other shared mechanisms or kinds of information that are similarly
important. In this work, we take a first step towards a better understanding
via two simple perturbations in a popular pretrained language model. The first
perturbation is to improve the model's ability to predict the next word in the
specific naturalistic stimulus text that the brain recordings correspond to. We
show that this indeed improves the alignment with the brain recordings.
However, this improved alignment may also be due to any improved word-level or
multi-word level semantics for the specific world that is described by the
stimulus narrative. We aim to disentangle the contribution of next word
prediction and semantic knowledge via our second perturbation: scrambling the
word order at inference time, which reduces the ability to predict the next
word, but maintains any newly learned word-level semantics. By comparing the
alignment with brain recordings of these differently perturbed models, we show
that improvements in alignment with brain recordings are due to more than
improvements in next word prediction and word-level semantics.
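The second perturbation, scrambling word order at inference time, can be illustrated with a minimal sketch. Note that the paper's actual perturbation operates on the tokenized input to a pretrained language model; the function name and example stimulus below are hypothetical, chosen only to show the core idea: shuffling destroys most next-word-predictability cues while preserving the bag of words, and hence any word-level semantics the model has learned.

```python
import random

def scramble_word_order(text: str, seed: int = 0) -> str:
    """Shuffle the words of a stimulus passage.

    Shuffling removes most of the sequential structure a model uses
    for next-word prediction, while leaving the multiset of words
    (and thus word-level semantics) unchanged.
    """
    words = text.split()
    rng = random.Random(seed)  # fixed seed -> reproducible scrambling
    rng.shuffle(words)
    return " ".join(words)

stimulus = "the quick brown fox jumps over the lazy dog"
print(scramble_word_order(stimulus))
```

Because the seed is fixed, the same scrambled sequence can be presented consistently across model runs, which matters when comparing alignment scores between perturbed and unperturbed models.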
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z) - Causal Graph in Language Model Rediscovers Cortical Hierarchy in Human
Narrative Processing [0.0]
Previous studies have demonstrated that the features of language models can be mapped to fMRI brain activity.
This raises the question: is there a commonality between information processing in language models and the human brain?
To estimate information flow patterns in a language model, we examined the causal relationships between different layers.
arXiv Detail & Related papers (2023-11-17T10:09:12Z) - Code-Switching with Word Senses for Pretraining in Neural Machine
Translation [107.23743153715799]
We introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT).
WSP-NMT is an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases.
Our experiments show significant improvements in overall translation quality.
arXiv Detail & Related papers (2023-10-21T16:13:01Z) - Humans and language models diverge when predicting repeating text [52.03471802608112]
We present a scenario in which the performance of humans and LMs diverges.
Human and GPT-2 LM predictions are strongly aligned in the first presentation of a text span, but their performance quickly diverges when memory begins to play a role.
We hope that this scenario will spur future work in bringing LMs closer to human behavior.
arXiv Detail & Related papers (2023-10-10T08:24:28Z) - Why can neural language models solve next-word prediction? A
mathematical perspective [53.807657273043446]
We study a class of formal languages that can be used to model real-world examples of English sentences.
Our proof highlights the different roles of the embedding layer and the fully connected component within the neural language model.
arXiv Detail & Related papers (2023-06-20T10:41:23Z) - Word class representations spontaneously emerge in a deep neural network
trained on next word prediction [7.240611820374677]
How do humans learn language, and can the first language be learned at all?
These fundamental questions are still hotly debated.
To investigate, we train an artificial deep neural network to predict the next word.
We find that the internal representations of nine-word input sequences cluster according to the word class of the tenth word to be predicted as output.
arXiv Detail & Related papers (2023-02-15T11:02:50Z) - Collateral facilitation in humans and language models [0.6091702876917281]
We show that humans display a similar processing advantage for highly anomalous words.
We discuss the implications for our understanding of both human language comprehension and the predictions made by language models.
arXiv Detail & Related papers (2022-11-09T21:08:08Z) - Long-range and hierarchical language predictions in brains and
algorithms [82.81964713263483]
We show that while deep language algorithms are optimized to predict adjacent words, the human brain would be tuned to make long-range and hierarchical predictions.
This study strengthens predictive coding theory and suggests a critical role of long-range and hierarchical predictions in natural language processing.
arXiv Detail & Related papers (2021-11-28T20:26:07Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.