Fine-grained Analysis of Brain-LLM Alignment through Input Attribution
- URL: http://arxiv.org/abs/2510.12355v1
- Date: Tue, 14 Oct 2025 10:19:01 GMT
- Title: Fine-grained Analysis of Brain-LLM Alignment through Input Attribution
- Authors: Michela Proietti, Roberto Capobianco, Mariya Toneva
- Abstract summary: We introduce a fine-grained input attribution method to identify the specific words most important for brain-LLM alignment. Our findings reveal that BA and NWP rely on largely distinct word subsets.
- Score: 10.06817336976995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the alignment between large language models (LLMs) and human brain activity can reveal computational principles underlying language processing. We introduce a fine-grained input attribution method to identify the specific words most important for brain-LLM alignment, and leverage it to study a contentious research question about brain-LLM alignment: the relationship between brain alignment (BA) and next-word prediction (NWP). Our findings reveal that BA and NWP rely on largely distinct word subsets: NWP exhibits recency and primacy biases with a focus on syntax, while BA prioritizes semantic and discourse-level information with a more targeted recency effect. This work advances our understanding of how LLMs relate to human language processing and highlights differences in feature reliance between BA and NWP. Beyond this study, our attribution method can be broadly applied to explore the cognitive relevance of model predictions in diverse language processing tasks.
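To make the attribution idea concrete, below is a minimal sketch of gradient-times-input attribution against a brain-encoding objective, assuming a pretrained GPT-2 from Hugging Face transformers and a pre-fitted linear encoding model. The encoding weights, voxel response, and function names are illustrative assumptions, not the paper's implementation.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

def word_attributions(text, encoding_weights, brain_response):
    """Score each input word's importance for brain alignment.

    encoding_weights: (hidden_dim,) pre-fitted linear map from the LLM
        state to one voxel's response (hypothetical).
    brain_response: recorded response for this stimulus (hypothetical).
    """
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    embeds = model.wte(ids).detach().requires_grad_(True)
    hidden = model(inputs_embeds=embeds).last_hidden_state[0, -1]
    pred = hidden @ encoding_weights           # predicted voxel response
    ((pred - brain_response) ** 2).backward()  # alignment error drives gradients
    # gradient x input: one relevance score per input token
    return (embeds.grad * embeds).sum(-1).abs()[0]

scores = word_attributions("The keys to the cabinet are on the table",
                           torch.randn(768), torch.tensor(1.0))
```

Tokens with the largest scores are the candidates such an analysis would flag as driving brain alignment.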
Related papers
- Explanations of Large Language Models Explain Language Representations in the Brain [5.7916055414970895]
We propose a novel approach using explainable AI (XAI) to strengthen the link between language processing and brain neural activity. Applying attribution methods, we quantify the influence of preceding words on model predictions. We find that stronger attributions accompany brain alignment, suggesting that brain alignment can help assess the biological plausibility of explanation methods.
arXiv Detail & Related papers (2025-02-20T16:05:45Z) - Large Language Models as Neurolinguistic Subjects: Discrepancy between Performance and Competence [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding the signifier (form) and the signified (meaning). We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pairs and diagnostic probing to analyze activation patterns across model layers. We found: (1) psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) direct probability measurement may not accurately assess linguistic competence; and (3) instruction tuning does not substantially change competence but does improve performance.
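A minimal sketch of the minimal-pair idea on the performance side, assuming GPT-2; the sentence pair below is illustrative (competence would be probed separately from layer activations):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence):
    """Total log-probability the LM assigns to a sentence."""
    ids = tok(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        out = lm(ids, labels=ids)  # out.loss = mean per-token NLL
    return -out.loss.item() * (ids.shape[1] - 1)

# Performance reading: does the model prefer the acceptable member of the pair?
good = sentence_logprob("The keys to the cabinet are on the table.")
bad = sentence_logprob("The keys to the cabinet is on the table.")
print("prefers grammatical form:", good > bad)
```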
arXiv Detail & Related papers (2024-11-12T04:16:44Z) - Lost in Translation: The Algorithmic Gap Between LMs and the Brain [8.799971499357499]
Language Models (LMs) have achieved impressive performance on various linguistic tasks, but their relationship to human language processing in the brain remains unclear.
This paper examines the gaps and overlaps between LMs and the brain at different levels of analysis.
We discuss how insights from neuroscience, such as sparsity, modularity, internal states, and interactive learning, can inform the development of more biologically plausible language models.
arXiv Detail & Related papers (2024-07-05T17:43:16Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
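A minimal sketch of the concept-bottleneck idea, assuming sentence embeddings from a PLM are already available; the dimensions and class counts are hypothetical defaults:

```python
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Route a PLM embedding through human-readable concept scores."""

    def __init__(self, embed_dim=768, n_concepts=8, n_classes=2):
        super().__init__()
        self.to_concepts = nn.Linear(embed_dim, n_concepts)  # embedding -> concepts
        self.to_label = nn.Linear(n_concepts, n_classes)     # concepts -> prediction

    def forward(self, embedding):
        concepts = self.to_concepts(embedding).sigmoid()  # interpretable bottleneck
        return self.to_label(concepts), concepts          # label logits + explanation
```

Training supervises both the concept scores (against human annotations) and the final label, so each prediction can be read off as a combination of named concepts.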
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Integrating LLM, EEG, and Eye-Tracking Biomarker Analysis for Word-Level
Neural State Classification in Semantic Inference Reading Comprehension [4.390968520425543]
This study aims to provide insights into individuals' neural states during a semantic relation reading-comprehension task.
We propose jointly analyzing LLMs, eye-gaze, and electroencephalographic (EEG) data to study how the brain processes words with varying degrees of relevance to a keyword during reading.
arXiv Detail & Related papers (2023-09-27T15:12:08Z) - Joint processing of linguistic properties in brains and language models [14.997785690790032]
We investigate the correspondence between the detailed processing of linguistic information by the human brain versus language models.
We find that elimination of specific linguistic properties results in a significant decrease in brain alignment.
These findings provide clear evidence for the role of specific linguistic information in the alignment between brain and language models.
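A sketch of this ablation logic under common assumptions (ridge-based encoding models over precomputed LLM features; the property matrix and data layout are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

def remove_property(X, prop):
    """Residualize LLM features X (n_words, d) against a property (n_words, k)."""
    beta, *_ = np.linalg.lstsq(prop, X, rcond=None)
    return X - prop @ beta

def brain_alignment(X_train, y_train, X_test, y_test):
    """Encoding-model fit: correlation of predictions with held-out responses."""
    pred = Ridge(alpha=1.0).fit(X_train, y_train).predict(X_test)
    return pearsonr(pred, y_test)[0]
```

Comparing brain_alignment before and after remove_property quantifies how much that linguistic property contributes to the alignment.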
arXiv Detail & Related papers (2022-12-15T19:13:42Z) - A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations of how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
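A minimal sketch of one plausible instantiation of such a measure, using a simple binned mutual-information estimate between a unit's activation and word identity; the paper's exact estimator may differ:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def layer_word_information(activations, word_ids, bins=16):
    """MI between one unit's (binned) activation and word identity.

    activations: (n_tokens,) values of one unit at a given layer.
    word_ids:    (n_tokens,) integer ids of the input words.
    """
    edges = np.histogram(activations, bins=bins)[1]
    return mutual_info_score(word_ids, np.digitize(activations, edges))
```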
arXiv Detail & Related papers (2022-06-19T08:55:07Z) - Visualizing the Relationship Between Encoded Linguistic Information and
Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality.
We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performance.
Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
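A small sketch of the Pareto-optimality framing: given (probing accuracy, task score) pairs for several checkpoints, keep the non-dominated ones; the numbers below are toy data:

```python
def pareto_front(points):
    """Keep (info, performance) points not dominated in both coordinates."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]

# toy (syntactic-probe accuracy, BLEU) pairs for four checkpoints
checkpoints = [(0.61, 24.1), (0.70, 25.3), (0.74, 25.1), (0.80, 24.0)]
print(pareto_front(checkpoints))  # [(0.70, 25.3), (0.74, 25.1), (0.80, 24.0)]
```

The dropped point illustrates the survey's finding: encoding more syntactic information does not necessarily buy better task performance.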
arXiv Detail & Related papers (2022-03-29T19:03:10Z) - A Survey On Neural Word Embeddings [0.4822598110892847]
The study of meaning in natural language processing relies on the distributional hypothesis.
The revolutionary idea of distributed representations for concepts is close to the workings of the human mind.
Neural word embeddings transformed the whole field of NLP by introducing substantial improvements across NLP tasks.
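A toy illustration of the distributional idea that underpins these embeddings; the vectors below are made up, while real ones would come from word2vec, GloVe, or similar:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, 0.0 for unrelated."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# made-up 2-d vectors standing in for learned embeddings
king, queen, apple = np.array([0.9, 0.8]), np.array([0.85, 0.75]), np.array([-0.1, 0.9])
print(cosine(king, queen) > cosine(king, apple))  # True: related words lie closer
```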
arXiv Detail & Related papers (2021-10-05T03:37:57Z) - ERICA: Improving Entity and Relation Understanding for Pre-trained
Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in the pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
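A sketch of the contrastive objective in the InfoNCE form commonly used for such pretraining; ERICA's actual entity- and relation-discrimination losses are more elaborate:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    """Pull anchor toward its positive, away from negatives.

    anchor, positive: (d,) entity/relation representations.
    negatives:        (n, d) mismatched representations.
    """
    candidates = torch.cat([positive.unsqueeze(0), negatives], dim=0)
    logits = F.cosine_similarity(anchor.unsqueeze(0), candidates) / temperature
    # the positive sits at index 0 of the candidate list
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([0]))
```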
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.