Related papers: Far from the Shallow: Brain-Predictive Reasoning Embedding through Residual Disentanglement

Far from the Shallow: Brain-Predictive Reasoning Embedding through Residual Disentanglement

URL: http://arxiv.org/abs/2510.22860v1
Date: Sun, 26 Oct 2025 22:46:26 GMT
Title: Far from the Shallow: Brain-Predictive Reasoning Embedding through Residual Disentanglement
Authors: Linyang He, Tianjun Zhong, Richard Antonello, Gavin Mischler, Micah Goldblum, Nima Mesgarani,
Abstract summary: Modern large language models (LLMs) are increasingly used to model neural responses to language.<n>Their internal representations are highly "entangled," mixing information about lexicon, syntax, meaning, and reasoning.<n>This entanglement biases conventional brain encoding analyses toward linguistically shallow features.
Score: 43.96899536703126
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Understanding how the human brain progresses from processing simple linguistic inputs to performing high-level reasoning is a fundamental challenge in neuroscience. While modern large language models (LLMs) are increasingly used to model neural responses to language, their internal representations are highly "entangled," mixing information about lexicon, syntax, meaning, and reasoning. This entanglement biases conventional brain encoding analyses toward linguistically shallow features (e.g., lexicon and syntax), making it difficult to isolate the neural substrates of cognitively deeper processes. Here, we introduce a residual disentanglement method that computationally isolates these components. By first probing an LM to identify feature-specific layers, our method iteratively regresses out lower-level representations to produce four nearly orthogonal embeddings for lexicon, syntax, meaning, and, critically, reasoning. We used these disentangled embeddings to model intracranial (ECoG) brain recordings from neurosurgical patients listening to natural speech. We show that: 1) This isolated reasoning embedding exhibits unique predictive power, accounting for variance in neural activity not explained by other linguistic features and even extending to the recruitment of visual regions beyond classical language areas. 2) The neural signature for reasoning is temporally distinct, peaking later (~350-400ms) than signals related to lexicon, syntax, and meaning, consistent with its position atop a processing hierarchy. 3) Standard, non-disentangled LLM embeddings can be misleading, as their predictive success is primarily attributable to linguistically shallow features, masking the more subtle contributions of deeper cognitive processing.

Related papers

Decoding Predictive Inference in Visual Language Processing via Spatiotemporal Neural Coherence [2.208251557767776]
We present a machine learning framework for decoding neural responses to visual language stimuli in Deaf signers.<n>Our results reveal distributed left-hemispheric and low-frequency coherence as key features in language comprehension.<n>This work demonstrates a novel approach for probing experience-driven generative models of perception in the brain.
arXiv Detail & Related papers (2025-12-24T04:19:20Z)
Explanations of Large Language Models Explain Language Representations in the Brain [5.7916055414970895]
We propose a novel approach using explainable AI (XAI) to strengthen link between language processing and brain neural activity.<n>Applying attribution methods, we quantify the influence of preceding words on predictions.<n>We find stronger attributions suggest brain alignment for assessing the biological explanation methods.
arXiv Detail & Related papers (2025-02-20T16:05:45Z)
What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores [1.8175282137722093]
Internal representations from large language models (LLMs) achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing. Here, we analyze three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages. We find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings.
arXiv Detail & Related papers (2024-06-03T17:13:27Z)
Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax and Context [87.31930367845125]
We trained a lexical language model, Glove, and a supra-lexical language model, GPT-2, on a text corpus. We then assessed to what extent these information-restricted models were able to predict the time-courses of fMRI signal of humans listening to naturalistic text. Our analyses show that, while most brain regions involved in language are sensitive to both syntactic and semantic variables, the relative magnitudes of these effects vary a lot across these regions.
arXiv Detail & Related papers (2023-02-28T08:16:18Z)
Deep Learning Models to Study Sentence Comprehension in the Human Brain [0.1503974529275767]
Recent artificial neural networks that process natural language achieve unprecedented performance in tasks requiring sentence-level understanding. We review works that compare these artificial language models with human brain activity and we assess the extent to which this approach has improved our understanding of the neural processes involved in natural language comprehension.
arXiv Detail & Related papers (2023-01-16T10:31:25Z)
Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook. We find that untrained versions of each model already explain significant amount of signal in the brain by capturing similarity in brain responses across identical words. We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate. We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decompose the neural bases of language consists in correlating, across individuals, the brain responses to different stimuli. Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)
Does injecting linguistic structure into language models lead to better alignment with brain recordings? [13.880819301385854]
We evaluate whether language models align better with brain recordings if their attention is biased by annotations from syntactic or semantic formalisms. Our proposed approach enables the evaluation of more targeted hypotheses about the composition of meaning in the brain.
arXiv Detail & Related papers (2021-01-29T14:42:02Z)
Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts. We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.