Stroke Lesions as a Rosetta Stone for Language Model Interpretability
- URL: http://arxiv.org/abs/2602.04074v1
- Date: Tue, 03 Feb 2026 23:22:37 GMT
- Title: Stroke Lesions as a Rosetta Stone for Language Model Interpretability
- Authors: Julius Fridriksson, Roger D. Newman-Norlund, Saeed Ahmadi, Regan Willis, Nadra Salman, Kalil Warren, Xiang Guan, Yong Yang, Srihari Nelakuditi, Rutvik Desai, Leonardo Bonilha, Jeff Charney, Chris Rorden
- Abstract summary: We present the Brain-LLM Unified Model (BLUM) as an external reference structure for evaluating large language models. Using data from individuals with chronic post-stroke aphasia, we trained symptom-to-lesion models that predict brain damage location from behavioral error profiles. LLM error profiles were sufficiently similar to human error profiles that predicted lesions corresponded to actual lesions in error-matched humans above chance.
- Score: 6.528508321422611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have achieved remarkable capabilities, yet methods to verify which model components are truly necessary for language function remain limited. Current interpretability approaches rely on internal metrics and lack external validation. Here we present the Brain-LLM Unified Model (BLUM), a framework that leverages lesion-symptom mapping, the gold standard for establishing causal brain-behavior relationships for over a century, as an external reference structure for evaluating LLM perturbation effects. Using data from individuals with chronic post-stroke aphasia (N = 410), we trained symptom-to-lesion models that predict brain damage location from behavioral error profiles, applied systematic perturbations to transformer layers, administered identical clinical assessments to perturbed LLMs and human patients, and projected LLM error profiles into human lesion space. LLM error profiles were sufficiently similar to human error profiles that predicted lesions corresponded to actual lesions in error-matched humans above chance in 67% of picture naming conditions (p < 10^{-23}) and 68.3% of sentence completion conditions (p < 10^{-61}), with semantic-dominant errors mapping onto ventral-stream lesion patterns and phonemic-dominant errors onto dorsal-stream patterns. These findings open a new methodological avenue for LLM interpretability in which clinical neuroscience provides external validation, establishing human lesion-symptom mapping as a reference framework for evaluating artificial language systems and motivating direct investigation of whether behavioral alignment reflects shared computational principles.
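The abstract's pipeline (fit a symptom-to-lesion model on patient data, administer the same clinical battery to a perturbed LLM, project its error profile into human lesion space) can be illustrated with a short sketch. Everything below is a toy stand-in: the ridge regressor, the dimensions, and the synthetic data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-ins: 410 patients, a 20-dim behavioral error profile
# (e.g., per-category semantic and phonemic error rates), and a lesion map
# flattened to 500 voxels. Real lesion maps are far higher-dimensional.
n_patients, n_features, n_voxels = 410, 20, 500
error_profiles = rng.normal(size=(n_patients, n_features))
lesion_maps = rng.normal(size=(n_patients, n_voxels))

# Symptom-to-lesion model: predict lesion location from behavior alone.
sym2les = Ridge(alpha=1.0).fit(error_profiles, lesion_maps)

# An error profile measured from a perturbed LLM on the same clinical
# battery can then be projected into human lesion space.
llm_error_profile = rng.normal(size=(1, n_features))
predicted_lesion = sym2les.predict(llm_error_profile)

# Correspondence check: does the predicted lesion correlate with the lesion
# of an error-matched patient better than with a random patient's lesion?
r_matched = np.corrcoef(predicted_lesion[0], lesion_maps[0])[0, 1]
r_random = np.corrcoef(predicted_lesion[0], lesion_maps[1])[0, 1]
print(f"r(matched)={r_matched:.3f}  r(random)={r_random:.3f}")
```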
Related papers
- A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. However, most medical LLMs are trained on data from a single institution, which limits generalizability and safety in heterogeneous systems. We introduce a model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
- Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia Phenotypes [40.41503864764337]
We introduce a component-level framework that simulates aphasia by selectively perturbing functional components in large language models. Our pipeline identifies subtype-linked components for Broca's and Wernicke's aphasia, and induces graded impairments by progressively perturbing the top-k subtype-linked components. Across architectures and lesioning strategies, subtype-targeted perturbations yield more systematic, aphasia-like regressions than size-matched random perturbations.
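A minimal sketch of the graded, top-k lesioning idea described above; the component count, the precomputed subtype scores, and the hard-zero masking are all assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend importance scores linking each of 144 components (e.g., attention
# heads) to one aphasia subtype, estimated beforehand by some attribution step.
subtype_scores = rng.random(144)

def lesion_mask(scores: np.ndarray, k: int) -> np.ndarray:
    """Return a 0/1 mask that silences the k most subtype-linked components."""
    mask = np.ones_like(scores)
    mask[np.argsort(scores)[-k:]] = 0.0
    return mask

# Graded impairment: lesion progressively more components; in a real run the
# clinical battery would be re-administered after each step.
for k in (4, 16, 64):
    mask = lesion_mask(subtype_scores, k)
    print(f"k={k:3d}: {int((mask == 0).sum())} components silenced")
```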
arXiv Detail & Related papers (2026-01-27T15:47:22Z)
- A Monosemantic Attribution Framework for Stable Interpretability in Clinical Neuroscience Large Language Models [9.694820939059339]
Interpretability remains a key challenge for deploying large language models (LLMs) in clinical settings such as Alzheimer's disease progression diagnosis. We introduce a unified interpretability framework that integrates attributional and mechanistic perspectives.
arXiv Detail & Related papers (2026-01-25T19:03:04Z)
- Plausibility as Failure: How LLMs and Humans Co-Construct Epistemic Error [0.0]
This study examines how different forms of epistemic failure emerge, are masked, and are tolerated in human-AI interaction. Evaluators frequently conflated criteria such as correctness, relevance, bias, groundedness, and consistency, indicating that human judgment collapses analytical distinctions into intuitive judgments shaped by form and fluency. The study provides implications for LLM assessment, digital literacy, and the design of trustworthy human-AI communication.
arXiv Detail & Related papers (2025-12-18T16:45:29Z)
- Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning [3.3482359447109866]
Large Language Models (LLMs) have attained human-level accuracy on medical question-answering (QA) benchmarks. However, their limitations in navigating open-ended clinical scenarios have recently been shown. We present the Medical Abstraction and Reasoning Corpus (M-ARC). We find that LLMs, including current state-of-the-art o1 and Gemini models, perform poorly compared to physicians on M-ARC.
arXiv Detail & Related papers (2025-02-05T18:14:27Z)
- Self-Healing Machine Learning: A Framework for Autonomous Adaptation in Real-World Environments [50.310636905746975]
Real-world machine learning systems often encounter model performance degradation due to distributional shifts in the underlying data generating process.
Existing approaches to addressing shifts, such as concept drift adaptation, are limited by their reason-agnostic nature.
We propose self-healing machine learning (SHML) to overcome these limitations.
arXiv Detail & Related papers (2024-10-31T20:05:51Z)
- Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing [63.20133320524577]
We show that editing a small subset of parameters can effectively modulate specific behaviors of large language models (LLMs). Our approach achieves reductions of up to 90.0% in toxicity on the RealToxicityPrompts dataset and 49.2% on ToxiGen.
arXiv Detail & Related papers (2024-07-11T17:52:03Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated by large language models (LLMs).
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
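As a rough sketch of the LID idea, the following computes a standard maximum-likelihood LID estimate over a set of activation vectors; the estimator variant, neighborhood size, and random toy activations are assumptions, since the paper's exact setup is not given here.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lid_mle(activations: np.ndarray, k: int = 20) -> np.ndarray:
    """Per-point local intrinsic dimension via the maximum-likelihood
    estimator of Levina & Bickel (the -1/mean-log-ratio form)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(activations)
    dists, _ = nn.kneighbors(activations)        # column 0 is the point itself
    dists = dists[:, 1:]                         # keep the k true neighbors
    log_ratios = np.log(dists / dists[:, -1:])   # log(d_j / d_k); last column is 0
    return -1.0 / log_ratios.mean(axis=1)

rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 64))                # toy stand-in for hidden states
print(f"mean LID: {lid_mle(acts).mean():.2f}")
```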
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- "Knowing When You Don't Know": A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation [90.09260023184932]
Retrieval-Augmented Generation (RAG) grounds Large Language Model (LLM) output by leveraging external knowledge sources to reduce factual hallucinations.
NoMIRACL is a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages.
We measure relevance assessment using two metrics: (i) hallucination rate, the model's tendency to hallucinate an answer when no answer is present in the passages of the non-relevant subset, and (ii) error rate, the model's failure to recognize relevant passages in the relevant subset.
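A toy sketch of these two rates, assuming a refusal string and an invented record layout (neither is NoMIRACL's actual schema):

```python
# Invented record layout and refusal string, for illustration only.
REFUSAL = "I don't know"

non_relevant = [                 # passages contain no answer; refusal is correct
    {"model_answer": REFUSAL},
    {"model_answer": "Paris"},   # hallucinated an answer anyway
]
relevant = [                     # passages contain the answer; refusal is an error
    {"model_answer": "Paris"},
    {"model_answer": REFUSAL},   # failed to recognize the relevant passage
]

hallucination_rate = sum(
    r["model_answer"] != REFUSAL for r in non_relevant) / len(non_relevant)
error_rate = sum(
    r["model_answer"] == REFUSAL for r in relevant) / len(relevant)
print(f"hallucination rate: {hallucination_rate:.0%}, error rate: {error_rate:.0%}")
```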
arXiv Detail & Related papers (2023-12-18T17:18:04Z)
- A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
- A Tale of Two Perplexities: Sensitivity of Neural Language Models to Lexical Retrieval Deficits in Dementia of the Alzheimer's Type [10.665308703417665]
In recent years there has been a burgeoning interest in the use of computational methods to distinguish between elicited speech samples produced by patients with dementia and those from healthy controls.
The difference between perplexity estimates from two neural language models (LMs) has been shown to produce state-of-the-art performance.
We find that perplexity of neural LMs is strongly and differentially associated with lexical frequency, and that a mixture model resulting from interpolating control and dementia LMs improves upon the current state-of-the-art for models trained on transcript text exclusively.
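A toy illustration of the paired-perplexity signal and the interpolated mixture, with unigram lookup tables standing in for the neural LMs trained on control and dementia transcripts; the vocabularies and probabilities are invented:

```python
import math

# Invented unigram stand-ins for the control and dementia LMs.
control_lm  = {"the": 0.30, "cat": 0.10, "sat": 0.10, "uh": 0.01}
dementia_lm = {"the": 0.25, "cat": 0.05, "sat": 0.05, "uh": 0.15}

def perplexity(tokens, lm, floor=1e-4, lam=0.0, other=None):
    """Perplexity under lm, optionally linearly interpolated with a second LM."""
    logp = 0.0
    for t in tokens:
        p = lm.get(t, floor)
        if other is not None:                  # mixture: (1-lam)*p1 + lam*p2
            p = (1 - lam) * p + lam * other.get(t, floor)
        logp += math.log(p)
    return math.exp(-logp / len(tokens))

sample = ["the", "uh", "cat", "uh", "sat"]
diff = perplexity(sample, control_lm) - perplexity(sample, dementia_lm)
print(f"PPL(control) - PPL(dementia) = {diff:.2f}  (positive suggests dementia-like speech)")
print(f"interpolated PPL: {perplexity(sample, control_lm, lam=0.5, other=dementia_lm):.2f}")
```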
arXiv Detail & Related papers (2020-05-07T16:22:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.