NeuroLex: A Lightweight Domain Language Model for EEG Report Understanding and Generation
- URL: http://arxiv.org/abs/2511.12851v1
- Date: Mon, 17 Nov 2025 00:44:35 GMT
- Title: NeuroLex: A Lightweight Domain Language Model for EEG Report Understanding and Generation
- Authors: Kang Yin, Hye-Bin Shin
- Abstract summary: NeuroLex is a lightweight domain-adaptive language model trained purely on EEG report text. It is tailored to the linguistic and diagnostic characteristics of EEG reporting, and it bridges biomedical text modeling and brain-computer interface applications.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical electroencephalogram (EEG) reports encode domain-specific linguistic conventions that general-purpose language models (LMs) fail to capture. We introduce NeuroLex, a lightweight domain-adaptive language model trained purely on EEG report text from the Harvard Electroencephalography Database. Unlike existing biomedical LMs, NeuroLex is tailored to the linguistic and diagnostic characteristics of EEG reporting, enabling it to serve as both an independent textual model and a decoder backbone for multimodal EEG-language systems. Using span-corruption pretraining and instruction-style fine-tuning on report polishing, paragraph summarization, and terminology question answering, NeuroLex learns the syntax and reasoning patterns characteristic of EEG interpretation. Comprehensive evaluations show that it achieves lower perplexity, higher extraction and summarization accuracy, better label efficiency, and improved robustness to negation and factual hallucination compared with general models of the same scale. With an EEG-aware linguistic backbone, NeuroLex bridges biomedical text modeling and brain-computer interface applications, offering a foundation for interpretable and language-driven neural decoding.
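The span-corruption pretraining mentioned in the abstract can be illustrated with a minimal T5-style sketch: random spans of a report are replaced by sentinel tokens in the input, and the target reconstructs the masked spans. The whitespace tokenization, sentinel naming, and sample report sentence below are illustrative assumptions, not the authors' actual pipeline.

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    """T5-style span corruption: mask random spans with sentinel
    tokens; the target sequence reconstructs the masked spans."""
    rng = random.Random(seed)
    n = len(tokens)
    n_mask = max(1, int(n * corruption_rate))
    masked = set()
    while len(masked) < n_mask:
        start = rng.randrange(n)
        span = max(1, int(rng.expovariate(1 / mean_span_len)))
        masked.update(range(start, min(n, start + span)))
    inputs, targets, sentinel = [], [], 0
    i = 0
    while i < n:
        if i in masked:
            # one sentinel per contiguous masked span, in both sequences
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            while i < n and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

# Hypothetical EEG-report sentence, whitespace-tokenized for simplicity
report = ("posterior dominant rhythm of 9 Hz attenuates with eye "
          "opening no epileptiform discharges were seen").split()
inp, tgt = span_corrupt(report)
```

In a real pipeline the corruption runs over subword tokens from the model's tokenizer; the word-level version above only shows the input/target bookkeeping.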
Related papers
- WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities [55.00677513249723]
EEG signals simultaneously encode both cognitive processes and intrinsic neural states. We map EEG signals and their corresponding modalities into a unified semantic space to achieve generalized interpretation. The resulting model demonstrates robust classification accuracy while supporting flexible, open-ended conversations.
arXiv Detail & Related papers (2025-09-26T06:21:51Z)
- Large Language Models for EEG: A Comprehensive Survey and Taxonomy [3.5803007109679212]
The convergence of Large Language Model (LLM) and electroencephalography (EEG) research is enabling new directions in neural decoding, brain-computer interfaces (BCIs), and affective computing. This survey offers a systematic review and structured taxonomy of recent advancements that utilize LLMs for EEG-based analysis and applications.
arXiv Detail & Related papers (2025-06-02T18:58:57Z)
- Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation [52.51005875755718]
We focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse. Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings. Experiments on the public ZuCo dataset demonstrate that GLIM consistently generates fluent, EEG-grounded sentences.
arXiv Detail & Related papers (2025-05-21T05:29:55Z)
- sEEG-based Encoding for Sentence Retrieval: A Contrastive Learning Approach to Brain-Language Alignment [8.466223794246261]
We present SSENSE, a contrastive learning framework that projects single-subject stereo-electroencephalography (sEEG) signals into the sentence embedding space of a frozen CLIP model. We evaluate our method on time-aligned sEEG and spoken transcripts from a naturalistic movie-watching dataset.
arXiv Detail & Related papers (2025-04-20T03:01:42Z)
- Generative causal testing to bridge data-driven models and scientific theories in language neuroscience [82.995061475971]
We present generative causal testing (GCT), a framework for generating concise explanations of language selectivity in the brain. We show that GCT can dissect fine-grained differences between brain areas with similar functional selectivity.
arXiv Detail & Related papers (2024-10-01T15:57:48Z)
- A Lesion-aware Edge-based Graph Neural Network for Predicting Language Ability in Patients with Post-stroke Aphasia [12.129896943547912]
We propose a lesion-aware graph neural network (LEGNet) to predict language ability from resting-state fMRI (rs-fMRI) connectivity in patients with post-stroke aphasia.
Our model integrates three components: an edge-based learning module that encodes functional connectivity between brain regions, a lesion encoding module, and a subgraph learning module.
arXiv Detail & Related papers (2024-09-03T21:28:48Z)
- EEG-Language Pretraining for Highly Label-Efficient Clinical Phenotyping [0.0]
Multimodal language modeling has enabled breakthroughs for representation learning, yet remains unexplored in the realm of functional brain data for clinical phenotyping. This paper pioneers EEG-language models (ELMs) trained on clinical reports and 15,000 EEGs.
arXiv Detail & Related papers (2024-09-02T10:03:03Z)
- RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces RaTEScore (Radiological Report (Text) Evaluation), a novel entity-aware metric.
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
arXiv Detail & Related papers (2024-06-24T17:49:28Z)
- Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network [16.317199232071232]
Large Language Models (LLMs) have been shown to be effective models of the human language system.
In this work, we investigate the key architectural components driving the surprising alignment of untrained models.
arXiv Detail & Related papers (2024-06-21T12:54:03Z)
- BrainLLM: Generative Language Decoding from Brain Recordings [77.66707255697706]
We propose a generative language BCI that utilizes the capacity of a large language model and a semantic brain decoder. The proposed model can generate coherent language sequences aligned with the semantic content of visual or auditory language stimuli. Our findings demonstrate the potential and feasibility of employing BCIs in direct language generation.
arXiv Detail & Related papers (2023-11-16T13:37:21Z)
- BELT: Bootstrapping Electroencephalography-to-Language Decoding and Zero-Shot Sentiment Classification by Natural Language Supervision [31.382825932199935]
The proposed BELT method is a generic and efficient framework that bootstraps EEG representation learning.
With a large LM's capacity for understanding semantic information and zero-shot generalization, BELT utilizes large LMs trained on Internet-scale datasets.
We achieve state-of-the-art results on two brain decoding tasks: brain-to-language translation and zero-shot sentiment classification.
arXiv Detail & Related papers (2023-09-21T13:24:01Z)
- Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks.
In this paper, we extend the problem to open-vocabulary electroencephalography (EEG)-to-text sequence-to-sequence decoding and zero-shot sentence sentiment classification on natural reading tasks.
Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
arXiv Detail & Related papers (2021-12-05T21:57:22Z)
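For context on the BLEU-1 figure reported above: BLEU-1 is clipped unigram precision scaled by a brevity penalty. A minimal single-reference sketch (the example sentences in the test of this snippet are illustrative, not from the paper's data):

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """BLEU-1: clipped unigram precision times the brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # clip each candidate word's count by its count in the reference
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = clipped / len(cand)
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```

Production evaluations use library implementations with proper tokenization and corpus-level aggregation; the function above only shows the arithmetic behind the score.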
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.