EEG-VLM: A Hierarchical Vision-Language Model with Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction
- URL: http://arxiv.org/abs/2511.19155v1
- Date: Mon, 24 Nov 2025 14:23:42 GMT
- Title: EEG-VLM: A Hierarchical Vision-Language Model with Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction
- Authors: Xihe Qiu, Gengchen Ma, Haoyu Wang, Chen Zhan, Xiaoyu Tan, Shuo Li,
- Abstract summary: We propose EEG-VLM, a hierarchical vision-language framework that integrates multi-level feature alignment with visually enhanced language-guided reasoning for interpretable EEG-based sleep stage classification.<n> Experimental results demonstrate that the proposed method significantly improves both the accuracy and interpretability of VLMs in EEG-based sleep stage classification.
- Score: 17.251077744298808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sleep stage classification based on electroencephalography (EEG) is fundamental for assessing sleep quality and diagnosing sleep-related disorders. However, most traditional machine learning methods rely heavily on prior knowledge and handcrafted features, while existing deep learning models still struggle to jointly capture fine-grained time-frequency patterns and achieve clinical interpretability. Recently, vision-language models (VLMs) have made significant progress in the medical domain, yet their performance remains constrained when applied to physiological waveform data, especially EEG signals, due to their limited visual understanding and insufficient reasoning capability. To address these challenges, we propose EEG-VLM, a hierarchical vision-language framework that integrates multi-level feature alignment with visually enhanced language-guided reasoning for interpretable EEG-based sleep stage classification. Specifically, a specialized visual enhancement module constructs high-level visual tokens from intermediate-layer features to extract rich semantic representations of EEG images. These tokens are further aligned with low-level CLIP features through a multi-level alignment mechanism, enhancing the VLM's image-processing capability. In addition, a Chain-of-Thought (CoT) reasoning strategy decomposes complex medical inference into interpretable logical steps, effectively simulating expert-like decision-making. Experimental results demonstrate that the proposed method significantly improves both the accuracy and interpretability of VLMs in EEG-based sleep stage classification, showing promising potential for automated and explainable EEG analysis in clinical settings.
Related papers
- RAICL: Retrieval-Augmented In-Context Learning for Vision-Language-Model Based EEG Seizure Detection [12.189806103703887]
We propose a paradigm shift from conventional signal-based decoding by leveraging large-scale vision-language models (VLMs) to analyze EEG waveform plots.<n>To address the inherent non-stationarity of EEG signals, we introduce a Retrieval-Augmented In-Context Learning (RAICL) approach.
arXiv Detail & Related papers (2026-01-25T13:58:31Z) - E^2-LLM: Bridging Neural Signals and Interpretable Affective Analysis [54.763420895859035]
We present ELLM2-EEG-to-Emotion Large Language Model, first MLLM framework for interpretable emotion analysis from EEG.<n>ELLM integrates a pretrained EEG encoder with Q-based LLMs through learnable projection layers, employing a multi-stage training pipeline.<n>Experiments on the dataset across seven emotion categories demonstrate that ELLM2-EEG-to-Emotion Large Language Model achieves excellent performance on emotion classification.
arXiv Detail & Related papers (2026-01-11T13:21:20Z) - WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities [55.00677513249723]
EEG signals simultaneously encode both cognitive processes and intrinsic neural states.<n>We map EEG signals and their corresponding modalities into a unified semantic space to achieve generalized interpretation.<n>The resulting model demonstrates robust classification accuracy while supporting flexible, open-ended conversations.
arXiv Detail & Related papers (2025-09-26T06:21:51Z) - DiagECG: An LLM-Driven Framework for Diagnostic Reasoning via Discretized ECG Tokenization [0.7550566004119158]
We present DiagECG, a novel framework that integrates time-series and language modeling.<n>Our approach discretizes continuous ECG embeddings into symbolic tokens using a lead-independent encoder and quantization module.
arXiv Detail & Related papers (2025-08-21T08:13:37Z) - Interpretable EEG-to-Image Generation with Semantic Prompts [6.712646807032639]
Our model bypasses direct EEG-to-image generation by aligning EEG signals with semantic captions.<n>A transformer-based EEG encoder maps brain activity to these captions through contrastive learning.<n>This text-mediated framework yields state-of-the-art visual decoding on the EEGCVPR dataset.
arXiv Detail & Related papers (2025-07-09T17:18:06Z) - CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model [52.466542039411515]
EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models.<n>We present CodeBrain, a two-stage EFM designed to fill this gap.<n>In the first stage, we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens.<n>In the second stage, we propose the multi-scale EEGSSM architecture, which combines structured global convolution with sliding window attention.
arXiv Detail & Related papers (2025-06-10T17:20:39Z) - Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework [18.95201514457046]
Large language models (LLMs) have been applied to electrocardiogram (ECG) interpretation.<n>Electrocardiogram-Language Models (ELMs) emulate expert cardiac electrophysiologists by issuing diagnoses, analyzing waveform morphology, identifying contributing factors, and proposing patient-specific action plans.<n>We present the first comprehensive benchmark of these modalities across 6 public datasets and 5 evaluation metrics.
arXiv Detail & Related papers (2025-05-24T19:43:15Z) - PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing [49.243031514520794]
Large Language Models (LLMs) excel at capturing long-range signals due to their text-centric design.<n>PhysLLM achieves state-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.
arXiv Detail & Related papers (2025-05-06T15:18:38Z) - CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals.<n>Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality.<n>The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z) - EEG-GPT: Exploring Capabilities of Large Language Models for EEG
Classification and Interpretation [0.0]
We propose EEG-GPT, a unifying approach to EEG classification that leverages advances in large language models (LLM)
EEG-GPT achieves excellent performance comparable to current state-of-the-art deep learning methods in classifying normal from abnormal EEG in a few-shot learning paradigm utilizing only 2% of training data.
arXiv Detail & Related papers (2024-01-31T17:08:34Z) - Uncovering the structure of clinical EEG signals with self-supervised
learning [64.4754948595556]
Supervised learning paradigms are often limited by the amount of labeled data that is available.
This phenomenon is particularly problematic in clinically-relevant data, such as electroencephalography (EEG)
By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv Detail & Related papers (2020-07-31T14:34:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.