Clinically Calibrated Machine Learning Benchmarks for Large-Scale Multi-Disorder EEG Classification
- URL: http://arxiv.org/abs/2512.22656v1
- Date: Sat, 27 Dec 2025 17:11:17 GMT
- Title: Clinically Calibrated Machine Learning Benchmarks for Large-Scale Multi-Disorder EEG Classification
- Authors: Argha Kamal Samanta, Deepak Mewada, Monalisa Sarma, Debasis Samanta
- Abstract summary: This study examines automated EEG-based classification across eleven clinically relevant neurological disorder categories. Machine learning models are trained under severe class imbalance, with decision thresholds explicitly calibrated to prioritize diagnostic sensitivity.
- Score: 6.941409613662483
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical electroencephalography is routinely used to evaluate patients with diverse and often overlapping neurological conditions, yet interpretation remains manual, time-intensive, and variable across experts. While automated EEG analysis has been widely studied, most existing methods target isolated diagnostic problems, particularly seizure detection, and provide limited support for multi-disorder clinical screening. This study examines automated EEG-based classification across eleven clinically relevant neurological disorder categories, encompassing acute time-critical conditions, chronic neurocognitive and developmental disorders, and disorders with indirect or weak electrophysiological signatures. EEG recordings are processed using a standard longitudinal bipolar montage and represented through a multi-domain feature set capturing temporal statistics, spectral structure, signal complexity, and inter-channel relationships. Disorder-aware machine learning models are trained under severe class imbalance, with decision thresholds explicitly calibrated to prioritize diagnostic sensitivity. Evaluation on a large, heterogeneous clinical EEG dataset demonstrates that sensitivity-oriented modeling achieves recall exceeding 80% for the majority of disorder categories, with several low-prevalence conditions showing absolute recall gains of 15-30% after threshold calibration compared to default operating points. Feature importance analysis reveals physiologically plausible patterns consistent with established clinical EEG markers. These results establish realistic performance baselines for multi-disorder EEG classification and provide quantitative evidence that sensitivity-prioritized automated analysis can support scalable EEG screening and triage in real-world clinical settings.
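The abstract states that decision thresholds are calibrated to prioritize sensitivity but does not say how. The following is a minimal sketch of one common approach, not the authors' procedure: for a single disorder treated as a binary problem, pick the highest score threshold whose recall on held-out data meets a target. The function names, the 0.8 target, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def recall_at(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Recall (sensitivity) when a case is flagged if score >= threshold."""
    positive_scores = [s for y, s in zip(y_true, scores) if y == 1]
    if not positive_scores:
        raise ValueError("recall is undefined without positive examples")
    return sum(s >= threshold for s in positive_scores) / len(positive_scores)


def calibrate_threshold(
    y_true: Sequence[int], scores: Sequence[float], target_recall: float = 0.8
) -> float:
    """Return the highest observed score that, used as a threshold, meets the recall target.

    Taking the highest qualifying threshold flags as few cases as possible
    while still satisfying the sensitivity constraint.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    for threshold in sorted(set(scores), reverse=True):
        if recall_at(y_true, scores, threshold) >= target_recall:
            return threshold
    # Not reached: the lowest observed score always gives recall 1.0.
    return min(scores)


# Toy validation data (made up): 5 positives among 15 recordings.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.7, 0.45, 0.35, 0.3, 0.1,
          0.05, 0.1, 0.15, 0.2, 0.25, 0.4, 0.02, 0.08, 0.12, 0.55]

default_recall = recall_at(y_true, scores, 0.5)
calibrated = calibrate_threshold(y_true, scores, target_recall=0.8)
calibrated_recall = recall_at(y_true, scores, calibrated)
print(default_recall, calibrated, calibrated_recall)
```

In this toy example, lowering the threshold from the default 0.5 raises recall at the cost of flagging more negatives, which is the trade-off a sensitivity-oriented screening setting accepts. In practice the threshold would be chosen on a validation split separate from the test data.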
Related papers
- Multi-View Stenosis Classification Leveraging Transformer-Based Multiple-Instance Learning Using Real-World Clinical Data [76.89269238957593]
Coronary artery stenosis is a leading cause of cardiovascular disease, diagnosed by analyzing the coronary arteries from multiple angiography views. We propose SegmentMIL, a transformer-based multi-view multiple-instance learning framework for patient-level stenosis classification.
arXiv Detail & Related papers (2026-02-02T13:07:52Z)
- ClinDEF: A Dynamic Evaluation Framework for Large Language Models in Clinical Reasoning [58.01333341218153]
We propose ClinDEF, a dynamic framework for assessing clinical reasoning in LLMs through simulated diagnostic dialogues. Our method generates patient cases and facilitates multi-turn interactions between an LLM-based doctor and an automated patient agent. Experiments show that ClinDEF effectively exposes critical clinical reasoning gaps in state-of-the-art LLMs.
arXiv Detail & Related papers (2025-12-29T12:58:58Z)
- Timely Clinical Diagnosis through Active Test Selection [49.091903570068155]
We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design) to better emulate real-world diagnostic reasoning. LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data. We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use.
arXiv Detail & Related papers (2025-10-21T18:10:45Z)
- Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs). Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate an oral (viva voce) examination in medical training. Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z)
- Ocular-Induced Abnormal Head Posture: Diagnosis and Missing Data Imputation [1.7061463565692456]
Ocular-induced abnormal head posture (AHP) is a compensatory mechanism that arises from ocular misalignment. This study addresses both challenges through two complementary deep learning frameworks. AHP-CADNet is a multi-level attention fusion framework for automated diagnosis. A curriculum learning-based imputation framework is designed to mitigate missing data.
arXiv Detail & Related papers (2025-10-07T07:51:59Z)
- EEG-MedRAG: Enhancing EEG-based Clinical Decision-Making via Hierarchical Hypergraph Retrieval-Augmented Generation [45.031633614714]
EEG-MedRAG is a three-layer hypergraph-based retrieval-augmented generation framework. It unifies EEG domain knowledge, individual patient cases, and a large-scale repository into a traversable n-ary relational hypergraph. We introduce the first cross-disease, cross-role EEG clinical QA benchmark, spanning seven disorders and five authentic clinical perspectives.
arXiv Detail & Related papers (2025-08-19T11:12:58Z)
- NeuroDx-LM: A Clinical Large-Scale Model for EEG-based Neurological Disorder Detection [7.185477956123345]
Large-scale models pre-trained on Electroencephalography (EEG) have shown promise in clinical applications such as neurological disorder detection. NeuroDx-LM is a novel large-scale model specifically designed for detecting EEG-based neurological disorders.
arXiv Detail & Related papers (2025-08-11T16:02:25Z)
- Deep Learning-Powered Electrical Brain Signals Analysis: Advancing Neurological Diagnostics [20.149456702857414]
Neurological disorders pose major global health challenges, driving advances in brain signal analysis. EEG and intracranial EEG (iEEG) are widely used for diagnosis and monitoring. This review systematically examines recent advances in deep learning approaches for EEG/iEEG-based neurological diagnostics.
arXiv Detail & Related papers (2025-02-24T14:45:05Z)
- Power Spectral Density-Based Resting-State EEG Classification of First-Episode Psychosis [1.3416169841532526]
We show the effectiveness of stimulus-independent EEG in identifying the abnormal activity patterns of pathological brains.
A generalized model incorporating multiple frequency bands should be more efficient in associating potential EEG biomarkers with First-Episode Psychosis (FEP).
A comprehensive discussion of our preprocessing methods for PSD analysis and a detailed comparison of different models are included in this paper.
arXiv Detail & Related papers (2022-11-23T00:28:41Z)
- Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
- Representation learning for improved interpretability and classification accuracy of clinical factors from EEG [7.323779456638996]
Previous studies have demonstrated that EEG-based neural measures can function as reliable objective correlates of depression, or even predictors of depression and its course.
However, their clinical utility has not been fully realized because of 1) the lack of automated ways to deal with the inherent noise associated with EEG data at scale, and 2) the lack of knowledge of which aspects of the EEG signal may be markers of a clinical disorder.
arXiv Detail & Related papers (2020-10-28T23:21:36Z)