PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis
- URL: http://arxiv.org/abs/2603.02268v1
- Date: Sat, 28 Feb 2026 19:50:28 GMT
- Title: PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis
- Authors: Jeet Bandhu Lahiri, Parshva Runwal, Arvasu Kulkarni, Mahir Jain, Aditya Ray Mishra, Siddharth Panwar, Sandeep Singh,
- Abstract summary: We introduce PRISM, a masked autoencoder ablated along two axes -- pretraining population and downstream adaptation. We compare a narrow-source EU/US corpus against a geographically diverse pool augmented with multi-center South Asian clinical recordings. PRISM matches or outperforms REVE (92 datasets, 60,000+ hours) on the majority of tasks.
- Score: 5.616707402426108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: EEG foundation models are typically pretrained on narrow-source clinical archives and evaluated on benchmarks from the same ecosystem, leaving unclear whether representations encode neural physiology or recording-distribution artifacts. We introduce PRISM (Population Representative Invariant Signal Model), a masked autoencoder ablated along two axes -- pretraining population and downstream adaptation -- with architecture and preprocessing fixed. We compare a narrow-source EU/US corpus (TUH + PhysioNet) against a geographically diverse pool augmented with multi-center South Asian clinical recordings across multiple EEG systems. Three findings emerge. First, narrow-source pretraining yields stronger linear probes on distribution-matched benchmarks, while diverse pretraining produces more adaptable representations under fine-tuning -- a trade-off invisible under single-protocol evaluation. Trained on three source corpora, PRISM matches or outperforms REVE (92 datasets, 60,000+ hours) on the majority of tasks, demonstrating that targeted diversity can substitute for indiscriminate scale and that dataset count is a confounding variable in model comparison. Second, on a clinically challenging and previously untested task -- distinguishing epilepsy from diagnostic mimickers via interictal EEG -- the diverse checkpoint outperforms the narrow-source checkpoint by +12.3 pp balanced accuracy, the largest gap across all evaluations. Third, systematic inconsistencies between EEG-Bench and EEG-FM-Bench reverse model rankings on identical datasets by up to 24 pp; we identify six concrete sources including split construction, checkpoint selection, segment length, and normalization, showing these factors compound non-additively.
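The abstract reports its headline gap in percentage points (pp) of balanced accuracy. As a minimal sketch, assuming the standard definition of balanced accuracy (the unweighted mean of per-class recall) and not the authors' evaluation code, the following shows how such a gap is computed. The labels and the names `model_a` and `model_b` are hypothetical toy values, not data or checkpoints from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so each class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Plain accuracy, shown only for contrast."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (NOT from the paper): 1 = epilepsy, 0 = mimicker.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
model_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # always predicts the majority class
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1]  # finds one of the two minority cases

ba_a = balanced_accuracy(y_true, model_a)
ba_b = balanced_accuracy(y_true, model_b)
gap_pp = (ba_b - ba_a) * 100  # difference expressed in percentage points

print(f"model_a: acc={accuracy(y_true, model_a):.3f} balanced={ba_a:.3f}")
print(f"model_b: acc={accuracy(y_true, model_b):.3f} balanced={ba_b:.3f}")
print(f"gap: {gap_pp:+.1f} pp balanced accuracy")
```

In this toy case the majority-class predictor has the higher plain accuracy but the lower balanced accuracy, which is one reason balanced accuracy is a common choice when diagnostic classes are imbalanced.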
Related papers
- Rethinking Generalized BCIs: Benchmarking 340,000+ Unique Algorithmic Configurations for EEG Mental Command Decoding [0.0]
We present a benchmark evaluating 340,000+ unique combinations of spatial and nonlinear EEG classification. Our findings highlight that no universal 'one-size-fits-all' method can optimally decode EEG motor imagery patterns across all users or datasets.
arXiv Detail & Related papers (2025-12-02T17:56:46Z)
- Geodesic Optimization for Predictive Shift Adaptation on EEG data [53.58711912565724]
Domain adaptation methods struggle when distribution shifts occur simultaneously in $X$ and $y$.
This paper proposes a novel method termed Geodesic Optimization for Predictive Shift Adaptation (GOPSA) to address test-time multi-source DA.
GOPSA has the potential to combine the advantages of mixed-effects modeling with machine learning for biomedical applications of EEG.
arXiv Detail & Related papers (2024-07-04T12:15:42Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- Ensemble of Pre-Trained Neural Networks for Segmentation and Quality Detection of Transmission Electron Microscopy Images [0.0]
Two types of ensembles of pre-trained neural networks were implemented in this work.
The ensembles performed semantic segmentation of ice crystal within a two-phase mixture.
The performance of EA and ER was evaluated on three different metrics: accuracy, calibration, and uncertainty.
arXiv Detail & Related papers (2022-09-05T11:15:25Z)
- Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces.
Second, two novel segmentation-based loss functions have been developed, which attempt to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z)
- Cross-Site Severity Assessment of COVID-19 from CT Images via Domain Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images greatly helps in estimating intensive care unit events.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z)
- An Adversarial Domain Separation Framework for Septic Shock Early Prediction Across EHR Systems [7.058760708627898]
We propose a general domain adaptation (DA) framework that tackles two categories of discrepancies in EHRs collected from different medical systems.
We evaluate our framework for early diagnosis of an extremely challenging condition, septic shock, using two real-world EHRs from distinct medical systems in the U.S.
arXiv Detail & Related papers (2020-10-26T23:41:33Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
- Epileptic Seizure Classification with Symmetric and Hybrid Bilinear Models [20.376912072606412]
This paper proposes a novel hybrid bilinear deep learning network with an application in the clinical procedures of epilepsy classification diagnosis.
The accuracy of the diagnosis is also complicated by overlapping medical symptoms, varying levels of experience, and inter-observer variability among clinical professionals.
arXiv Detail & Related papers (2020-01-15T03:22:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.