Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
- URL: http://arxiv.org/abs/2602.16072v2
- Date: Thu, 19 Feb 2026 03:14:01 GMT
- Title: Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
- Authors: Chenda Duan, Yipeng Zhang, Sotaro Kanai, Yuanyi Ding, Atsuro Daida, Pengyue Yu, Tiancheng Zheng, Naoto Kuroda, Shaun A. Hussain, Eishi Asano, Hiroki Nariai, Vwani Roychowdhury,
- Abstract summary: $\textbf{Omni-iEEG}$ is a large-scale, pre-surgical iEEG resource comprising $\textbf{302 patients}$ and $\textbf{178 hours}$ of high-resolution recordings. It includes harmonized clinical metadata such as seizure onset zones, resections, and surgical outcomes, validated by board-certified epileptologists. It defines clinically meaningful tasks with unified evaluation metrics grounded in clinical priors, enabling systematic evaluation of models in clinically relevant settings.
- Score: 4.4167069169736655
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Epilepsy affects over 50 million people worldwide, and one-third of patients suffer drug-resistant seizures where surgery offers the best chance of seizure freedom. Accurate localization of the epileptogenic zone (EZ) relies on intracranial EEG (iEEG). Clinical workflows, however, remain constrained by labor-intensive manual review. At the same time, existing data-driven approaches are typically developed on single-center datasets that are inconsistent in format and metadata, lack standardized benchmarks, and rarely release pathological event annotations, creating barriers to reproducibility, cross-center validation, and clinical relevance. With extensive efforts to reconcile heterogeneous iEEG formats, metadata, and recordings across publicly available sources, we present $\textbf{Omni-iEEG}$, a large-scale, pre-surgical iEEG resource comprising $\textbf{302 patients}$ and $\textbf{178 hours}$ of high-resolution recordings. The dataset includes harmonized clinical metadata such as seizure onset zones, resections, and surgical outcomes, all validated by board-certified epileptologists. In addition, Omni-iEEG provides over 36K expert-validated annotations of pathological events, enabling robust biomarker studies. Omni-iEEG serves as a bridge between machine learning and epilepsy research. It defines clinically meaningful tasks with unified evaluation metrics grounded in clinical priors, enabling systematic evaluation of models in clinically relevant settings. Beyond benchmarking, we demonstrate the potential of end-to-end modeling on long iEEG segments and highlight the transferability of representations pretrained on non-neurophysiological domains. Together, these contributions establish Omni-iEEG as a foundation for reproducible, generalizable, and clinically translatable epilepsy research. The project page with dataset and code links is available at omni-ieeg.github.io/omni-ieeg.
Related papers
- LookAroundNet: Extending Temporal Context with Transformers for Clinically Viable EEG Seizure Detection [0.0]
LookAroundNet is a transformer-based seizure detector that uses a wider temporal window of EEG data to model seizure activity.
We evaluate the proposed method on multiple EEG datasets spanning diverse clinical environments.
(2026-01-09T18:52:24Z)
- Toward Content-based Indexing and Retrieval of Head and Neck CT with Abscess Segmentation [14.966261216613757]
Abscesses in the head and neck represent an acute infectious process that can potentially lead to sepsis or mortality if not diagnosed and managed promptly.
We introduce AbscessHeNe, a curated and comprehensively annotated dataset comprising 4,926 contrast-enhanced CT slices with clinically confirmed head and neck abscesses.
(2025-12-01T12:04:24Z)
- Affordable EEG, Actionable Insights: An Open Dataset and Evaluation Framework for Epilepsy Patient Stratification [2.879398564096746]
We present NEUROSKY-EPI, the first open dataset of single-channel, consumer-grade EEG for epilepsy.
To explore its utility, we introduce EmbedCluster, a patient-stratification pipeline.
Results show that low-cost, single-channel data can support meaningful stratification.
(2025-10-22T15:25:05Z)
- Timely Clinical Diagnosis through Active Test Selection [49.091903570068155]
We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design) to better emulate real-world diagnostic reasoning.
LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data.
We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use.
(2025-10-21T18:10:45Z)
- Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank.
It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
(2025-05-30T14:42:02Z)
- A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge [30.611482996378683]
Image and disease variability hinder the development of generalizable AI algorithms with clinical value.
We present a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion (ISLES) challenge.
We combined top-performing algorithms into an ensemble model that overcomes the limitations of individual solutions.
(2024-03-28T13:56:26Z)
- TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence Generation with Biomedical Language Models [21.437563965711004]
We present TRIALSCOPE, a framework designed to generate robust real-world evidence from observational data at scale.
The framework was shown to automatically curate high-quality structured patient data, expanding the dataset and incorporating key patient attributes only available in unstructured form.
We were also able to show that TRIALSCOPE could reproduce results of lung and pancreatic cancer clinical trials from the extracted real-world data.
(2023-11-02T15:15:47Z)
- Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines [113.08940153125616]
We generate a dataset of whole-body CT scans with $142$ voxel-level labels for 533 volumes providing comprehensive anatomical coverage.
Our proposed procedure does not rely on manual annotation during the label aggregation stage.
We release our trained unified anatomical segmentation model capable of predicting $142$ anatomical structures on CT data.
(2023-07-25T09:48:13Z)
- From Isolation to Collaboration: Federated Class-Heterogeneous Learning for Chest X-Ray Classification [4.0907576027258985]
Federated learning is a promising paradigm to collaboratively train a global chest x-ray (CXR) classification model.
We propose surgical aggregation, a federated learning method that uses selective aggregation to collaboratively train a global model.
Our results show that our method outperforms current methods and has better generalizability.
(2023-01-17T03:53:29Z)
- DICE: Data-Efficient Clinical Event Extraction with Generative Models [93.49354508621232]
Event extraction for the clinical domain is an under-explored research area.
We introduce DICE, a robust and data-efficient generative model for clinical event extraction.
Our experiments demonstrate state-of-the-art performance of DICE for clinical and news domain event extraction.
(2022-08-16T23:12:04Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
(2021-04-07T06:02:04Z)
- Uncovering the structure of clinical EEG signals with self-supervised learning [64.4754948595556]
Supervised learning paradigms are often limited by the amount of labeled data that is available.
This phenomenon is particularly problematic in clinically relevant data, such as electroencephalography (EEG).
By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
(2020-07-31T14:34:47Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
(2020-07-07T21:04:55Z)