Machine-learning competition to grade EEG background patterns in newborns with hypoxic-ischaemic encephalopathy
- URL: http://arxiv.org/abs/2509.09695v2
- Date: Thu, 30 Oct 2025 13:14:13 GMT
- Title: Machine-learning competition to grade EEG background patterns in newborns with hypoxic-ischaemic encephalopathy
- Authors: Fabio Magarelli, Geraldine B. Boylan, Saeed Montazeri, Feargal O'Sullivan, Dominic Lightbody, Minoo Ashoori, Tamara Skoric, John M. O'Toole
- Abstract summary: We compiled a retrospective dataset containing 353 hours of EEG from 102 individual newborns from a multi-centre study. The data was fully anonymised and divided into training, testing, and held-out validation datasets. Next, we created a web-based competition platform and hosted a machine learning competition to develop ML models for classifying the severity of EEG background patterns in newborns.
- Score: 1.0118253437732931
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine learning (ML) has the potential to support and improve expert performance in monitoring the brain function of at-risk newborns. Developing accurate and reliable ML models depends on access to high-quality, annotated data, a resource in short supply. ML competitions address this need by providing researchers access to expertly annotated datasets, fostering shared learning through direct model comparisons, and leveraging the benefits of crowdsourcing diverse expertise. We compiled a retrospective dataset containing 353 hours of EEG from 102 individual newborns from a multi-centre study. The data was fully anonymised and divided into training, testing, and held-out validation datasets. EEGs were graded for the severity of abnormal background patterns. Next, we created a web-based competition platform and hosted a machine learning competition to develop ML models for classifying the severity of EEG background patterns in newborns. After the competition closed, the top 4 performing models were evaluated offline on a separate held-out validation dataset. Although a feature-based model ranked first on the testing dataset, deep learning models generalised better on the validation sets. All methods had a significant decline in validation performance compared to the testing performance. This highlights the challenges for model generalisation on unseen data, emphasising the need for held-out validation datasets in ML studies with neonatal EEG. The study underscores the importance of training ML models on large and diverse datasets to ensure robust generalisation. The competition's outcome demonstrates the potential for open-access data and collaborative ML development to foster a collaborative research environment and expedite the development of clinical decision-support tools for neonatal neuromonitoring.
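The abstract does not describe how the 102 newborns were allocated to the training, testing, and held-out validation datasets. One common way to keep a held-out set independent is to split by newborn rather than by EEG segment, so that no individual contributes data to more than one partition. The Python sketch below illustrates only that general idea. The function name `split_by_subject`, the 60/20/20 fractions, the seed, and the synthetic subject and epoch IDs are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def split_by_subject(subject_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign each unique subject to exactly one of three partitions.

    The fractions are hypothetical; the paper's actual split is not given here.
    """
    unique = sorted(set(subject_ids))      # one entry per newborn
    rng = random.Random(seed)              # local RNG for reproducibility
    rng.shuffle(unique)
    n = len(unique)
    n_train = int(round(fractions[0] * n))
    n_test = int(round(fractions[1] * n))
    n_val = n - n_train - n_test           # remainder goes to held-out validation
    names = ["training"] * n_train + ["testing"] * n_test + ["validation"] * n_val
    return dict(zip(unique, names))


# Synthetic example: 353 one-hour epochs spread over 102 made-up subject IDs.
recordings = [(f"baby_{i % 102:03d}", f"epoch_{i}") for i in range(353)]
assignment = split_by_subject([subject for subject, _ in recordings])

# Every epoch follows its subject, so no newborn appears in two partitions.
partitions = defaultdict(list)
for subject, epoch in recordings:
    partitions[assignment[subject]].append(epoch)
```

Splitting at the subject level is a design choice: segments from the same newborn are strongly correlated, so a segment-level split can inflate test scores and hide the kind of generalisation drop the abstract reports.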
Related papers
- Learning Robust Diffusion Models from Imprecise Supervision [75.53546939251146]
DMIS is a unified framework for training robust Conditional Diffusion Models from Imprecise Supervision.
Our framework is derived from likelihood and decomposes the objective into generative and classification components.
Experiments on diverse forms of imprecise supervision, with tasks covering image generation, weakly supervised learning, and dataset condensation, demonstrate that DMIS consistently produces high-quality and class-discriminative samples.
(2025-10-03T14:00:32Z)
- BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning [47.451445173060094]
Human infants rapidly develop visual reasoning skills from minimal input.
Recent efforts have leveraged infant-inspired datasets like SAYCam.
We propose BabyVLM, a novel framework comprising comprehensive in-domain evaluation benchmarks and a synthetic training dataset.
(2025-04-13T04:17:12Z)
- Is Limited Participant Diversity Impeding EEG-based Machine Learning? [12.258707843214946]
It is common practice to split EEG recordings into small segments, thereby increasing the number of samples.
We conceptualise this as a multi-level data generation process and investigate the scaling behaviour of model performance.
We then use the same framework to investigate the effectiveness of different ML strategies designed to address limited data problems.
(2025-03-11T12:04:59Z)
- Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training.
The capacity to generalize effectively on smaller datasets remains a persistent challenge.
We've combined a wide range of data sources to improve performance and generalization to new data.
(2024-09-17T17:22:35Z)
- Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.
LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset.
Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
(2024-09-10T00:59:18Z)
- What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights [67.72413262980272]
Severe data imbalance naturally exists among web-scale vision-language datasets.
We find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning.
The robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts.
(2024-05-31T17:57:24Z)
- Federated Learning for Early Dropout Prediction on Healthy Ageing Applications [0.0]
We present a federated machine learning (FML) approach that minimizes privacy concerns and enables distributed training, without transferring individual data.
Our results show that data selection and class imbalance handling techniques significantly improve the predictive accuracy of models trained under FML.
(2023-09-08T13:17:06Z)
- Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and Cifar10 datasets.
(2023-08-26T07:55:32Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
(2023-05-25T16:29:16Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
(2023-05-18T16:28:29Z)
- BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines [49.75878234192369]
We present WEAVER, a simple, yet efficient post-processing method that infuses old knowledge into the new model.
We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all data at once.
(2022-02-21T10:34:41Z)
- Reconstructing Training Data from Diverse ML Models by Ensemble Inversion [8.414622657659168]
Model Inversion (MI), in which an adversary abuses access to a trained Machine Learning (ML) model, has attracted increasing research attention.
We propose an ensemble inversion technique that estimates the distribution of original training data by training a generator constrained by an ensemble of trained models.
We achieve high quality results without any dataset and show how utilizing an auxiliary dataset that's similar to the presumed training data improves the results.
(2021-11-05T18:59:01Z)
- Survival Prediction of Heart Failure Patients using Stacked Ensemble Machine Learning Algorithm [0.0]
Heart failure is one of the major health hazard issues of our time and is a leading cause of death worldwide.
Data mining is the process of converting massive volumes of raw data created by the healthcare institutions into meaningful information.
Our study shows that only certain attributes collected from the patients are imperative to successfully predict the surviving possibility post heart failure.
(2021-08-30T16:42:27Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
(2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.