Related papers: MCLPD:Multi-view Contrastive Learning for EEG-based PD Detection Across Datasets

URL: http://arxiv.org/abs/2508.14073v2
Date: Thu, 21 Aug 2025 07:34:07 GMT
Title: MCLPD:Multi-view Contrastive Learning for EEG-based PD Detection Across Datasets
Authors: Qian Zhang, Ruilin Zhang, Jun Xiao, Yifan Liu, Zhe Wang
Abstract summary: This paper proposes a semi-supervised learning framework named MCLPD. It integrates multi-view contrastive pre-training with lightweight supervised fine-tuning to enhance cross-dataset PD detection performance.
Score: 18.392841877276354
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Electroencephalography (EEG) has been validated as an effective technique for detecting Parkinson's disease (PD), particularly in its early stages. However, the high cost of EEG data annotation often results in limited dataset size, and considerable discrepancies across datasets, including differences in acquisition protocols and subject demographics, significantly hinder the robustness and generalizability of models in cross-dataset detection scenarios. To address these challenges, this paper proposes a semi-supervised learning framework named MCLPD, which integrates multi-view contrastive pre-training with lightweight supervised fine-tuning to enhance cross-dataset PD detection performance. During pre-training, MCLPD uses self-supervised learning on the unlabeled UNM dataset. To build contrastive pairs, it applies dual augmentations in both the time and frequency domains, which enrich the data and naturally fuse time-frequency information. In the fine-tuning phase, only a small proportion of labeled data from two other datasets (UI and UC) is used for supervised optimization. Experimental results show that MCLPD achieves F1 scores of 0.91 on UI and 0.81 on UC using only 1% of the labeled data, which further improve to 0.97 and 0.87, respectively, when 5% of the labeled data is used. Compared to existing methods, MCLPD substantially improves cross-dataset generalization while reducing the dependency on labeled data, demonstrating the effectiveness of the proposed framework.
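Illustrative sketch (not the authors' implementation): the abstract describes building contrastive pairs from time-domain and frequency-domain augmentations of unlabeled EEG. The NumPy snippet below shows one way such a pair could be formed and scored with a SimCLR-style NT-Xent loss. The specific augmentations (amplitude scaling with jitter, random FFT-bin dropout), the function names, the random linear "encoder" W, and all parameter values are assumptions made for illustration only; the abstract does not specify them.

```python
import numpy as np

def time_augment(x, rng, noise_std=0.05, max_scale=0.1):
    """Hypothetical time-domain view: random amplitude scaling plus Gaussian jitter."""
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    return scale * x + rng.normal(0.0, noise_std, size=x.shape)

def freq_augment(x, rng, drop_prob=0.1):
    """Hypothetical frequency-domain view: zero random FFT bins, then invert."""
    spec = np.fft.rfft(x, axis=-1)
    keep = rng.random(spec.shape) >= drop_prob  # True where a bin is kept
    return np.fft.irfft(spec * keep, n=x.shape[-1], axis=-1)

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss for two aligned batches of embeddings, each of shape (N, D)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own negative
    # Row i's positive is the other view of the same segment.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)          # stabilise the log-sum-exp
    logsumexp = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(logsumexp - sim[np.arange(2 * n), pos]))

# Synthetic stand-in for unlabeled EEG: 8 single-channel segments of 256 samples.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 256))
W = rng.normal(size=(256, 16))   # placeholder linear "encoder" (assumption)

v_time = time_augment(x, rng)    # view 1: time-domain augmentation
v_freq = freq_augment(x, rng)    # view 2: frequency-domain augmentation
loss = nt_xent(v_time @ W, v_freq @ W)
print("contrastive loss:", loss)
```

In a real pipeline the placeholder projection would be replaced by a trainable encoder, and the loss would be minimised over the unlabeled pre-training set before fine-tuning on a small labeled subset.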

Related papers

PySeizure: A single machine learning classifier framework to detect seizures in diverse datasets [0.0]
We introduce an innovative, open-source machine-learning framework that enables robust seizure detection across varied clinical datasets. To enhance robustness, the framework incorporates an automated pre-processing pipeline to standardise data and a majority voting mechanism. We train, tune, and evaluate models within each dataset, assessing their cross-dataset transferability.
arXiv Detail & Related papers (2025-08-10T09:12:29Z)
Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout [62.73150122809138]
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices. We propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD). The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02 times faster), and cost (up to 15.0% smaller).
arXiv Detail & Related papers (2025-07-14T16:19:00Z)
Robust Molecular Property Prediction via Densifying Scarce Labeled Data [51.55434084913129]
In drug discovery, compounds most critical for advancing research often lie beyond the training set. We propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data. We demonstrate significant performance gains on challenging real-world datasets.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
The role of data partitioning on the performance of EEG-based deep learning models in supervised cross-subject analysis: a preliminary study [37.69303106863453]
Deep learning is advancing the analysis of electroencephalography (EEG) data by effectively discovering highly nonlinear patterns. No comprehensive guidelines for proper data partitioning and cross-validation exist in the domain. This paper thoroughly investigates the role of data partitioning and cross-validation in evaluating EEG deep learning models.
arXiv Detail & Related papers (2025-05-19T12:05:28Z)
Fine-tuning can Help Detect Pretraining Data from Large Language Models [7.7209640786782385]
Current methods differentiate members and non-members by designing scoring functions, like Perplexity and Min-k%. We introduce a novel and effective method termed Fine-tuned Score Deviation (FSD), which improves the performance of current scoring functions for pretraining data detection. In particular, we propose to measure the deviation distance of current scores after fine-tuning on a small amount of unseen data within the same domain.
arXiv Detail & Related papers (2024-10-09T15:36:42Z)
SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training [12.745160748376794]
We propose a soft deduplication method that maintains dataset integrity while selectively reducing the sampling weight of data with high commonness. Central to our approach is the concept of "data commonness", a metric we introduce to quantify the degree of duplication. Empirical analysis shows that this method significantly improves training efficiency, achieving comparable perplexity scores with at least a 26% reduction in required training steps.
arXiv Detail & Related papers (2024-07-09T08:26:39Z)
Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction [53.88231294380083]
We introduce a novel Multi-Epoch learning with Data Augmentation (MEDA) framework, suitable for both non-continual and continual learning scenarios. MEDA minimizes overfitting by reducing the dependency of the embedding layer on subsequent training data. Our findings confirm that pre-trained layers can adapt to new embedding spaces, enhancing performance without overfitting.
arXiv Detail & Related papers (2024-06-27T04:00:15Z)
PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce. We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD. Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z)
A Dataset Fusion Algorithm for Generalised Anomaly Detection in Homogeneous Periodic Time Series Datasets [0.0]
"Dataset Fusion" is an algorithm for fusing periodic signals from multiple homogeneous datasets into a single dataset. The proposed approach significantly outperforms conventional training approaches with an Average F1 score of 0.879. Results show that using only 6.25% of the training data, translating to a 93.7% reduction in computational power, results in a mere 4.04% decrease in performance.
arXiv Detail & Related papers (2023-05-14T16:24:09Z)
Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER). Our method exploits self-supervised pretraining to learn good feature representations from the target data. We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.