Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG
- URL: http://arxiv.org/abs/2411.19230v1
- Date: Thu, 28 Nov 2024 15:53:32 GMT
- Title: Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG
- Authors: Xinxu Wei, Kanhao Zhao, Yong Jiao, Nancy B. Carlisle, Hua Xie, Yu Zhang
- Abstract summary: We propose a Graph Contrastive Masked Autoencoder Distiller to bridge the gap between unlabeled/labeled and high/low-density EEG data. For knowledge distillation from high-density to low-density EEG data, we propose a Graph Topology Distillation loss function. We demonstrate the effectiveness of our method on four classification tasks across two clinical EEG datasets.
- Score: 4.006670302810497
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Effectively utilizing extensive unlabeled high-density EEG data to improve performance in scenarios with limited labeled low-density EEG data presents a significant challenge. In this paper, we address this by framing it as a graph transfer learning and knowledge distillation problem. We propose a Unified Pre-trained Graph Contrastive Masked Autoencoder Distiller, named EEG-DisGCMAE, to bridge the gap between unlabeled/labeled and high/low-density EEG data. To fully leverage the abundant unlabeled EEG data, we introduce a novel unified graph self-supervised pre-training paradigm, which seamlessly integrates Graph Contrastive Pre-training and Graph Masked Autoencoder Pre-training. This approach synergistically combines contrastive and generative pre-training techniques by reconstructing contrastive samples and contrasting the reconstructions. For knowledge distillation from high-density to low-density EEG data, we propose a Graph Topology Distillation loss function, allowing a lightweight student model trained on low-density data to learn from a teacher model trained on high-density data, effectively handling missing electrodes through contrastive distillation. To integrate transfer learning and distillation, we jointly pre-train the teacher and student models by contrasting their queries and keys during pre-training, enabling robust distillers for downstream tasks. We demonstrate the effectiveness of our method on four classification tasks across two clinical EEG datasets with abundant unlabeled data and limited labeled data. The experimental results show that our approach significantly outperforms contemporary methods in both efficiency and accuracy.
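The abstract describes two key components, a Graph Topology Distillation loss and a contrastive distillation between the high-density teacher and the low-density student, without implementation details. The PyTorch snippet below is only a minimal, hypothetical sketch of what such a loss might look like, assuming the low-density montage is a subset of the high-density one and that both encoders produce per-electrode embeddings; all names and hyperparameters (`topology`, `graph_topology_distillation`, `idx`, `tau`) are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F


def topology(z):
    """Cosine-similarity 'topology' among electrode embeddings: (E, d) -> (E, E)."""
    z = F.normalize(z, dim=-1)
    return z @ z.t()


def graph_topology_distillation(z_teacher, z_student, idx, tau=0.5):
    """Hypothetical distillation objective between EEG montages.

    z_teacher: (E_high, d) per-electrode embeddings from the high-density teacher.
    z_student: (E_low, d)  per-electrode embeddings from the low-density student.
    idx:       indices of the teacher electrodes retained by the student montage.
    """
    # 1) Match the electrode-to-electrode similarity structure on shared channels,
    #    so the student inherits the teacher's graph topology despite missing electrodes.
    topo_loss = F.mse_loss(topology(z_student), topology(z_teacher[idx]).detach())

    # 2) Contrastive (InfoNCE-style) alignment: each student electrode is pulled toward
    #    its corresponding teacher electrode and pushed away from the others.
    logits = F.normalize(z_student, dim=-1) @ F.normalize(z_teacher[idx], dim=-1).t() / tau
    contrast_loss = F.cross_entropy(logits, torch.arange(z_student.size(0)))

    return topo_loss + contrast_loss


# Toy usage with random embeddings standing in for GNN encoder outputs.
z_t = torch.randn(64, 128)                       # 64-channel teacher
z_s = torch.randn(16, 128, requires_grad=True)   # 16-channel student
keep = torch.randperm(64)[:16]                   # teacher electrodes the student keeps
loss = graph_topology_distillation(z_t, z_s, keep)
loss.backward()
```

In this sketch the teacher side is detached so gradients flow only into the student; the paper's unified graph pre-training and its joint teacher-student pre-training that contrasts queries and keys are not reproduced here.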
Related papers
- Adversarial Curriculum Graph-Free Knowledge Distillation for Graph Neural Networks [61.608453110751206]
We propose a fast and high-quality data-free knowledge distillation approach for graph neural networks.
The proposed graph-free KD method (ACGKD) significantly reduces the spatial complexity of pseudo-graphs.
ACGKD eliminates the dimensional ambiguity between the student and teacher models by increasing the student's dimensions.
arXiv Detail & Related papers (2025-04-01T08:44:27Z) - Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation [82.39763984380625]
We introduce denoising score distillation (DSD), a surprisingly effective and novel approach for training high-quality generative models from low-quality data.
DSD pretrains a diffusion model exclusively on noisy, corrupted samples and then distills it into a one-step generator capable of producing refined, clean outputs.
arXiv Detail & Related papers (2025-03-10T17:44:46Z) - Self-Supervised Pre-Training with Joint-Embedding Predictive Architecture Boosts ECG Classification Performance [0.0]
We create a large unsupervised pre-training dataset by combining ten public ECG databases.
We pre-train Vision Transformers using JEPA on this dataset and fine-tune them on various PTB-XL benchmarks.
arXiv Detail & Related papers (2024-10-02T08:25:57Z) - Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks [5.0243930429558885]
This paper introduces Knowledge Recycling (KR), a pipeline designed to optimise the generation and use of synthetic data for training downstream classifiers.
At the heart of this pipeline is Generative Knowledge Distillation (GKD), the proposed technique that significantly improves the quality and usefulness of the information.
The results show a significant reduction in the performance gap between models trained on real and synthetic data, with models based on synthetic data outperforming those trained on real data in some cases.
arXiv Detail & Related papers (2024-07-22T10:31:07Z) - How Homogenizing the Channel-wise Magnitude Can Enhance EEG Classification Model? [4.0871083166108395]
We propose a simple yet effective approach for EEG data pre-processing.
Our method first transforms the EEG data into an encoded image by an Inverted Channel-wise Magnitude Homogenization.
By doing so, we can improve the EEG learning process efficiently without using a huge Deep Learning network.
arXiv Detail & Related papers (2024-07-19T09:11:56Z) - CE-SSL: Computation-Efficient Semi-Supervised Learning for ECG-based Cardiovascular Diseases Detection [16.34314710823127]
We propose a computation-efficient semi-supervised learning paradigm (CE-SSL) for robust and computation-efficient CVDs detection using ECG.
It enables a robust adaptation of pre-trained models on downstream datasets with limited supervision and high computational efficiency.
CE-SSL not only outperforms state-of-the-art methods in multi-label CVDs detection but also requires a smaller GPU memory footprint, less training time, and less parameter storage space.
arXiv Detail & Related papers (2024-06-20T14:45:13Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - MELEP: A Novel Predictive Measure of Transferability in Multi-Label ECG Diagnosis [1.3654846342364306]
We introduce MELEP, a measure designed to estimate the effectiveness of knowledge transfer from a pre-trained model to a downstream ECG diagnosis task.
Our experiments show that MELEP can predict the performance of pre-trained convolutional and recurrent deep neural networks, on small and imbalanced ECG data.
arXiv Detail & Related papers (2023-10-27T14:57:10Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes the limitations of prior distillation methods with an efficient data-free algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z) - Boosting Facial Expression Recognition by A Semi-Supervised Progressive Teacher [54.50747989860957]
We propose a semi-supervised learning algorithm named Progressive Teacher (PT) to utilize reliable FER datasets as well as large-scale unlabeled expression images for effective training.
Experiments on widely-used databases RAF-DB and FERPlus validate the effectiveness of our method, which achieves state-of-the-art performance with accuracy of 89.57% on RAF-DB.
arXiv Detail & Related papers (2022-05-28T07:47:53Z) - Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem; a minimal gradient-matching sketch is given after this list.
arXiv Detail & Related papers (2022-03-16T11:45:32Z) - Data Augmentation for Enhancing EEG-based Emotion Recognition with Deep Generative Models [13.56090099952884]
We propose three methods for augmenting EEG training data to enhance the performance of emotion recognition models.
For the full-usage strategy, all of the generated data are added to the training dataset without assessing the quality of the generated samples.
The experimental results demonstrate that the augmented training datasets produced by our methods enhance the performance of EEG-based emotion recognition models.
arXiv Detail & Related papers (2020-06-04T21:23:09Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training tractable, we apply a dataset distillation strategy that compresses the created dataset into several informative class-wise images.
We experimentally verify that the distilled dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
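The gradient matching mentioned in the entry "Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation" above can be illustrated generically. The sketch below is a hypothetical, simplified example of gradient-matching dataset distillation, not code from that paper; the tiny linear model, data shapes, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder classifier; any differentiable model works for gradient matching.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
params = [p for p in model.parameters() if p.requires_grad]

# A real batch (random stand-in here) and a small learnable synthetic set.
x_real = torch.randn(64, 1, 28, 28)
y_real = torch.randint(0, 10, (64,))
x_syn = torch.randn(10, 1, 28, 28, requires_grad=True)  # one image per class
y_syn = torch.arange(10)

opt_syn = torch.optim.SGD([x_syn], lr=0.1)

for step in range(100):
    # Gradients of the task loss on real data (treated as a fixed target).
    g_real = torch.autograd.grad(F.cross_entropy(model(x_real), y_real), params)
    g_real = [g.detach() for g in g_real]

    # Gradients on synthetic data, kept differentiable w.r.t. x_syn.
    g_syn = torch.autograd.grad(
        F.cross_entropy(model(x_syn), y_syn), params, create_graph=True)

    # Layer-wise cosine-distance matching loss between the two gradient sets.
    match = sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
                for a, b in zip(g_syn, g_real))

    opt_syn.zero_grad()
    match.backward()   # updates only the synthetic images
    opt_syn.step()
```

Real implementations typically alternate this synthetic-data update with inner-loop updates of the model parameters and repeat over many random initializations; the sketch shows only the core matching step.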