Dataset Refinement for Improving the Generalization Ability of the EEG Decoding Model
- URL: http://arxiv.org/abs/2411.10450v1
- Date: Thu, 31 Oct 2024 05:08:24 GMT
- Title: Dataset Refinement for Improving the Generalization Ability of the EEG Decoding Model
- Authors: Sung-Jin Kim, Dae-Hyeok Lee, Hyeon-Taek Han,
- Abstract summary: We propose a dataset refinement algorithm to eliminate noisy data from EEG datasets.
The proposed algorithm consistently led to better generalization performance compared to using the original dataset.
We conclude that removing noisy data from the training dataset alone can effectively improve the generalization performance of deep learning models in the EEG domain.
- Score: 2.9972387721489655
- Abstract: Electroencephalography (EEG) is a widely used neuroimaging approach in brain-computer interfaces due to its non-invasive characteristics and convenience, making it an effective tool for understanding human intentions. Therefore, recent research has focused on decoding human intentions from EEG signals using deep learning methods. However, since EEG signals are highly susceptible to noise during acquisition, the dataset is likely to contain noisy data. Although pioneering studies have generally assumed that the dataset is well-curated, this assumption is not always met in EEG datasets. In this paper, we address this issue by designing a dataset refinement algorithm that eliminates noisy data based on metrics that evaluate data influence during the training process. We applied the proposed algorithm to two public motor imagery EEG datasets and three different models to perform dataset refinement. The results indicated that retraining the model with the refined dataset consistently led to better generalization performance compared to using the original dataset. Hence, we demonstrated that removing noisy data from the training dataset alone can effectively improve the generalization performance of deep learning models in the EEG domain.
Related papers
- Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation [9.812476193015488]
We propose a method of data generation in simulation using 3D synthetic environments and CycleGAN domain transfer.
We compare this method of data generation to the popular NYUDepth V2 dataset by training a depth estimation model based on the DenseDepth structure using different training sets of real and simulated data.
We evaluate the performance of the models on newly collected images and LiDAR depth data from a Husky robot to verify the generalizability of the approach and show that GAN-transformed data can serve as an effective alternative to real-world data, particularly in depth estimation.
arXiv Detail & Related papers (2024-05-02T09:21:10Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model [39.363511340878624]
We present a novel EEG foundation model, namely EEGFormer, pretrained on large-scale compound EEG data.
To validate the effectiveness of our model, we extensively evaluate it on various downstream tasks and assess the performance under different transfer settings.
arXiv Detail & Related papers (2024-01-11T17:36:24Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Two Heads are Better than One: A Bio-inspired Method for Improving Classification on EEG-ET Data [14.086094296850122]
Classifying EEG data is integral to the performance of Brain Computer Interfaces (BCI) and their applications.
External noise often obstructs EEG data due to its biological nature and complex data collection process.
We propose a novel approach that integrates feature selection and time segmentation of EEG data.
arXiv Detail & Related papers (2023-03-25T23:44:39Z)
- EEG Synthetic Data Generation Using Probabilistic Diffusion Models [0.0]
This study proposes an advanced methodology for data augmentation: generating synthetic EEG data using denoising diffusion probabilistic models.
The synthetic data are generated from electrode-frequency distribution maps (EFDMs) of emotionally labeled EEG recordings.
The proposed methodology has potential implications for the broader field of neuroscience research by enabling the creation of large, publicly available synthetic EEG datasets.
arXiv Detail & Related papers (2023-03-06T12:03:22Z)
- Data augmentation for learning predictive models on EEG: a systematic comparison [79.84079335042456]
The use of deep learning for electroencephalography (EEG) classification tasks has grown rapidly in recent years.
Deep learning for EEG classification tasks has been limited by the relatively small size of EEG datasets.
Data augmentation has been a key ingredient to obtain state-of-the-art performances across applications such as computer vision or speech.
arXiv Detail & Related papers (2022-06-29T09:18:15Z)
- Transformer Networks for Data Augmentation of Human Physical Activity Recognition [61.303828551910634]
State-of-the-art models such as Recurrent Generative Adversarial Networks (RGAN) are used to generate realistic synthetic data.
In this paper, transformer-based generative adversarial networks, which have global attention on data, are compared with RGAN on the PAMAP2 and Real World Human Activity Recognition data sets.
arXiv Detail & Related papers (2021-09-02T16:47:29Z)
- Uncovering the structure of clinical EEG signals with self-supervised learning [64.4754948595556]
Supervised learning paradigms are often limited by the amount of labeled data that is available.
This phenomenon is particularly problematic in clinically relevant data, such as electroencephalography (EEG).
By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv Detail & Related papers (2020-07-31T14:34:47Z)
- Data Augmentation for Enhancing EEG-based Emotion Recognition with Deep Generative Models [13.56090099952884]
We propose three methods for augmenting EEG training data to enhance the performance of emotion recognition models.
For the full usage strategy, all of the generated data are added to the training dataset without judging the quality of the generated data.
The experimental results demonstrate that the augmented training datasets produced by our methods enhance the performance of EEG-based emotion recognition models.
arXiv Detail & Related papers (2020-06-04T21:23:09Z)