CLEAN-MI: A Scalable and Efficient Pipeline for Constructing High-Quality Neurodata in Motor Imagery Paradigm
- URL: http://arxiv.org/abs/2506.11830v1
- Date: Fri, 13 Jun 2025 14:34:29 GMT
- Title: CLEAN-MI: A Scalable and Efficient Pipeline for Constructing High-Quality Neurodata in Motor Imagery Paradigm
- Authors: Dingkun Liu, Zhu Chen, Dongrui Wu,
- Abstract summary: CLEAN-MI is a scalable and systematic data construction pipeline for constructing large-scale, efficient, and accurate neurodata in the MI paradigm.<n>We demonstrate the effectiveness of CLEAN-MI on multiple public MI datasets, achieving consistent improvements in data quality and classification performance.
- Score: 14.823896258504154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The construction of large-scale, high-quality datasets is a fundamental prerequisite for developing robust and generalizable foundation models in motor imagery (MI)-based brain-computer interfaces (BCIs). However, EEG signals collected from different subjects and devices are often plagued by low signal-to-noise ratio, heterogeneity in electrode configurations, and substantial inter-subject variability, posing significant challenges for effective model training. In this paper, we propose CLEAN-MI, a scalable and systematic data construction pipeline for constructing large-scale, efficient, and accurate neurodata in the MI paradigm. CLEAN-MI integrates frequency band filtering, channel template selection, subject screening, and marginal distribution alignment to systematically filter out irrelevant or low-quality data and standardize multi-source EEG datasets. We demonstrate the effectiveness of CLEAN-MI on multiple public MI datasets, achieving consistent improvements in data quality and classification performance.
Related papers
- MIRepNet: A Pipeline and Foundation Model for EEG-Based Motor Imagery Classification [12.648298676665886]
Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices.<n>Recent EEG foundation models aim to learn generalized representations across diverse BCI paradigms.<n>This paper proposes MIRepNet, the first EEG foundation model tailored for the motor imagery paradigm.
arXiv Detail & Related papers (2025-07-27T12:54:42Z) - A Lightweight Deep Learning Model for Automatic Modulation Classification using Dual Path Deep Residual Shrinkage Network [0.0]
Automatic Modulation Classification (AMC) plays a key role in enhancing spectrum efficiency.<n>There is a pressing need for lightweight AMC models that balance low complexity with high classification accuracy.<n>This paper proposes a low-complexity, lightweight deep learning (DL) AMC model optimized for resource-constrained edge devices.
arXiv Detail & Related papers (2025-07-07T00:37:54Z) - Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data.<n>We introduce a novel principled method for DP synthetic image embedding generation.<n> Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z) - F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles [3.124884279860061]
We introduce F-ANcGAN, an attention-enhanced cycle consistent generative adversarial system that can be trained using a limited number of data samples.<n>Our model uses a Style U-Net generator and a U-Net segmentation network equipped with self-attention to capture structural relationships.
arXiv Detail & Related papers (2025-05-23T17:02:22Z) - VAE-based Feature Disentanglement for Data Augmentation and Compression in Generalized GNSS Interference Classification [42.14439854721613]
We propose variational autoencoders (VAEs) for disentanglement to extract essential latent features that enable accurate classification of interferences.<n>Our proposed VAE achieves a data compression rate ranging from 512 to 8,192 and achieves an accuracy up to 99.92%.
arXiv Detail & Related papers (2025-04-14T13:38:00Z) - CRISP: A Framework for Cryo-EM Image Segmentation and Processing with Conditional Random Field [0.0]
We present a pipeline that automatically generates high-quality segmentation maps from cryo-EM data.<n>Our modular framework enables the selection of various segmentation models and loss functions.<n>When trained on a limited set of micrographs, our approach achieves over 90% accuracy, recall, precision, Intersection over Union (IoU) and F1-score on synthetic data.
arXiv Detail & Related papers (2025-02-12T10:44:45Z) - Evaluating Language Models as Synthetic Data Generators [74.80905172696366]
AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities.<n>Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
arXiv Detail & Related papers (2024-12-04T19:20:32Z) - Graph Adapter of EEG Foundation Models for Parameter Efficient Fine Tuning [1.8946099300030472]
We propose EEG-GraphAdapter (EGA), a parameter-efficient fine-tuning (PEFT) approach designed to address these challenges.<n>EGA is integrated into a pre-trained temporal backbone model as a GNN-based module, freezing the backbone and allowing only the adapter to be fine-tuned.<n> Experimental evaluations on two healthcare-related downstream tasks-Major Depressive Disorder (MDD) and Abnormality Detection (TUAB)-show that EGA improves performance by up to 16.1% in F1-score compared with the backbone BENDR model.
arXiv Detail & Related papers (2024-11-25T07:30:52Z) - EEG-DCNet: A Fast and Accurate MI-EEG Dilated CNN Classification Method [10.791605945979995]
We present a novel multi-scale atrous convolutional neural network (CNN) model called EEG-dilated convolution network (DCNet)<n>We incorporate the $1times1$ convolutional layer and utilize the multi-branch parallel atrous convolutional architecture in EEG-DCNet.<n>We show that EEG-DCNet outperforms existing state-of-the-art (SOTA) approaches in terms of classification accuracy and Kappa scores.
arXiv Detail & Related papers (2024-11-12T09:47:50Z) - Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes [57.62036621319563]
We introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime.
We demonstrate the superior performance of CLLM in the low-data regime compared to conventional generators.
arXiv Detail & Related papers (2023-12-19T12:34:46Z) - Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN)
CMMN consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
arXiv Detail & Related papers (2023-05-30T08:24:01Z) - Leveraging generative adversarial networks to create realistic scanning
transmission electron microscopy images [2.5954872177280346]
Machine learning could revolutionize materials research through autonomous data collection and processing.
We employ a cycle generative adversarial network (CycleGAN) with a reciprocal space discriminator to augment simulated data with realistic spatial frequency information.
We showcase our approach by training a fully convolutional network (FCN) to identify single atom defects in a 4.5 million atom data set.
arXiv Detail & Related papers (2023-01-18T19:19:27Z) - Bridging the Gap Between Clean Data Training and Real-World Inference
for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a textitgap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedding into similar vector space.
Experiments on the widely-used dataset, Snips, and large scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpus but also enhances the robustness, that is, it produces high-quality results under a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z) - A Generative Learning Approach for Spatio-temporal Modeling in Connected
Vehicular Network [55.852401381113786]
This paper proposes LaMI (Latency Model Inpainting), a novel framework to generate a comprehensive-temporal quality framework for wireless access latency of connected vehicles.
LaMI adopts the idea from image inpainting and synthesizing and can reconstruct the missing latency samples by a two-step procedure.
In particular, it first discovers the spatial correlation between samples collected in various regions using a patching-based approach and then feeds the original and highly correlated samples into a Varienational Autocoder (VAE)
arXiv Detail & Related papers (2020-03-16T03:43:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.