Related papers: CLEAN-MI: A Scalable and Efficient Pipeline for Constructing High-Quality Neurodata in Motor Imagery Paradigm

URL: http://arxiv.org/abs/2506.11830v1
Date: Fri, 13 Jun 2025 14:34:29 GMT
Title: CLEAN-MI: A Scalable and Efficient Pipeline for Constructing High-Quality Neurodata in Motor Imagery Paradigm
Authors: Dingkun Liu, Zhu Chen, Dongrui Wu
Abstract summary: CLEAN-MI is a scalable and systematic data construction pipeline for constructing large-scale, efficient, and accurate neurodata in the MI paradigm. We demonstrate the effectiveness of CLEAN-MI on multiple public MI datasets, achieving consistent improvements in data quality and classification performance.
Score: 14.823896258504154
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The construction of large-scale, high-quality datasets is a fundamental prerequisite for developing robust and generalizable foundation models in motor imagery (MI)-based brain-computer interfaces (BCIs). However, EEG signals collected from different subjects and devices are often plagued by low signal-to-noise ratio, heterogeneity in electrode configurations, and substantial inter-subject variability, posing significant challenges for effective model training. In this paper, we propose CLEAN-MI, a scalable and systematic data construction pipeline for constructing large-scale, efficient, and accurate neurodata in the MI paradigm. CLEAN-MI integrates frequency band filtering, channel template selection, subject screening, and marginal distribution alignment to systematically filter out irrelevant or low-quality data and standardize multi-source EEG datasets. We demonstrate the effectiveness of CLEAN-MI on multiple public MI datasets, achieving consistent improvements in data quality and classification performance.

Related papers

MIRepNet: A Pipeline and Foundation Model for EEG-Based Motor Imagery Classification [12.648298676665886]
Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices. Recent EEG foundation models aim to learn generalized representations across diverse BCI paradigms. This paper proposes MIRepNet, the first EEG foundation model tailored for the motor imagery paradigm.
arXiv Detail & Related papers (2025-07-27T12:54:42Z)
A Lightweight Deep Learning Model for Automatic Modulation Classification using Dual Path Deep Residual Shrinkage Network [0.0]
Automatic Modulation Classification (AMC) plays a key role in enhancing spectrum efficiency. There is a pressing need for lightweight AMC models that balance low complexity with high classification accuracy. This paper proposes a low-complexity, lightweight deep learning (DL) AMC model optimized for resource-constrained edge devices.
arXiv Detail & Related papers (2025-07-07T00:37:54Z)
Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data. We introduce a novel principled method for DP synthetic image embedding generation. Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z)
F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles [3.124884279860061]
We introduce F-ANcGAN, an attention-enhanced cycle consistent generative adversarial system that can be trained using a limited number of data samples. Our model uses a Style U-Net generator and a U-Net segmentation network equipped with self-attention to capture structural relationships.
arXiv Detail & Related papers (2025-05-23T17:02:22Z)
VAE-based Feature Disentanglement for Data Augmentation and Compression in Generalized GNSS Interference Classification [42.14439854721613]
We propose variational autoencoders (VAEs) for disentanglement to extract essential latent features that enable accurate classification of interferences. Our proposed VAE achieves a data compression rate ranging from 512 to 8,192 and achieves an accuracy up to 99.92%.
arXiv Detail & Related papers (2025-04-14T13:38:00Z)
CRISP: A Framework for Cryo-EM Image Segmentation and Processing with Conditional Random Field [0.0]
We present a pipeline that automatically generates high-quality segmentation maps from cryo-EM data. Our modular framework enables the selection of various segmentation models and loss functions. When trained on a limited set of micrographs, our approach achieves over 90% accuracy, recall, precision, Intersection over Union (IoU) and F1-score on synthetic data.
arXiv Detail & Related papers (2025-02-12T10:44:45Z)
Evaluating Language Models as Synthetic Data Generators [74.80905172696366]
AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities. Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
arXiv Detail & Related papers (2024-12-04T19:20:32Z)
Graph Adapter of EEG Foundation Models for Parameter Efficient Fine Tuning [1.8946099300030472]
We propose EEG-GraphAdapter (EGA), a parameter-efficient fine-tuning (PEFT) approach designed to address these challenges. EGA is integrated into a pre-trained temporal backbone model as a GNN-based module, freezing the backbone and allowing only the adapter to be fine-tuned. Experimental evaluations on two healthcare-related downstream tasks, Major Depressive Disorder (MDD) and Abnormality Detection (TUAB), show that EGA improves performance by up to 16.1% in F1-score compared with the backbone BENDR model.
arXiv Detail & Related papers (2024-11-25T07:30:52Z)
EEG-DCNet: A Fast and Accurate MI-EEG Dilated CNN Classification Method [10.791605945979995]
We present a novel multi-scale atrous convolutional neural network (CNN) model called EEG-dilated convolution network (DCNet). We incorporate the $1\times1$ convolutional layer and utilize the multi-branch parallel atrous convolutional architecture in EEG-DCNet. We show that EEG-DCNet outperforms existing state-of-the-art (SOTA) approaches in terms of classification accuracy and Kappa scores.
arXiv Detail & Related papers (2024-11-12T09:47:50Z)
Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes [57.62036621319563]
We introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime. We demonstrate the superior performance of CLLM in the low-data regime compared to conventional generators.
arXiv Detail & Related papers (2023-12-19T12:34:46Z)
Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN). CMMN consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data. Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
arXiv Detail & Related papers (2023-05-30T08:24:01Z)
Leveraging generative adversarial networks to create realistic scanning transmission electron microscopy images [2.5954872177280346]
Machine learning could revolutionize materials research through autonomous data collection and processing. We employ a cycle generative adversarial network (CycleGAN) with a reciprocal space discriminator to augment simulated data with realistic spatial frequency information. We showcase our approach by training a fully convolutional network (FCN) to identify single atom defects in a 4.5 million atom data set.
arXiv Detail & Related papers (2023-01-18T19:19:27Z)
Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference. We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space. Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, that is, it produces high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
A Generative Learning Approach for Spatio-temporal Modeling in Connected Vehicular Network [55.852401381113786]
This paper proposes LaMI (Latency Model Inpainting), a novel framework to generate a comprehensive spatio-temporal quality framework for wireless access latency of connected vehicles. LaMI adopts the idea from image inpainting and synthesizing and can reconstruct the missing latency samples by a two-step procedure. In particular, it first discovers the spatial correlation between samples collected in various regions using a patching-based approach and then feeds the original and highly correlated samples into a Variational Autoencoder (VAE)
arXiv Detail & Related papers (2020-03-16T03:43:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.