Related papers: A Statistical Approach for Synthetic EEG Data Generation

A Statistical Approach for Synthetic EEG Data Generation

URL: http://arxiv.org/abs/2504.16143v1
Date: Tue, 22 Apr 2025 06:48:42 GMT
Title: A Statistical Approach for Synthetic EEG Data Generation
Authors: Gideon Vos, Maryam Ebrahimpour, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi,
Abstract summary: This study proposes a method combining correlation analysis and random sampling to generate realistic synthetic EEG data.<n>A Random Forest model trained to distinguish synthetic from real EEG performs at chance level, indicating high fidelity.<n>This method provides a scalable, privacy-preserving approach for augmenting EEG datasets, enabling more efficient model training in mental health research.
Score: 2.5648452174203062
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Electroencephalogram (EEG) data is crucial for diagnosing mental health conditions but is costly and time-consuming to collect at scale. Synthetic data generation offers a promising solution to augment datasets for machine learning applications. However, generating high-quality synthetic EEG that preserves emotional and mental health signals remains challenging. This study proposes a method combining correlation analysis and random sampling to generate realistic synthetic EEG data. We first analyze interdependencies between EEG frequency bands using correlation analysis. Guided by this structure, we generate synthetic samples via random sampling. Samples with high correlation to real data are retained and evaluated through distribution analysis and classification tasks. A Random Forest model trained to distinguish synthetic from real EEG performs at chance level, indicating high fidelity. The generated synthetic data closely match the statistical and structural properties of the original EEG, with similar correlation coefficients and no significant differences in PERMANOVA tests. This method provides a scalable, privacy-preserving approach for augmenting EEG datasets, enabling more efficient model training in mental health research.

Related papers

Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference [89.5628648718851]
Causal inference is essential for developing and evaluating medical interventions.<n>Real-world medical datasets are often difficult to access due to regulatory barriers.<n>We present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine.
arXiv Detail & Related papers (2025-10-21T16:16:00Z)
Synthetic EEG Generation using Diffusion Models for Motor Imagery Tasks [0.0]
This study proposes a methodology for generating synthetic EEG signals associated with motor imagery brain tasks.<n>The approach involves preprocessing real EEG data, training a diffusion model to reconstruct EEG channels from noise, and evaluating the quality of the generated signals.<n>The generated data achieved classification accuracies above 95%, with low mean squared error and high correlation with real signals.
arXiv Detail & Related papers (2025-10-03T02:02:05Z)
Can synthetic data reproduce real-world findings in epidemiology? A replication study using tree-based generative AI [0.6268282038459305]
We propose the use of adversarial random forests (ARF) as an efficient and convenient method for synthesizing epidemiological data.<n>We replicated statistical analyses from six epidemiological publications and compared original with synthetic results.<n>Results from multiple synthetic data replications consistently aligned with original findings.
arXiv Detail & Related papers (2025-08-19T22:51:40Z)
Graph-Convolutional-Beta-VAE for Synthetic Abdominal Aorta Aneurysm Generation [4.363232795241618]
This study presents a beta-Variational Autoencoder Graph Convolutional Neural Network framework for generating synthetic Abdominal Aorta Aneurysms (AAA)<n>Our approach extracts key anatomical features and captures complex statistical relationships within a compact disentangled latent space.<n>The resulting synthetic AAA dataset preserves patient privacy while providing a scalable foundation for medical research, device testing, and computational modeling.
arXiv Detail & Related papers (2025-06-16T15:55:56Z)
Scaling Laws of Synthetic Data for Language Models [132.67350443447611]
We introduce SynthLLM, a scalable framework that transforms pre-training corpora into diverse, high-quality synthetic datasets.<n>Our approach achieves this by automatically extracting and recombining high-level concepts across multiple documents using a graph algorithm.
arXiv Detail & Related papers (2025-03-25T11:07:12Z)
Synthetic Data Generation by Supervised Neural Gas Network for Physiological Emotion Recognition Data [0.0]
This study introduces an innovative approach to synthetic data generation using a Supervised Neural Gas (SNG) network.<n>The SNG efficiently processes the input data, creating synthetic instances that closely mimic the original data distributions.
arXiv Detail & Related papers (2025-01-19T15:34:05Z)
SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data [78.70620682374624]
We introduce SynFER, a novel framework for synthesizing facial expression image data based on high-level textual descriptions.<n>To ensure the quality and reliability of the synthetic data, we propose a semantic guidance technique and a pseudo-label generator.<n>Results validate the efficacy of our approach and the synthetic data.
arXiv Detail & Related papers (2024-10-13T14:58:21Z)
Enhancing EEG Signal Generation through a Hybrid Approach Integrating Reinforcement Learning and Diffusion Models [6.102274021710727]
This study introduces an innovative approach to the synthesis of Electroencephalogram (EEG) signals by integrating diffusion models with reinforcement learning. Our methodology enhances the generation of EEG signals with detailed temporal and spectral features, enriching the authenticity and diversity of synthetic datasets.
arXiv Detail & Related papers (2024-09-14T07:22:31Z)
Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs) Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws. Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
Boosting Data Analytics With Synthetic Volume Expansion [3.568650932986342]
This article explores the effectiveness of statistical methods on synthetic data and the privacy risks of synthetic data. A key finding within this framework is the generational effect, which reveals that the error rate of statistical methods on synthetic data decreases with the addition of more synthetic data but may eventually rise or stabilize.
arXiv Detail & Related papers (2023-10-27T01:57:27Z)
Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models. ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics [44.97217246897902]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing. We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z)
Synthesize High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model [40.473866438962034]
Synthetic electronic health records can serve as an alternative to real EHRs for machine learning (ML) modeling and statistical analysis. We propose Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal high-dimensional EHR.
arXiv Detail & Related papers (2023-04-04T23:53:34Z)
EEG Synthetic Data Generation Using Probabilistic Diffusion Models [0.0]
This study proposes an advanced methodology for data augmentation: generating synthetic EEG data using denoising diffusion probabilistic models. The synthetic data are generated from electrode-frequency distribution maps (EFDMs) of emotionally labeled EEG recordings. The proposed methodology has potential implications for the broader field of neuroscience research by enabling the creation of large, publicly available synthetic EEG datasets.
arXiv Detail & Related papers (2023-03-06T12:03:22Z)
Evaluation of the Synthetic Electronic Health Records [3.255030588361125]
This work outlines two metrics called Similarity and Uniqueness for sample-wise assessment of synthetic datasets. We demonstrate the proposed notions with several state-of-the-art generative models to synthesise Cystic Fibrosis (CF) patients' electronic health records.
arXiv Detail & Related papers (2022-10-16T22:46:08Z)
CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE) At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales. We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data. We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem. We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
GANSER: A Self-supervised Data Augmentation Framework for EEG-based Emotion Recognition [15.812231441367022]
We propose a novel data augmentation framework, namely Generative Adversarial Network-based Self-supervised Data Augmentation (GANSER) As the first to combine adversarial training with self-supervised learning for EEG-based emotion recognition, the proposed framework can generate high-quality simulated EEG samples. A transformation function is employed to mask parts of EEG signals and force the generator to synthesize potential EEG signals based on the remaining parts.
arXiv Detail & Related papers (2021-09-07T14:42:55Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.