Forecasting-Based Biomedical Time-series Data Synthesis for Open Data and Robust AI
- URL: http://arxiv.org/abs/2510.04622v1
- Date: Mon, 06 Oct 2025 09:32:10 GMT
- Title: Forecasting-Based Biomedical Time-series Data Synthesis for Open Data and Robust AI
- Authors: Youngjoon Lee, Seongmin Cho, Yehhyun Jo, Jinu Gong, Hyunjoo Jenny Lee, Joonhyuk Kang,
- Abstract summary: We propose a framework for synthetic biomedical time-series data generation based on advanced forecasting models.<n>These synthetic datasets preserve essential temporal and spectral properties of real data.
- Score: 0.841508985473488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The limited data availability due to strict privacy regulations and significant resource demands severely constrains biomedical time-series AI development, which creates a critical gap between data requirements and accessibility. Synthetic data generation presents a promising solution by producing artificial datasets that maintain the statistical properties of real biomedical time-series data without compromising patient confidentiality. We propose a framework for synthetic biomedical time-series data generation based on advanced forecasting models that accurately replicates complex electrophysiological signals such as EEG and EMG with high fidelity. These synthetic datasets preserve essential temporal and spectral properties of real data, which enables robust analysis while effectively addressing data scarcity and privacy challenges. Our evaluations across multiple subjects demonstrate that the generated synthetic data can serve as an effective substitute for real data and also significantly boost AI model performance. The approach maintains critical biomedical features while provides high scalability for various applications and integrates seamlessly into open-source repositories, substantially expanding resources for AI-driven biomedical research.
Related papers
- A Reinforcement Learning Approach to Synthetic Data Generation [8.293402602656736]
We introduce RLSyn, a novel framework that models the data generator as a policy over patient records.<n>We benchmark it against state-of-the-art generative adversarial networks (GANs) and diffusion-based methods across privacy, utility, and fidelity evaluations.
arXiv Detail & Related papers (2025-12-24T19:26:37Z) - Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference [89.5628648718851]
Causal inference is essential for developing and evaluating medical interventions.<n>Real-world medical datasets are often difficult to access due to regulatory barriers.<n>We present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine.
arXiv Detail & Related papers (2025-10-21T16:16:00Z) - Understanding the Influence of Synthetic Data for Text Embedders [52.04771455432998]
We first reproduce and publicly release the synthetic data proposed by Wang et al.<n>We critically examine where exactly synthetic data improves model generalization.<n>Our findings highlight the limitations of current synthetic data approaches for building general-purpose embedders.
arXiv Detail & Related papers (2025-09-07T19:28:52Z) - Using Imperfect Synthetic Data in Downstream Inference Tasks [50.40949503799331]
We introduce a new estimator based on generalized method of moments.<n>We find that interactions between the moment residuals of synthetic data and those of real data can improve estimates of the target parameter.
arXiv Detail & Related papers (2025-08-08T18:32:52Z) - A text-to-tabular approach to generate synthetic patient data using LLMs [0.3628457733531155]
We propose an approach to generate synthetic patient data that does not require access to the original data.<n>We leverage prior medical knowledge and in-context learning capabilities of large language models to generate realistic patient data.
arXiv Detail & Related papers (2024-12-06T16:10:40Z) - Synthetic Data in Radiological Imaging: Current State and Future Outlook [3.047958668050099]
Key challenge for the development and deployment of artificial intelligence (AI) solutions in radiology is solving the associated data limitations.
In silico data offers a number of potential advantages to patient data, such as diminished patient harm, reduced cost, simplified data acquisition, scalability, improved quality assurance testing, and a mitigation approach to data imbalances.
arXiv Detail & Related papers (2024-05-08T18:35:47Z) - Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A
Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z) - Synthetic data generation for a longitudinal cohort study -- Evaluation,
method extension and reproduction of published data analysis results [0.32593385688760446]
In the health sector, access to individual-level data is often challenging due to privacy concerns.
A promising alternative is the generation of fully synthetic data.
In this study, we use a state-of-the-art synthetic data generation method.
arXiv Detail & Related papers (2023-05-12T13:13:55Z) - Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic
Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z) - Generating synthetic multi-dimensional molecular-mediator time series
data for artificial intelligence-based disease trajectory forecasting and
drug development digital twins: Considerations [0.0]
The use of synthetic data is recognized as a crucial step in the development of neural network-based Artificial Intelligence (AI) systems.
Insufficiency of statistical and data-centric machine learning means of generating this type of synthetic data is due to a combination of factors.
The generation of synthetic data that accounts for the identified factors of multi-dimensional time series data is an essential capability for the development of mediator-biomarker based AI forecasting systems.
arXiv Detail & Related papers (2023-03-16T03:13:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.