Generative clinical time series models trained on moderate amounts of patient data are privacy preserving
- URL: http://arxiv.org/abs/2602.10631v1
- Date: Wed, 11 Feb 2026 08:23:54 GMT
- Title: Generative clinical time series models trained on moderate amounts of patient data are privacy preserving
- Authors: Rustam Zhumagambetov, Niklas Giesa, Sebastian D. Boie, Stefan Haufe
- Abstract summary: We use a battery of privacy attacks to audit state-of-the-art hospital time series models, trained on the public MIMIC-IV dataset. Results show that established privacy attacks are ineffective against generated multivariate clinical time series when synthetic data generators are trained on large enough datasets.
- Score: 1.7728232380247864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sharing medical data for machine learning model training purposes is often impossible due to the risk of disclosing identifying information about individual patients. Synthetic data produced by generative artificial intelligence (genAI) models trained on real data is often seen as one possible solution to comply with privacy regulations. While powerful genAI models for heterogeneous hospital time series have recently been introduced, such modeling does not guarantee privacy protection, as the generated data may still reveal identifying information about individuals in the models' training cohort. Applying established privacy mechanisms to generative time series models, however, proves challenging as post-hoc data anonymization through k-anonymization or similar techniques is limited, while model-centered privacy mechanisms that implement differential privacy (DP) may lead to unstable training, compromising the utility of generated data. Given these known limitations, privacy audits for generative time series models are currently indispensable regardless of the concrete privacy mechanisms applied to models and/or data. In this work, we use a battery of established privacy attacks to audit state-of-the-art hospital time series models, trained on the public MIMIC-IV dataset, with respect to privacy preservation. Furthermore, the eICU dataset was used to mount a privacy attack against the synthetic data generator trained on the MIMIC-IV dataset. Results show that established privacy attacks are ineffective against generated multivariate clinical time series when synthetic data generators are trained on large enough training datasets. Furthermore, we discuss how the use of existing DP mechanisms for these synthetic data generators would not bring desired improvement in privacy, but only a decrease in utility for machine learning prediction tasks.
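To make the idea of a privacy audit concrete, here is a minimal sketch of one generic membership inference attack based on the distance to the closest synthetic record. It is an illustration only and is not claimed to be one of the attacks used in the paper. The function names and the array layout `(n_series, time_steps, features)` are assumptions made for this example.

```python
import numpy as np


def min_distance_to_synthetic(candidates, synthetic):
    """Euclidean distance from each candidate series to its nearest synthetic series."""
    # Flatten (n, time_steps, features) arrays to (n, time_steps * features).
    c = candidates.reshape(len(candidates), -1)
    s = synthetic.reshape(len(synthetic), -1)
    # Pairwise distances via broadcasting: shape (n_candidates, n_synthetic).
    d = np.linalg.norm(c[:, None, :] - s[None, :, :], axis=-1)
    return d.min(axis=1)


def attack_auc(member_scores, nonmember_scores):
    """Probability that a random member scores higher than a random non-member (ties count 0.5)."""
    m = np.asarray(member_scores)[:, None]
    n = np.asarray(nonmember_scores)[None, :]
    return float(np.mean((m > n) + 0.5 * (m == n)))


def audit(members, nonmembers, synthetic):
    """AUC of a closest-record attack; hypothetical helper for illustration."""
    # Higher score means "more likely a training member", i.e. closer to the synthetic data.
    member_scores = -min_distance_to_synthetic(members, synthetic)
    nonmember_scores = -min_distance_to_synthetic(nonmembers, synthetic)
    return attack_auc(member_scores, nonmember_scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    members = rng.normal(size=(50, 24, 4))      # toy stand-in for training series
    nonmembers = rng.normal(size=(50, 24, 4))   # toy stand-in for held-out series
    synthetic = rng.normal(size=(200, 24, 4))   # toy stand-in for generator output
    print("attack AUC:", audit(members, nonmembers, synthetic))
```

An AUC close to 0.5 means the attack separates training members from non-members no better than chance, while an AUC close to 1.0 indicates that the synthetic data sits unusually close to the training records.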
Related papers
- Rethinking Anonymity Claims in Synthetic Data Generation: A Model-Centric Privacy Attack Perspective [18.404146545866812]
Training generative machine learning models to produce synthetic data has become a popular approach for enhancing privacy in data sharing. As this typically involves processing sensitive personal information, releasing either the trained model or generated synthetic data can still pose privacy risks. We argue that meaningful assessments must account for the capabilities and properties of the underlying generative model and be grounded in state-of-the-art privacy attacks.
arXiv Detail & Related papers (2026-01-30T00:57:41Z)
- Privacy-Preserving Model Transcription with Differentially Private Synthetic Distillation [67.76456940243294]
Deep learning models trained on private datasets may pose a privacy leakage risk. We present privacy-preserving model transcription, a data-free model-to-model conversion solution.
arXiv Detail & Related papers (2026-01-27T01:51:35Z)
- Privacy-Preserving Generative Modeling and Clinical Validation of Longitudinal Health Records for Chronic Disease [1.334430331852034]
We enhance a state-of-the-art time-series generative model to better handle longitudinal clinical data while incorporating quantifiable privacy safeguards. Our non-private model (Augmented TimeGAN) outperforms transformer- and flow-based models on statistical metrics in several datasets. Our private model (DP-TimeGAN) maintains a mean authenticity of 0.778 on the CKD dataset, outperforming existing state-of-the-art models on the privacy-utility frontier.
arXiv Detail & Related papers (2025-11-29T10:16:14Z)
- Controllable Synthetic Clinical Note Generation with Privacy Guarantees [7.1366477372157995]
In this paper, we introduce a novel method to "clone" datasets containing Personal Health Information (PHI).
Our approach ensures that the cloned datasets retain the essential characteristics and utility of the original data without compromising patient privacy.
We conduct utility testing to evaluate the performance of machine learning models trained on the cloned datasets.
arXiv Detail & Related papers (2024-09-12T07:38:34Z)
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders [3.7463972693041274]
It is often overlooked that generative models are prone to memorising many details of individual training records.
In this paper we explore an alternative approach for privately generating data that makes direct use of the inherent stochasticity in generative models.
arXiv Detail & Related papers (2023-04-22T07:24:56Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging [47.99192239793597]
We evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training.
Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
arXiv Detail & Related papers (2023-02-03T09:49:13Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Differentially Private Synthetic Medical Data Generation using Convolutional GANs [7.2372051099165065]
We develop a differentially private framework for synthetic data generation using Rényi differential privacy.
Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data.
We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget.
arXiv Detail & Related papers (2020-12-22T01:03:49Z)
- Hide-and-Seek Privacy Challenge [88.49671206936259]
The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to accelerate progress in tackling both problems.
In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset.
arXiv Detail & Related papers (2020-07-23T15:50:59Z)