Leveraging Generative AI Models for Synthetic Data Generation in
Healthcare: Balancing Research and Privacy
- URL: http://arxiv.org/abs/2305.05247v1
- Date: Tue, 9 May 2023 08:12:44 GMT
- Title: Leveraging Generative AI Models for Synthetic Data Generation in
Healthcare: Balancing Research and Privacy
- Authors: Aryan Jadon, Shashank Kumar
- Abstract summary: Generative AI models like GANs and VAEs offer a promising solution to balance valuable data access and patient privacy protection.
In this paper, we examine generative AI models for creating realistic, anonymized patient data for research and training.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widespread adoption of electronic health records and digital healthcare
data has created a demand for data-driven insights to enhance patient outcomes,
diagnostics, and treatments. However, using real patient data presents privacy
and regulatory challenges, including compliance with HIPAA and GDPR. Synthetic
data generation using generative AI models like GANs and VAEs offers a
promising solution to balance valuable data access and patient privacy
protection. In this paper, we examine generative AI models for creating
realistic, anonymized patient data for research and training, explore synthetic
data applications in healthcare, and discuss its benefits, challenges, and
future research directions. Synthetic data has the potential to revolutionize
healthcare by providing anonymized patient data while preserving privacy and
enabling versatile applications.
Related papers
- Controllable Synthetic Clinical Note Generation with Privacy Guarantees [7.1366477372157995]
In this paper, we introduce a novel method to "clone" datasets containing Personal Health Information (PHI).
Our approach ensures that the cloned datasets retain the essential characteristics and utility of the original data without compromising patient privacy.
We conduct utility testing to evaluate the performance of machine learning models trained on the cloned datasets.
arXiv Detail & Related papers (2024-09-12T07:38:34Z)
- NFDI4Health workflow and service for synthetic data generation, assessment and risk management [0.0]
A promising solution to this challenge is synthetic data generation.
This technique creates entirely new datasets that mimic the statistical properties of real data.
In this paper, we present the workflow and different services developed in the context of Germany's National Data Infrastructure project NFDI4Health.
arXiv Detail & Related papers (2024-08-08T14:08:39Z)
- Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision [2.7968600664591983]
This paper presents a privacy-preserving framework for genomic research, developed in collaboration with Lynx.MD.
The framework addresses critical cybersecurity and privacy challenges, enabling the privacy-preserving sharing and analysis of genomic data.
Implementing the framework within Lynx.MD involves encoding genomic data into binary formats and applying noise through controlled perturbation techniques.
arXiv Detail & Related papers (2024-07-12T05:43:13Z)
- Synthetic Data in Radiological Imaging: Current State and Future Outlook [3.047958668050099]
A key challenge for the development and deployment of artificial intelligence (AI) solutions in radiology is solving the associated data limitations.
In silico data offers a number of potential advantages over patient data, such as diminished patient harm, reduced cost, simplified data acquisition, scalability, improved quality assurance testing, and a mitigation approach to data imbalances.
arXiv Detail & Related papers (2024-05-08T18:35:47Z)
- Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
- CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines [14.386260536090628]
We focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation.
This enables us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.
arXiv Detail & Related papers (2024-02-06T20:58:36Z)
- Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv Detail & Related papers (2024-02-02T00:31:01Z)
- Generative AI-Driven Human Digital Twin in IoT-Healthcare: A Comprehensive Survey [53.691704671844406]
The Internet of things (IoT) can significantly enhance the quality of human life, specifically in healthcare.
The human digital twin (HDT) is proposed as an innovative paradigm that can comprehensively characterize the replication of the individual human body.
HDT is envisioned to empower IoT-healthcare beyond the application of healthcare monitoring by acting as a versatile and vivid human digital testbed.
Generative artificial intelligence (GAI) may be a promising solution because it can leverage advanced AI algorithms to automatically create, manipulate, and modify valuable yet diverse data.
arXiv Detail & Related papers (2024-01-22T03:17:41Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
- Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.