Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy
- URL: http://arxiv.org/abs/2305.05247v1
- Date: Tue, 9 May 2023 08:12:44 GMT
- Authors: Aryan Jadon, Shashank Kumar
- Abstract summary: Generative AI models such as GANs and VAEs offer a promising way to balance access to valuable data with patient privacy protection. In this paper, we examine generative AI models for creating realistic, anonymized patient data for research and training.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widespread adoption of electronic health records and digital healthcare data has created a demand for data-driven insights to enhance patient outcomes, diagnostics, and treatments. However, using real patient data presents privacy and regulatory challenges, including compliance with HIPAA and GDPR. Synthetic data generation using generative AI models such as GANs and VAEs offers a promising way to balance access to valuable data with patient privacy protection. In this paper, we examine generative AI models for creating realistic, anonymized patient data for research and training, explore synthetic data applications in healthcare, and discuss their benefits, challenges, and future research directions. Synthetic data has the potential to revolutionize healthcare by providing anonymized patient data while preserving privacy and enabling versatile applications.
Related papers
- Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision (2024-07-12)
This paper presents a privacy-preserving framework for genomic research, developed in collaboration with Lynx.MD.
The framework addresses critical cybersecurity and privacy challenges, enabling the privacy-preserving sharing and analysis of genomic data.
Implementing the framework within Lynx.MD involves encoding genomic data into binary formats and applying noise through controlled perturbation techniques.
- Synthetic Data in Radiological Imaging: Current State and Future Outlook (2024-05-08)
A key challenge for the development and deployment of artificial intelligence (AI) solutions in radiology is overcoming the associated data limitations.
In silico data offers a number of potential advantages over patient data, such as diminished patient harm, reduced cost, simplified data acquisition, scalability, improved quality assurance testing, and a way to mitigate data imbalances.
- Best Practices and Lessons Learned on Synthetic Data for Language Models (2024-04-11)
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
- CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines (2024-02-06)
We focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation.
This enables us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.
- Recent Advances in Predictive Modeling with Electronic Health Records (2024-02-02)
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
- Generative AI-Driven Human Digital Twin in IoT-Healthcare: A Comprehensive Survey (2024-01-22)
The Internet of Things (IoT) can significantly enhance the quality of human life, specifically in healthcare.
The human digital twin (HDT) is proposed as an innovative paradigm that can comprehensively characterize and replicate the individual human body.
HDT is envisioned to empower IoT-healthcare beyond healthcare monitoring by acting as a versatile and vivid human digital testbed.
Generative artificial intelligence (GAI) may be a promising solution, because it can leverage advanced AI algorithms to automatically create, manipulate, and modify valuable and diverse data.
- Data-Centric Foundation Models in Computational Healthcare: A Survey (2024-01-04)
Foundation models (FMs), an emerging suite of AI techniques, have sparked a wave of opportunities in computational healthcare.
We discuss key perspectives in AI security, assessment, and alignment with human values.
We offer a promising outlook on FM-based analytics to improve patient outcomes and clinical workflows.
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation (2023-10-04)
This paper introduces a novel, end-to-end diffusion-based risk prediction model named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching (2023-03-24)
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
- Generating Synthetic Mixed-type Longitudinal Electronic Health Records for Artificial Intelligent Applications (2021-12-22)
The paper introduces a generative adversarial network (GAN), EHR-M-GAN, which synthesizes mixed-type time-series EHR data.
We have validated EHR-M-GAN on three publicly available intensive care unit databases with records from a total of 141,488 unique patients.
- Privacy-preserving medical image analysis (2020-12-10)
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.