Related papers: Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy

Related papers

Leveraging Generative AI Through Prompt Engineering and Rigorous Validation to Create Comprehensive Synthetic Datasets for AI Training in Healthcare [0.0]
The GPT-4 API was employed to generate high-quality synthetic datasets aimed at overcoming this limitation. The generated data encompassed a comprehensive array of patient admission information, including healthcare provider details, hospital departments, wards, bed assignments, patient demographics, emergency contacts, vital signs, immunizations, allergies, medical histories, appointments, hospital visits, laboratory tests, diagnoses, treatment plans, medications, clinical notes, visit logs, discharge summaries, and referrals.
arXiv Detail & Related papers (2025-04-29T16:37:34Z)
A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs [1.1645633237702129]
We evaluate the current state of commercial Large Language Models for generating synthetic data. Our main finding is that while LLMs can reliably generate synthetic health records for smaller subsets of features, they struggle to preserve realistic distributions and correlations as the dimensionality of the data increases.
arXiv Detail & Related papers (2025-04-20T15:37:05Z)
Towards Privacy-aware Mental Health AI Models: Advances, Challenges, and Opportunities [61.633126163190724]
Mental illness is a widespread and debilitating condition with substantial societal and personal costs. Recent advances in Artificial Intelligence (AI) hold great potential for recognizing and addressing conditions such as depression, anxiety disorder, bipolar disorder, schizophrenia, and post-traumatic stress disorder. Privacy concerns, including the risk of sensitive data leakage from datasets and trained models, remain a critical barrier to deploying these AI systems in real-world clinical settings.
arXiv Detail & Related papers (2025-02-01T15:10:02Z)
A text-to-tabular approach to generate synthetic patient data using LLMs [0.3628457733531155]
We propose an approach to generate synthetic patient data that does not require access to the original data. We leverage prior medical knowledge and in-context learning capabilities of large language models to generate realistic patient data.
arXiv Detail & Related papers (2024-12-06T16:10:40Z)
Controllable Synthetic Clinical Note Generation with Privacy Guarantees [7.1366477372157995]
In this paper, we introduce a novel method to "clone" datasets containing Personal Health Information (PHI) Our approach ensures that the cloned datasets retain the essential characteristics and utility of the original data without compromising patient privacy. We conduct utility testing to evaluate the performance of machine learning models trained on the cloned datasets.
arXiv Detail & Related papers (2024-09-12T07:38:34Z)
NFDI4Health workflow and service for synthetic data generation, assessment and risk management [0.0]
A promising solution to this challenge is synthetic data generation. This technique creates entirely new datasets that mimic the statistical properties of real data. In this paper, we present the workflow and different services developed in the context of Germany's National Data Infrastructure project NFDI4Health.
arXiv Detail & Related papers (2024-08-08T14:08:39Z)
Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision [2.7968600664591983]
This paper presents a privacy-preserving framework for genomic research, developed in collaboration with Lynx.MD. The framework addresses critical cybersecurity and privacy challenges, enabling the privacy-preserving sharing and analysis of genomic data. Implementing the framework within Lynx.MD involves encoding genomic data into binary formats and applying noise through controlled perturbation techniques.
arXiv Detail & Related papers (2024-07-12T05:43:13Z)
Synthetic Data in Radiological Imaging: Current State and Future Outlook [3.047958668050099]
Key challenge for the development and deployment of artificial intelligence (AI) solutions in radiology is solving the associated data limitations. In silico data offers a number of potential advantages to patient data, such as diminished patient harm, reduced cost, simplified data acquisition, scalability, improved quality assurance testing, and a mitigation approach to data imbalances.
arXiv Detail & Related papers (2024-05-08T18:35:47Z)
Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines [14.386260536090628]
We focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation. This enables us to generate patient sequences that can be seamlessly converted to the Observational Medical outcomes Partnership (OMOP) data format.
arXiv Detail & Related papers (2024-02-06T20:58:36Z)
Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics. Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv Detail & Related papers (2024-02-02T00:31:01Z)
Generative AI-Driven Human Digital Twin in IoT-Healthcare: A Comprehensive Survey [53.691704671844406]
The Internet of things (IoT) can significantly enhance the quality of human life, specifically in healthcare. The human digital twin (HDT) is proposed as an innovative paradigm that can comprehensively characterize the replication of the individual human body. HDT is envisioned to empower IoT-healthcare beyond the application of healthcare monitoring by acting as a versatile and vivid human digital testbed. Recently, generative artificial intelligence (GAI) may be a promising solution because it can leverage advanced AI algorithms to automatically create, manipulate, and modify valuable while diverse data.
arXiv Detail & Related papers (2024-01-22T03:17:41Z)
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM) Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging. We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets. We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.