Related papers: NFDI4Health workflow and service for synthetic data generation, assessment and risk management

URL: http://arxiv.org/abs/2408.04478v1
Date: Thu, 8 Aug 2024 14:08:39 GMT
Title: NFDI4Health workflow and service for synthetic data generation, assessment and risk management
Authors: Sobhan Moazemi, Tim Adams, Hwei Geok NG, Lisa Kühnel, Julian Schneider, Anatol-Fiete Näher, Juliane Fluck, Holger Fröhlich
Abstract summary: Synthetic data generation is a promising solution to the privacy restrictions on sharing real patient data. This technique creates entirely new datasets that mimic the statistical properties of real data. In this paper, we present the workflow and different services developed in the context of Germany's National Data Infrastructure project NFDI4Health.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Individual health data is crucial for scientific advancements, particularly in developing Artificial Intelligence (AI); however, sharing real patient information is often restricted due to privacy concerns. A promising solution to this challenge is synthetic data generation. This technique creates entirely new datasets that mimic the statistical properties of real data while keeping patient information confidential. In this paper, we present the workflow and different services developed in the context of Germany's National Data Infrastructure project NFDI4Health. First, two state-of-the-art AI tools (namely, VAMBN and MultiNODEs) for generating synthetic health data are outlined. Further, we introduce SYNDAT (a public web-based tool), which allows users to visualize and assess the quality and risk of synthetic data provided by desired generative models. Additionally, the utility of the proposed methods and the web-based tool is showcased using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Center for Cancer Registry Data of the Robert Koch Institute (RKI).

Related papers

An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval [51.10419281315848]
We conduct an empirical study to explore the potential of synthetic data for Text-Based Person Retrieval (TBPR) research. We propose an inter-class image generation pipeline, in which an automatic prompt construction strategy is introduced. We develop an intra-class image augmentation pipeline, in which the generative AI models are applied to further edit the images.
arXiv Detail & Related papers (2025-03-28T06:18:15Z)
A text-to-tabular approach to generate synthetic patient data using LLMs [0.3628457733531155]
We propose an approach to generate synthetic patient data that does not require access to the original data. We leverage prior medical knowledge and in-context learning capabilities of large language models to generate realistic patient data.
arXiv Detail & Related papers (2024-12-06T16:10:40Z)
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources. RAG systems may face severe privacy risks when retrieving private data. We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD. It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation. We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing [74.58071278710896]
Generative AI has attracted much attention from both academic and industrial fields. Secure and privacy-preserving mobile crowdsensing (SPPMCS) has been widely applied in data collection/acquisition.
arXiv Detail & Related papers (2024-05-17T04:00:58Z)
Synthetic Data in Radiological Imaging: Current State and Future Outlook [3.047958668050099]
A key challenge for the development and deployment of artificial intelligence (AI) solutions in radiology is solving the associated data limitations. In silico data offers a number of potential advantages over patient data, such as diminished patient harm, reduced cost, simplified data acquisition, scalability, improved quality assurance testing, and a mitigation approach to data imbalances.
arXiv Detail & Related papers (2024-05-08T18:35:47Z)
Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models. Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy [0.0]
Generative AI models like GANs and VAEs offer a promising solution to balance valuable data access and patient privacy protection. In this paper, we examine generative AI models for creating realistic, anonymized patient data for research and training.
arXiv Detail & Related papers (2023-05-09T08:12:44Z)
Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive Survey [6.277848092408045]
Data quality is the key factor for the development of trustworthy AI in healthcare. Access to good quality datasets is limited by the technical difficulty of data acquisition. Large-scale sharing of healthcare data is hindered by strict ethical restrictions.
arXiv Detail & Related papers (2022-09-17T13:34:17Z)
Enabling Synthetic Data adoption in regulated domains [1.9512796489908306]
The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process, learning the real data properties.
arXiv Detail & Related papers (2022-04-13T10:53:54Z)
Fidelity and Privacy of Synthetic Medical Data [0.0]
The digitization of medical records ushered in a new era of big data to clinical science. The need to share individual-level medical data continues to grow, and has never been more urgent. Enthusiasm for the use of big data has been tempered by a fully appropriate concern for patient autonomy and privacy.
arXiv Detail & Related papers (2021-01-18T23:01:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.