Related papers: Synthetic Data in Healthcare

Synthetic Data in Healthcare

URL: http://arxiv.org/abs/2304.03243v1
Date: Thu, 6 Apr 2023 17:23:39 GMT
Title: Synthetic Data in Healthcare
Authors: Daniel McDuff, Theodore Curran, Achuta Kadambi
Abstract summary: We present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine. We discuss that while synthetics can promote privacy, equity, safety and continual and causal learning, they also run the risk of introducing flaws, blind spots and propagating or exaggerating biases.
Score: 10.555189948915492
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Synthetic data are becoming a critical tool for building artificially intelligent systems. Simulators provide a way of generating data systematically and at scale. These data can then be used either exclusively, or in conjunction with real data, for training and testing systems. Synthetic data are particularly attractive in cases where the availability of ``real'' training examples might be a bottleneck. While the volume of data in healthcare is growing exponentially, creating datasets for novel tasks and/or that reflect a diverse set of conditions and causal relationships is not trivial. Furthermore, these data are highly sensitive and often patient specific. Recent research has begun to illustrate the potential for synthetic data in many areas of medicine, but no systematic review of the literature exists. In this paper, we present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine. We discuss that while synthetics can promote privacy, equity, safety and continual and causal learning, they also run the risk of introducing flaws, blind spots and propagating or exaggerating biases.

Related papers

Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference [89.5628648718851]
Causal inference is essential for developing and evaluating medical interventions.<n>Real-world medical datasets are often difficult to access due to regulatory barriers.<n>We present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine.
arXiv Detail & Related papers (2025-10-21T16:16:00Z)
Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models. ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results [0.32593385688760446]
In the health sector, access to individual-level data is often challenging due to privacy concerns. A promising alternative is the generation of fully synthetic data. In this study, we use a state-of-the-art synthetic data generation method.
arXiv Detail & Related papers (2023-05-12T13:13:55Z)
Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
Synthetic Data: Methods, Use Cases, and Risks [11.413309528464632]
A possible alternative gaining momentum in both the research community and industry is to share synthetic data instead. We provide a gentle introduction to synthetic data and discuss its use cases, the privacy challenges that are still unaddressed, and its inherent limitations as an effective privacy-enhancing technology.
arXiv Detail & Related papers (2023-03-01T16:35:33Z)
Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation [66.64736150040093]
Machine learning applications are becoming increasingly pervasive in our society. Risk is that they will systematically spread the bias embedded in data. We propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations.
arXiv Detail & Related papers (2022-09-13T11:18:50Z)
TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets. We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
FakeNews: GAN-based generation of realistic 3D volumetric data -- A systematic review and taxonomy [2.801317303396674]
Generative Adversarial Networks (GANs) are used to generate realistic synthetic data. In this review, we provide a summary of works that generate realistic volumetric synthetic data using GANs.
arXiv Detail & Related papers (2022-07-04T13:14:37Z)
The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms [2.032684842401705]
Health Gym is a collection of synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms. The datasets were created using a novel generative adversarial network (GAN) The risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.
arXiv Detail & Related papers (2022-03-12T07:28:02Z)
Measuring Utility and Privacy of Synthetic Genomic Data [3.635321290763711]
We provide the first evaluation of the utility and the privacy protection of five state-of-the-art models for generating synthetic genomic data. Overall, there is no single approach for generating synthetic genomic data that performs well across the board.
arXiv Detail & Related papers (2021-02-05T17:41:01Z)
Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data. Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community. Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.