Related papers: The Data Sharing Paradox of Synthetic Data in Healthcare

Related papers

Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework [34.56525983543448]
Synthetic data generation is gaining traction as a privacy enhancing technology.<n>The concept of data privacy remains elusive, making it challenging for practitioners to evaluate and benchmark the degree of privacy protection offered by synthetic data.
arXiv Detail & Related papers (2025-12-18T08:09:28Z)
How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage.<n>Differentially Private Synthetic data refers to synthetic data that preserves the overall trends of source data.
arXiv Detail & Related papers (2025-12-02T21:14:39Z)
Understanding the Influence of Synthetic Data for Text Embedders [52.04771455432998]
We first reproduce and publicly release the synthetic data proposed by Wang et al.<n>We critically examine where exactly synthetic data improves model generalization.<n>Our findings highlight the limitations of current synthetic data approaches for building general-purpose embedders.
arXiv Detail & Related papers (2025-09-07T19:28:52Z)
Processing of synthetic data in AI development for healthcare and the definition of personal data in EU law [0.0]
Artificial intelligence (AI) has potential to transform healthcare, but it requires access to health data.<n>The study investigates whether synthetic data should be classified as personal data under the study.<n>The findings suggest synthetic data is likely anonymous, depending on certain factors, but highlights uncertainties about what constitutes reasonably likely risk.
arXiv Detail & Related papers (2025-08-11T17:59:06Z)
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release.<n>Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data [104.30479583607918]
2nd FRCSyn-onGoing challenge is based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition.
arXiv Detail & Related papers (2024-12-02T11:12:01Z)
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.<n>RAG systems may face severe privacy risks when retrieving private data.<n>We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
Practical considerations on using private sampling for synthetic data [1.3654846342364308]
Differential privacy for synthetic data generation has received much attention due to the ability of preserving privacy while freely using the synthetic data. Private sampling is the first noise-free method to construct differentially private synthetic data with rigorous bounds for privacy and accuracy. We provide an implementation of the private sampling algorithm and discuss the realism of its constraints in practical cases.
arXiv Detail & Related papers (2023-12-12T10:20:04Z)
The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development [0.6906005491572401]
This paper investigates the policies governing the creation, utilization, and dissemination of synthetic data. A well crafted synthetic data policy must strike a balance between privacy concerns and the utility of data.
arXiv Detail & Related papers (2023-08-31T23:18:53Z)
Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
Synthetic Data: Methods, Use Cases, and Risks [11.413309528464632]
A possible alternative gaining momentum in both the research community and industry is to share synthetic data instead. We provide a gentle introduction to synthetic data and discuss its use cases, the privacy challenges that are still unaddressed, and its inherent limitations as an effective privacy-enhancing technology.
arXiv Detail & Related papers (2023-03-01T16:35:33Z)
Downstream Fairness Caveats with Synthetic Healthcare Data [21.54509987309669]
Privacy laws limit access to health data such as Electronic Medical Records (EMRs) to preserve patient privacy. This paper evaluates synthetically generated healthcare data for biases and investigates the effect of fairness mitigation techniques on utility-fairness.
arXiv Detail & Related papers (2022-03-09T00:52:47Z)
Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process. We generate a representative as well as fair version of the UCI Adult census data set. We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
Fidelity and Privacy of Synthetic Medical Data [0.0]
The digitization of medical records ushered in a new era of big data to clinical science. The need to share individual-level medical data continues to grow, and has never been more urgent. enthusiasm for the use of big data has been tempered by a fully appropriate concern for patient autonomy and privacy.
arXiv Detail & Related papers (2021-01-18T23:01:27Z)
Hide-and-Seek Privacy Challenge [88.49671206936259]
The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to accelerate progress in tackling both problems. In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset.
arXiv Detail & Related papers (2020-07-23T15:50:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.