Related papers: Differentially Private Synthetic Medical Data Generation using Convolutional GANs

Differentially Private Synthetic Medical Data Generation using Convolutional GANs

URL: http://arxiv.org/abs/2012.11774v1
Date: Tue, 22 Dec 2020 01:03:49 GMT
Title: Differentially Private Synthetic Medical Data Generation using Convolutional GANs
Authors: Amirsina Torfi and Edward A. Fox and Chandan K. Reddy
Abstract summary: We develop a differentially private framework for synthetic data generation using R'enyi differential privacy. Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data. We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget.
Score: 7.2372051099165065
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Deep learning models have demonstrated superior performance in several application problems, such as image classification and speech processing. However, creating a deep learning model using health record data requires addressing certain privacy challenges that bring unique concerns to researchers working in this domain. One effective way to handle such private data issues is to generate realistic synthetic data that can provide practically acceptable data quality and correspondingly the model performance. To tackle this challenge, we develop a differentially private framework for synthetic data generation using R\'enyi differential privacy. Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data. In addition, our model can also capture the temporal information and feature correlations that might be present in the original data. We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget using several publicly available benchmark medical datasets in both supervised and unsupervised settings.

Related papers

Privacy-Preserving Model Transcription with Differentially Private Synthetic Distillation [67.76456940243294]
Deep learning models trained on private datasets may pose a privacy leakage risk.<n>We present emphprivacy-preserving model transcription, a data-free model-to-model conversion solution.
arXiv Detail & Related papers (2026-01-27T01:51:35Z)
Synthetic Data Generation and Differential Privacy using Tensor Networks' Matrix Product States (MPS) [33.032422801043495]
We propose a method for generating privacy-preserving high-quality synthetic data using Matrix Product States (MPS)<n>We benchmark the MPS-based generative model against state-of-the-art models such as CTGAN, VAE, and PrivBayes.<n>Our results show that MPS outperforms classical models, particularly under strict privacy constraints.
arXiv Detail & Related papers (2025-08-08T12:14:57Z)
Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data [104.30479583607918]
2nd FRCSyn-onGoing challenge is based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition.
arXiv Detail & Related papers (2024-12-02T11:12:01Z)
KIPPS: Knowledge infusion in Privacy Preserving Synthetic Data Generation [0.0]
Generative Deep Learning models struggle to model discrete and non-Gaussian features that have domain constraints. Generative models create synthetic data that repeats sensitive features, which is a privacy risk. This paper proposes a novel model, KIPPS, that infuses Domain and Regulatory Knowledge from Knowledge Graphs into Generative Deep Learning models for enhanced Privacy Preserving Synthetic data generation.
arXiv Detail & Related papers (2024-09-25T19:50:03Z)
Privacy-preserving datasets by capturing feature distributions with Conditional VAEs [0.11999555634662634]
Conditional Variational Autoencoders (CVAEs) trained on feature vectors extracted from large pre-trained vision foundation models. Our method notably outperforms traditional approaches in both medical and natural image domains. Results underscore the potential of generative models to significantly impact deep learning applications in data-scarce and privacy-sensitive environments.
arXiv Detail & Related papers (2024-08-01T15:26:24Z)
MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data [10.217822818544475]
We propose a framework to generate synthetic (tabular) data powered by large language models (LLMs) Our approach significantly enhance the quality of synthetic data generation in common scenarios with small sample sizes. Our results demonstrate that our model outperforms several state-of-art models regarding generating higher quality synthetic data for downstream tasks while keeping privacy of the real data.
arXiv Detail & Related papers (2024-06-15T06:26:17Z)
Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
Protect and Extend -- Using GANs for Synthetic Data Generation of Time-Series Medical Records [1.9749268648715583]
This research compares state-of-the-art GAN-based models for synthetic data generation to generate time-series synthetic medical records of dementia patients. Our experiments indicate the superiority of the privacy-preserving GAN (PPGAN) model over other models regarding privacy preservation.
arXiv Detail & Related papers (2024-02-21T10:24:34Z)
Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models. ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
Towards Personalized Federated Learning via Heterogeneous Model Reassembly [84.44268421053043]
pFedHR is a framework that leverages heterogeneous model reassembly to achieve personalized federated learning. pFedHR dynamically generates diverse personalized models in an automated manner.
arXiv Detail & Related papers (2023-08-16T19:36:01Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge. Existing private generative models are struggling with the utility of synthetic samples. We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
PEARL: Data Synthesis via Private Embeddings and Adversarial Reconstruction Learning [1.8692254863855962]
We propose a new framework of data using deep generative models in a differentially private manner. Within our framework, sensitive data are sanitized with rigorous privacy guarantees in a one-shot fashion. Our proposal has theoretical guarantees of performance, and empirical evaluations on multiple datasets show that our approach outperforms other methods at reasonable levels of privacy.
arXiv Detail & Related papers (2021-06-08T18:00:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.