Hide-and-Seek Privacy Challenge
- URL: http://arxiv.org/abs/2007.12087v2
- Date: Fri, 24 Jul 2020 17:10:18 GMT
- Title: Hide-and-Seek Privacy Challenge
- Authors: James Jordon, Daniel Jarrett, Jinsung Yoon, Tavian Barnes, Paul
Elbers, Patrick Thoral, Ari Ercole, Cheng Zhang, Danielle Belgrave and
Mihaela van der Schaar
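- Abstract summary: The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to accelerate progress on two problems: generating synthetic clinical time series that preserve temporal dynamics, and preventing patient re-identification.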
In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset.
- Score: 88.49671206936259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The clinical time-series setting poses a unique combination of challenges to
data modeling and sharing. Due to the high dimensionality of clinical time
series, adequate de-identification to preserve privacy while retaining data
utility is difficult to achieve using common de-identification techniques. An
innovative approach to this problem is synthetic data generation. From a
technical perspective, a good generative model for time-series data should
preserve temporal dynamics, in the sense that new sequences respect the
original relationships between high-dimensional variables across time. From the
privacy perspective, the model should prevent patient re-identification by
limiting vulnerability to membership inference attacks. The NeurIPS 2020
Hide-and-Seek Privacy Challenge is a novel two-tracked competition to
simultaneously accelerate progress in tackling both problems. In our
head-to-head format, participants in the synthetic data generation track (i.e.
"hiders") and the patient re-identification track (i.e. "seekers") are directly
pitted against each other by way of a new, high-quality intensive care
time-series dataset: the AmsterdamUMCdb dataset. Ultimately, we seek to advance
generative techniques for dense and high-dimensional temporal data streams that
are (1) clinically meaningful in terms of fidelity and predictivity, as well as
(2) capable of minimizing membership privacy risks in terms of the concrete
notion of patient re-identification.
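The hider/seeker setup described in the abstract can be illustrated with a toy membership inference attack. The sketch below is not the challenge's official evaluation code; the function names (`seeker_scores`, `reidentification_accuracy`, `leaky_hider`), the nearest-neighbor attack, and the random toy data are all assumptions made for illustration. It shows why a generator that stays too close to its training sequences is easy to attack: the seeker simply flags the candidates that lie nearest to some synthetic sequence.

```python
import random


def seq_distance(a, b):
    """Squared Euclidean distance between two equal-shape sequences.

    Each sequence is a list of time steps; each time step is a list of features.
    """
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def seeker_scores(candidates, synthetic):
    """Hypothetical seeker: score = distance to the nearest synthetic sequence.

    A lower score means the candidate is more likely to have been in the training set.
    """
    return [min(seq_distance(c, s) for s in synthetic) for c in candidates]


def reidentification_accuracy(candidates, is_member, synthetic):
    """Guess the k lowest-scoring candidates as members, where k is the true member count."""
    scores = seeker_scores(candidates, synthetic)
    k = sum(is_member)
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i])
    guessed = set(ranked[:k])
    correct = sum((i in guessed) == bool(m) for i, m in enumerate(is_member))
    return correct / len(candidates)


def toy_sequence(rng, steps=5, features=3):
    """Random stand-in for a patient time series (assumption: not real clinical data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(features)] for _ in range(steps)]


def leaky_hider(train, rng, noise):
    """Hypothetical worst-case hider: copies training sequences and adds small noise."""
    return [[[x + rng.gauss(0.0, noise) for x in row] for row in seq] for seq in train]


rng = random.Random(0)
members = [toy_sequence(rng) for _ in range(20)]      # used to "train" the hider
non_members = [toy_sequence(rng) for _ in range(20)]  # held out
candidates = members + non_members
is_member = [1] * 20 + [0] * 20

# A hider that nearly memorizes its training data versus one whose output
# is drawn independently of the training data.
leaky = leaky_hider(members, rng, noise=0.01)
independent = [toy_sequence(rng) for _ in range(20)]

acc_leaky = reidentification_accuracy(candidates, is_member, leaky)
acc_independent = reidentification_accuracy(candidates, is_member, independent)
print(f"seeker accuracy vs. leaky hider:       {acc_leaky:.2f}")
print(f"seeker accuracy vs. independent hider: {acc_independent:.2f}")
```

In this toy setting the independent hider gives the seeker no signal about membership but also carries no information about the training data, which is the utility side of the trade-off the challenge asks hiders to balance.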
Related papers
- SeqRisk: Transformer-augmented latent variable model for improved survival prediction with longitudinal data [4.1476925904032464]
We propose SeqRisk, a method that combines a variational autoencoder (VAE) or longitudinal VAE (LVAE) with a transformer encoder and a Cox proportional hazards module for risk prediction.
We demonstrate that SeqRisk performs competitively compared to existing approaches on both simulated and real-world datasets.
arXiv Detail & Related papers (2024-09-19T12:35:25Z)
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
- Privacy-Preserving Heterogeneous Federated Learning for Sensitive Healthcare Data [12.30620268528346]
We propose a new framework termed Abstention-Aware Federated Voting (AAFV).
AAFV can collaboratively and confidentially train heterogeneous local models while simultaneously protecting data privacy.
In particular, the proposed abstention-aware voting mechanism exploits a threshold-based abstention method to select high-confidence votes from heterogeneous local models.
arXiv Detail & Related papers (2024-06-15T08:43:40Z)
- Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data [104.45155847778584]
This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn).
FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations.
arXiv Detail & Related papers (2024-04-16T08:15:10Z)
- Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records [1.338174941551702]
This study assesses the capability of the Llama 2 LLM to create synthetic medical records that accurately reflect real patient information.
We focus on generating synthetic narratives for the History of Present Illness section, utilising data from the MIMIC-IV dataset for comparison.
Our findings suggest that a chain-of-thought prompted approach allows the zero-shot model to achieve results on par with those of fine-tuned models, based on ROUGE metrics.
arXiv Detail & Related papers (2024-03-13T16:17:09Z)
- Protect and Extend -- Using GANs for Synthetic Data Generation of Time-Series Medical Records [1.9749268648715583]
This research compares state-of-the-art GAN-based models for synthetic data generation to generate time-series synthetic medical records of dementia patients.
Our experiments indicate the superiority of the privacy-preserving GAN (PPGAN) model over other models regarding privacy preservation.
arXiv Detail & Related papers (2024-02-21T10:24:34Z)
- T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression [82.85825388788567]
We develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data.
We show that T-Phenotype achieves the best phenotype discovery performance over all the evaluated baselines.
arXiv Detail & Related papers (2023-02-24T13:30:35Z)
- Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control [3.8811062755861956]
$\epsilon$-PrivateSMOTE is a technique for safeguarding against re-identification and linkage attacks.
Our proposal combines synthetic data generation via noise-induced adversarial with differential privacy principles to obfuscate high-risk cases.
arXiv Detail & Related papers (2022-12-01T13:20:37Z)
- Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
arXiv Detail & Related papers (2022-05-12T17:03:57Z)
- Temporal Phenotyping using Deep Predictive Clustering of Disease Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
arXiv Detail & Related papers (2020-06-15T20:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.