An applied Perspective: Estimating the Differential Identifiability Risk of an Exemplary SOEP Data Set
- URL: http://arxiv.org/abs/2407.04084v1
- Date: Thu, 4 Jul 2024 17:50:55 GMT
- Title: An applied Perspective: Estimating the Differential Identifiability Risk of an Exemplary SOEP Data Set
- Authors: Jonas Allmann, Saskia Nuñez von Voigt, Florian Tschorsch
- Abstract summary: We show how to compute the risk metric efficiently for a set of basic statistical queries.
Our empirical analysis based on an extensive, real-world scientific data set expands the knowledge on how to compute risks under realistic conditions.
- Score: 2.66269503676104
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Using real-world study data usually requires contractual agreements where research results may only be published in anonymized form. Requiring formal privacy guarantees, such as differential privacy, could be helpful for data-driven projects to comply with data protection. However, deploying differential privacy in consumer use cases raises the need to explain its underlying mechanisms and the resulting privacy guarantees. In this paper, we thoroughly review and extend an existing privacy metric. We show how to compute this risk metric efficiently for a set of basic statistical queries. Our empirical analysis based on an extensive, real-world scientific data set expands the knowledge on how to compute risks under realistic conditions, while presenting more challenges than solutions.
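The abstract's setting can be illustrated with a minimal sketch: releasing a basic counting query under ε-differential privacy via the Laplace mechanism, and relating ε to an identifiability risk using Lee and Clifton's differential identifiability bound. This is an assumption for illustration only; the paper reviews and extends a related privacy metric, and the function names below are hypothetical.

```python
import math
import random

def laplace_count(true_count: int, epsilon: float) -> float:
    """Release a counting query under epsilon-DP with the Laplace mechanism.
    A counting query has sensitivity 1, so the noise scale is 1/epsilon.
    The difference of two iid Exponential(epsilon) draws is Laplace(0, 1/epsilon)."""
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

def identifiability_risk(epsilon: float, m: int) -> float:
    """Adversary's posterior confidence of singling out the correct individual
    among m equally likely candidates, following Lee & Clifton's relation
    epsilon = ln((m-1) * rho / (1 - rho)), i.e. rho = 1 / (1 + (m-1) * e^-eps).
    (Assumed formula for illustration; not necessarily the paper's exact metric.)"""
    return 1.0 / (1.0 + (m - 1) * math.exp(-epsilon))
```

With ε = 0 the risk reduces to the uniform prior 1/m, and it grows toward 1 as ε increases, which is the intuition behind explaining DP guarantees as an identifiability risk.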
Related papers
- Tabular Data Synthesis with Differential Privacy: A Survey [24.500349285858597]
Data sharing is a prerequisite for collaborative innovation, enabling organizations to leverage diverse datasets for deeper insights.
Data synthesis tackles this by generating artificial datasets that preserve the statistical characteristics of real data.
Differentially private data synthesis has emerged as a promising approach to privacy-aware data sharing.
arXiv Detail & Related papers (2024-11-04T06:32:48Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- Lazy Data Practices Harm Fairness Research [49.02318458244464]
We present a comprehensive analysis of fair ML datasets, demonstrating how unreflective practices hinder the reach and reliability of algorithmic fairness findings.
Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research.
This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
arXiv Detail & Related papers (2024-04-26T09:51:24Z)
- A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z)
- Conditional Density Estimations from Privacy-Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets.
We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z)
- An In-Depth Examination of Requirements for Disclosure Risk Assessment [6.0631983658449435]
We argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria.
We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework.
We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most.
arXiv Detail & Related papers (2023-10-13T20:36:29Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Towards Generalizable Data Protection With Transferable Unlearnable Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle to produce synthetic samples with high utility.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows data owners to maintain data governance and perform model training locally without sharing their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
- Data-driven Regularized Inference Privacy [33.71757542373714]
We propose a data-driven inference privacy preserving framework to sanitize data.
We develop an inference privacy framework based on the variational method.
We present empirical methods to estimate the privacy metric.
arXiv Detail & Related papers (2020-10-10T08:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.