A Summary of Privacy-Preserving Data Publishing in the Local Setting
- URL: http://arxiv.org/abs/2312.11845v1
- Date: Tue, 19 Dec 2023 04:23:23 GMT
- Title: A Summary of Privacy-Preserving Data Publishing in the Local Setting
- Authors: Wenjun Lin, Jiahao Qian, Wenwen Liu, Lang Wu
- Abstract summary: Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The exponential growth of collected, processed, and shared data has given rise to concerns about individuals' privacy. Consequently, various laws and regulations have been established to oversee how organizations handle and safeguard data. One such method is Statistical Disclosure Control, which aims to minimize the risk of exposing confidential information by de-identifying it. This de-identification is achieved through specific privacy-preserving techniques. However, a trade-off exists: de-identified data can often lead to a loss of information, which might impact the accuracy of data analysis and the predictive capability of models. The overarching goal remains to safeguard individual privacy while preserving the data's interpretability, meaning its overall usefulness. Despite advances in Statistical Disclosure Control, the field continues to evolve, with no definitive solution that strikes an optimal balance between privacy and utility. This survey delves into the intricate processes of de-identification. We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance. Herein, we tackle the primary challenges posed by privacy constraints, overview predominant strategies to mitigate these challenges, categorize privacy-preserving techniques, offer a theoretical assessment of current comparative research, and highlight numerous unresolved issues in the domain.
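The "local setting" in the title is commonly formalized as local differential privacy, where each respondent perturbs their own record before it leaves their hands. As a hedged illustration (a canonical mechanism from the local-privacy literature, not code from this paper), the classic randomized-response mechanism for one binary attribute, with `epsilon` as the local privacy budget, can be sketched as:

```python
import math
import random

def randomized_response(true_value: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1),
    otherwise report its flip; satisfies epsilon-local DP for one bit."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_value if random.random() < p_truth else not true_value

def debiased_proportion(responses, epsilon: float) -> float:
    """Unbiased estimate of the true fraction of True answers,
    obtained by inverting the known flipping probability."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(responses) / len(responses)
    return (observed + p - 1.0) / (2.0 * p - 1.0)
```

With epsilon = ln 3 each answer is truthful with probability 3/4; averaging debiased reports over many users recovers the population proportion, while no single report pins down its owner's true value. This is exactly the privacy-utility trade-off the survey discusses: smaller epsilon means stronger deniability but a noisier estimate.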
Related papers
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z) - The Data Minimization Principle in Machine Learning [61.17813282782266]
Data minimization aims to reduce the amount of data collected, processed or retained.
It has been endorsed by various global data protection regulations.
However, its practical implementation remains a challenge due to the lack of a rigorous formulation.
arXiv Detail & Related papers (2024-05-29T19:40:27Z) - Guarding Multiple Secrets: Enhanced Summary Statistic Privacy for Data Sharing [3.7274308010465775]
We propose a novel framework to define, analyze, and protect multi-secret summary statistics privacy in data sharing.
We measure the privacy risk of any data release mechanism by the worst-case probability of an attacker successfully inferring summary statistic secrets.
arXiv Detail & Related papers (2024-05-22T16:30:34Z) - $\alpha$-Mutual Information: A Tunable Privacy Measure for Privacy Protection in Data Sharing [4.475091558538915]
This paper adopts Arimoto's $\alpha$-Mutual Information as a tunable privacy measure.
We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection.
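For context, Arimoto's $\alpha$-mutual information is usually defined via the Rényi entropy and Arimoto's conditional entropy (stated here for reference; the paper may use a slightly different variant):

```latex
% Rényi entropy of order \alpha
H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{x} P_X(x)^\alpha
% Arimoto conditional entropy of order \alpha
H_\alpha^{A}(X \mid Y) = \frac{\alpha}{1-\alpha} \log \sum_{y}
    \Big( \sum_{x} P_{XY}(x,y)^\alpha \Big)^{1/\alpha}
% Arimoto \alpha-mutual information
I_\alpha^{A}(X;Y) = H_\alpha(X) - H_\alpha^{A}(X \mid Y)
```

As $\alpha \to 1$ this recovers Shannon mutual information, and varying $\alpha$ adjusts how heavily the measure weights the most vulnerable outcomes, which is what makes it "tunable" as a privacy measure.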
arXiv Detail & Related papers (2023-10-27T16:26:14Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of their synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - Towards a Data Privacy-Predictive Performance Trade-off [2.580765958706854]
We evaluate the existence of a trade-off between data privacy and predictive performance in classification tasks.
In contrast to some previous findings, we confirm that the higher the level of privacy, the greater the impact on predictive performance.
arXiv Detail & Related papers (2022-01-13T21:48:51Z) - Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z) - Deep Directed Information-Based Learning for Privacy-Preserving Smart Meter Data Release [30.409342804445306]
We study the problem in the context of time series data and smart meter (SM) power consumption measurements.
We introduce the Directed Information (DI) as a more meaningful measure of privacy in the considered setting.
Our empirical studies on real-world data sets of SM measurements in the worst-case scenario show the existing trade-offs between privacy and utility.
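The directed information invoked here is, in Massey's standard formulation (stated for reference; the paper builds on this quantity):

```latex
I(X^n \to Y^n) = \sum_{i=1}^{n} I\big(X^i ; Y_i \mid Y^{i-1}\big)
```

Unlike mutual information, directed information is asymmetric and captures the causal flow of information from $X$ to $Y$, which is why it is a natural privacy measure for sequential releases such as smart-meter time series.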
arXiv Detail & Related papers (2020-11-20T13:41:11Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
The local exchange of estimates allows inference of the underlying private data.
Perturbations chosen independently at every agent result in a significant performance loss.
We propose an alternative scheme that constructs perturbations according to a particular nullspace condition, allowing them to be invisible in the aggregate.
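As a toy sketch of the nullspace idea (an assumed illustration, not the paper's actual construction): each agent adds noise to its local estimate, but the noises are coordinated so that their network-wide sum is exactly zero, i.e. the stacked perturbation lies in the nullspace of the averaging operator and cancels in the aggregate:

```python
import random

def zero_sum_perturbations(num_agents: int, dim: int, scale: float = 1.0):
    """Draw one Gaussian noise vector per agent, then project out the mean
    so the noises cancel in the network-wide sum: the stacked perturbation
    lies in the nullspace of the averaging operator."""
    noises = [[random.gauss(0.0, scale) for _ in range(dim)]
              for _ in range(num_agents)]
    for j in range(dim):
        mean_j = sum(n[j] for n in noises) / num_agents
        for n in noises:
            n[j] -= mean_j  # per-coordinate sum over agents is now zero
    return noises
```

Each individual estimate is masked by its own noise vector, yet the noises contribute nothing to the aggregate, so (in this idealized sketch) the averaged model suffers no perturbation-induced loss.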
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - On the Privacy-Utility Tradeoff in Peer-Review Data Analysis [34.0435377376779]
A major impediment to research on improving peer review is the unavailability of peer-review data.
We propose a framework for privacy-preserving release of certain conference peer-review data.
arXiv Detail & Related papers (2020-06-29T21:08:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.