A Summary of Privacy-Preserving Data Publishing in the Local Setting
- URL: http://arxiv.org/abs/2312.11845v1
- Date: Tue, 19 Dec 2023 04:23:23 GMT
- Title: A Summary of Privacy-Preserving Data Publishing in the Local Setting
- Authors: Wenjun Lin, Jiahao Qian, Wenwen Liu, Lang Wu
- Abstract summary: Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The exponential growth of collected, processed, and shared data has given rise to concerns about individuals' privacy. Consequently, various laws and regulations have been established to oversee how organizations handle and safeguard data. One such method is Statistical Disclosure Control, which aims to minimize the risk of exposing confidential information by de-identifying it. This de-identification is achieved through specific privacy-preserving techniques. However, a trade-off exists: de-identified data can often lead to a loss of information, which might impact the accuracy of data analysis and the predictive capability of models. The overarching goal remains to safeguard individual privacy while preserving the data's interpretability, meaning its overall usefulness. Despite advances in Statistical Disclosure Control, the field continues to evolve, with no definitive solution that strikes an optimal balance between privacy and utility. This survey delves into the intricate processes of de-identification. We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance. Herein, we tackle the primary challenges posed by privacy constraints, overview predominant strategies to mitigate these challenges, categorize privacy-preserving techniques, offer a theoretical assessment of current comparative research, and highlight numerous unresolved issues in the domain.
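The "local setting" in the title is commonly formalized as local differential privacy, where each respondent perturbs their own record before it leaves their hands. As a hedged illustration (a canonical mechanism from the local-privacy literature, not code from this paper), the classic randomized-response mechanism for one binary attribute, with `epsilon` as the local privacy budget, can be sketched as:

```python
import math
import random

def randomized_response(true_value: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1),
    otherwise report its flip; satisfies epsilon-local DP for one bit."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_value if random.random() < p_truth else not true_value

def debiased_proportion(responses, epsilon: float) -> float:
    """Unbiased estimate of the true fraction of True answers,
    obtained by inverting the known flipping probability."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(responses) / len(responses)
    return (observed + p - 1.0) / (2.0 * p - 1.0)
```

With epsilon = ln 3 each answer is truthful with probability 3/4; averaging debiased reports over many users recovers the population proportion, while no single report pins down its owner's true value. This is exactly the privacy-utility trade-off the survey discusses: smaller epsilon means stronger deniability but a noisier estimate.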
Related papers
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z) - The Data Minimization Principle in Machine Learning [61.17813282782266]
Data minimization aims to reduce the amount of data collected, processed or retained.
It has been endorsed by various global data protection regulations.
However, its practical implementation remains a challenge due to the lack of a rigorous formulation.
arXiv Detail & Related papers (2024-05-29T19:40:27Z) - Guarding Multiple Secrets: Enhanced Summary Statistic Privacy for Data Sharing [3.7274308010465775]
We propose a novel framework to define, analyze, and protect multi-secret summary statistics privacy in data sharing.
We measure the privacy risk of any data release mechanism by the worst-case probability of an attacker successfully inferring summary statistic secrets.
arXiv Detail & Related papers (2024-05-22T16:30:34Z) - $\alpha$-Mutual Information: A Tunable Privacy Measure for Privacy Protection in Data Sharing [4.475091558538915]
This paper adopts Arimoto's $\alpha$-Mutual Information as a tunable privacy measure.
We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection.
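For context, Arimoto's $\alpha$-mutual information is usually defined via the Rényi entropy and Arimoto's conditional entropy (stated here for reference; the paper may use a slightly different variant):

```latex
% Rényi entropy of order \alpha
H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{x} P_X(x)^\alpha
% Arimoto conditional entropy of order \alpha
H_\alpha^{A}(X \mid Y) = \frac{\alpha}{1-\alpha} \log \sum_{y}
    \Big( \sum_{x} P_{XY}(x,y)^\alpha \Big)^{1/\alpha}
% Arimoto \alpha-mutual information
I_\alpha^{A}(X;Y) = H_\alpha(X) - H_\alpha^{A}(X \mid Y)
```

As $\alpha \to 1$ this recovers Shannon mutual information, and varying $\alpha$ adjusts how heavily the measure weights the most vulnerable outcomes, which is what makes it "tunable" as a privacy measure.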
arXiv Detail & Related papers (2023-10-27T16:26:14Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of their synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - Towards a Data Privacy-Predictive Performance Trade-off [2.580765958706854]
We evaluate the existence of a trade-off between data privacy and predictive performance in classification tasks.
In contrast to some previous findings, we confirm that the higher the level of privacy, the greater the impact on predictive performance.
arXiv Detail & Related papers (2022-01-13T21:48:51Z) - Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z) - Deep Directed Information-Based Learning for Privacy-Preserving Smart Meter Data Release [30.409342804445306]
We study the problem in the context of time series data and smart meter (SM) power consumption measurements.
We introduce the Directed Information (DI) as a more meaningful measure of privacy in the considered setting.
Our empirical studies on real-world data sets of SM measurements in the worst-case scenario show the existing trade-offs between privacy and utility.
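The directed information invoked here is, in Massey's standard formulation (stated for reference; the paper builds on this quantity):

```latex
I(X^n \to Y^n) = \sum_{i=1}^{n} I\big(X^i ; Y_i \mid Y^{i-1}\big)
```

Unlike mutual information, directed information is asymmetric and captures the causal flow of information from $X$ to $Y$, which is why it is a natural privacy measure for sequential releases such as smart-meter time series.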
arXiv Detail & Related papers (2020-11-20T13:41:11Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
The local exchange of estimates allows inference of the underlying private data.
Perturbations chosen independently at every agent result in a significant performance loss.
We propose an alternative scheme that constructs perturbations according to a particular nullspace condition, allowing them to be invisible in the aggregate.
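As a toy sketch of the nullspace idea (an assumed illustration, not the paper's actual construction): each agent adds noise to its local estimate, but the noises are coordinated so that their network-wide sum is exactly zero, i.e. the stacked perturbation lies in the nullspace of the averaging operator and cancels in the aggregate:

```python
import random

def zero_sum_perturbations(num_agents: int, dim: int, scale: float = 1.0):
    """Draw one Gaussian noise vector per agent, then project out the mean
    so the noises cancel in the network-wide sum: the stacked perturbation
    lies in the nullspace of the averaging operator."""
    noises = [[random.gauss(0.0, scale) for _ in range(dim)]
              for _ in range(num_agents)]
    for j in range(dim):
        mean_j = sum(n[j] for n in noises) / num_agents
        for n in noises:
            n[j] -= mean_j  # per-coordinate sum over agents is now zero
    return noises
```

Each individual estimate is masked by its own noise vector, yet the noises contribute nothing to the aggregate, so (in this idealized sketch) the averaged model suffers no perturbation-induced loss.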
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - On the Privacy-Utility Tradeoff in Peer-Review Data Analysis [34.0435377376779]
A major impediment to research on improving peer review is the unavailability of peer-review data.
We propose a framework for privacy-preserving release of certain conference peer-review data.
arXiv Detail & Related papers (2020-06-29T21:08:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.