Why Data Anonymization Has Not Taken Off
- URL: http://arxiv.org/abs/2509.10165v2
- Date: Sat, 20 Sep 2025 11:44:07 GMT
- Title: Why Data Anonymization Has Not Taken Off
- Authors: Matthew J. Schneider, James Bailie, Dawn Iacobucci
- Abstract summary: Data anonymization has not taken off in practice because it is anything but simple to implement. Each variation of the required choices changes the very meaning, as well as the practical implications, of the privacy guarantee. Some data anonymization methods can be effective, but only when the insights required are much larger than the unit of protection. Companies should not expect easy wins, but rather recognize that anonymization is just one approach to data privacy, with its own advantages and drawbacks, and that the best strategies combine the full range of approaches to privacy and security.
- Score: 0.688204255655161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Companies are looking to data anonymization research, including differentially private and synthetic data methods, for simple and straightforward compliance solutions. But data anonymization has not taken off in practice because it is anything but simple to implement. For one, it requires making complex choices which are case dependent, such as the domain of the dataset to anonymize; the units to protect; the scope to which the data protection should extend; and the standard of protection. Each variation of these choices changes the very meaning, as well as the practical implications, of differential privacy (or of any other measure of data anonymization). Yet differential privacy is frequently being branded as the same privacy guarantee regardless of variations in these choices. Some data anonymization methods can be effective, but only when the insights required are much larger than the unit of protection. Given that businesses care about profitability, any solution must preserve the patterns between a firm's data and that profitability. As a result, data anonymization solutions usually need to be bespoke and case-specific, which reduces their scalability. Companies should not expect easy wins, but rather recognize that anonymization is just one approach to data privacy with its own particular advantages and drawbacks, while the best strategies jointly leverage the full range of approaches to data privacy and security in combination.
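To make concrete how the choice of the unit of protection changes the practical meaning of the same nominal guarantee, below is a minimal Python sketch of the Laplace mechanism for a count query. The transaction-level and customer-level sensitivities are illustrative assumptions, not values from the paper.

```python
import numpy as np

def laplace_count(true_count: float, sensitivity: float, epsilon: float,
                  rng: np.random.Generator) -> float:
    """Laplace mechanism: the noise scale is sensitivity / epsilon, so the same
    nominal epsilon implies very different noise depending on the unit protected."""
    return true_count + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
epsilon = 1.0
# Protecting a single transaction: one transaction changes the count by at most 1.
print(laplace_count(10_000, sensitivity=1.0, epsilon=epsilon, rng=rng))
# Protecting a whole customer who may contribute up to 50 transactions
# (an assumed bound): the noise scale grows 50x for the same epsilon.
print(laplace_count(10_000, sensitivity=50.0, epsilon=epsilon, rng=rng))
```

The point of the sketch is that "epsilon = 1" alone does not describe the protection; the unit being protected (transaction versus customer) determines the sensitivity and therefore how much signal survives.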
Related papers
- Privacy Amplification by Missing Data [4.9024539661445825]
We analyze missing data as a privacy amplification mechanism within the framework of differential privacy. We show, for the first time, that incomplete data can yield privacy amplification for differentially private algorithms.
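As a point of reference for the amplification phenomenon this summary describes, here is a minimal Python sketch of the classical privacy-amplification-by-subsampling bound. Reading the missingness pattern as an independent sampling rate q is an illustrative assumption for this sketch, not the paper's actual analysis.

```python
import math

def amplified_epsilon(epsilon: float, q: float) -> float:
    """Classical subsampling bound: a mechanism that is epsilon-DP on the sampled
    records is ln(1 + q * (e^epsilon - 1))-DP overall when each record is
    included independently with probability q."""
    return math.log(1.0 + q * (math.exp(epsilon) - 1.0))

# Assumed for illustration: each record is observed (not missing) with q = 0.2.
print(amplified_epsilon(epsilon=1.0, q=0.2))  # ~0.30, tighter than the original 1.0
```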
arXiv Detail & Related papers (2026-02-02T10:28:41Z)
- How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage. Differentially private synthetic data refers to synthetic data that preserves the overall trends of source data.
arXiv Detail & Related papers (2025-12-02T21:14:39Z)
- SemDP: Semantic-level Differential Privacy Protection for Face Datasets [4.694266441149191]
We propose a semantic-level differential privacy protection scheme that applies to the entire face dataset. We first extract semantic information from the face dataset to build an attribute database, then apply differential perturbations to obscure this attribute data, and finally use an image model to generate a protected face dataset.
arXiv Detail & Related papers (2024-12-20T06:00:59Z)
- Activity Recognition on Avatar-Anonymized Datasets with Masked Differential Privacy [64.32494202656801]
Privacy-preserving computer vision is an important emerging problem in machine learning and artificial intelligence. We present an anonymization pipeline that replaces sensitive human subjects in video datasets with synthetic avatars within context. We also propose MaskDP to protect non-anonymized but privacy-sensitive background information.
arXiv Detail & Related papers (2024-10-22T15:22:53Z)
- Privacy-Preserving Hierarchical Anonymization Framework over Encrypted Data [0.061446808540639365]
This study proposes a hierarchical k-anonymization framework, composed of two types of domains, that uses homomorphic encryption and secret sharing.
The experimental results show that connecting two domains can accelerate the anonymization process, indicating that the proposed secure hierarchical architecture is practical and efficient.
arXiv Detail & Related papers (2023-10-19T01:08:37Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person.
arXiv Detail & Related papers (2022-09-08T22:43:50Z)
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it has limitations when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
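For orientation, here is a minimal Python sketch of the baseline k-anonymity check that smooth-k-anonymity relaxes; the function name and toy table are hypothetical, and the smoothing itself is the paper's contribution and is not reproduced here.

```python
from collections import Counter

def is_k_anonymous(records: list[tuple], k: int) -> bool:
    """Baseline k-anonymity: every combination of quasi-identifier values
    must be shared by at least k records in the released table."""
    counts = Counter(records)
    return all(count >= k for count in counts.values())

# Toy table of quasi-identifiers (generalized zip code, age bracket).
table = [("191**", "20-29"), ("191**", "20-29"), ("191**", "30-39")]
print(is_k_anonymous(table, k=2))  # False: the ("191**", "30-39") group has only 1 record
```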
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
- Quantifying identifiability to choose and audit $\epsilon$ in differentially private deep learning [15.294433619347082]
To use differential privacy in machine learning, data scientists must choose privacy parameters $(\epsilon,\delta)$.
We transform $(\epsilon,\delta)$ to a bound on the Bayesian posterior belief of the adversary assumed by differential privacy concerning the presence of any record in the training dataset.
We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and compute empirical identifiability scores and empirical $(\epsilon,\delta)$.
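As a hedged illustration of what such a transformation can look like, the following LaTeX states the standard posterior-belief bound for pure $\epsilon$-DP under a uniform prior. It is an assumption that this matches the paper's exact bound, and the $\delta$ term is omitted.

```latex
% If the mechanism is \epsilon-DP, the adversary's likelihood ratio for
% "record present" vs. "record absent" is at most e^{\epsilon}. With a
% uniform prior of 1/2, Bayes' rule then bounds the posterior belief:
\[
  \Pr[\text{record present} \mid \text{output}]
  \;\le\; \frac{e^{\epsilon}}{1 + e^{\epsilon}}.
\]
```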
arXiv Detail & Related papers (2021-03-04T09:35:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.