The Data Minimization Principle in Machine Learning
- URL: http://arxiv.org/abs/2405.19471v1
- Date: Wed, 29 May 2024 19:40:27 GMT
- Title: The Data Minimization Principle in Machine Learning
- Authors: Prakhar Ganesh, Cuong Tran, Reza Shokri, Ferdinando Fioretto
- Abstract summary: Data minimization aims to reduce the amount of data collected, processed or retained.
It has been endorsed by various global data protection regulations.
However, its practical implementation remains a challenge due to the lack of a rigorous formulation.
- Score: 61.17813282782266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has been endorsed by various global data protection regulations. However, its practical implementation remains a challenge due to the lack of a rigorous formulation. This paper addresses this gap and introduces an optimization framework for data minimization based on its legal definitions. It then adapts several optimization algorithms to perform data minimization and conducts a comprehensive evaluation in terms of their compliance with minimization objectives as well as their impact on user privacy. Our analysis underscores the mismatch between the privacy expectations of data minimization and the actual privacy benefits, emphasizing the need for approaches that account for multiple facets of real-world privacy risks.
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state of the art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- The trade-off between data minimization and fairness in collaborative filtering [1.8936798735951967]
General Data Protection Regulations aim to safeguard individuals' personal information from harm.
While full compliance is mandatory in the EU, it is not in other places.
This paper studies the relationship between principles of data minimization and fairness in recommender systems.
arXiv Detail & Related papers (2024-09-21T02:32:26Z)
- A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- Learning to Limit Data Collection via Scaling Laws: Data Minimization Compliance in Practice [62.44110411199835]
We build on literature in machine learning and law to propose a framework for limiting collection based on a data interpretation that ties data to system performance.
We formalize a data minimization criterion based on performance curve derivatives and provide an effective and interpretable piecewise power law technique.
arXiv Detail & Related papers (2021-07-16T19:59:01Z)
- Reviving Purpose Limitation and Data Minimisation in Personalisation, Profiling and Decision-Making Systems [0.0]
This paper determines, through an interdisciplinary law and computer science lens, whether data minimisation and purpose limitation can be meaningfully implemented in data-driven systems.
Our analysis reveals that the two legal principles continue to play an important role in mitigating the risks of personal data processing.
We highlight that even though these principles are important safeguards in the systems under consideration, there are important limits to their practical implementation.
arXiv Detail & Related papers (2021-01-15T16:36:29Z)
- Operationalizing the Legal Principle of Data Minimization for Personalization [64.0027026050706]
We identify a lack of a homogeneous interpretation of the data minimization principle and explore two operational definitions applicable in the context of personalization.
We find that the performance decrease incurred by data minimization might not be substantial, but it might disparately impact different users.
arXiv Detail & Related papers (2020-05-28T00:43:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.