Learning to Limit Data Collection via Scaling Laws: Data Minimization
Compliance in Practice
- URL: http://arxiv.org/abs/2107.08096v1
- Date: Fri, 16 Jul 2021 19:59:01 GMT
- Title: Learning to Limit Data Collection via Scaling Laws: Data Minimization
Compliance in Practice
- Authors: Divya Shanmugam, Samira Shabanian, Fernando Diaz, Michèle Finck,
Asia Biega
- Abstract summary: We build on literature in machine learning and law to propose a framework for limiting data collection based on an interpretation that ties the data collection purpose to system performance.
We formalize a data minimization criterion based on performance curve derivatives and provide an effective and interpretable piecewise power law technique.
- Score: 62.44110411199835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data minimization is a legal obligation defined in the European Union's
General Data Protection Regulation (GDPR) as the responsibility to process an
adequate, relevant, and limited amount of personal data in relation to a
processing purpose. However, unlike fairness or transparency, the principle has
not seen wide adoption for machine learning systems due to a lack of
computational interpretation. In this paper, we build on literature in machine
learning and law to propose the first learning framework for limiting data
collection based on an interpretation that ties the data collection purpose to
system performance. We formalize a data minimization criterion based on
performance curve derivatives and provide an effective and interpretable
piecewise power law technique that models distinct stages of an algorithm's
performance throughout data collection. Results from our empirical
investigation offer deeper insights into the relevant considerations when
designing a data minimization framework, including the choice of feature
acquisition algorithm, initialization conditions, as well as impacts on
individuals that hint at tensions between data minimization and fairness.
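The derivative-based criterion described in the abstract can be illustrated with a small sketch. The code below is not the authors' method: it fits a single power law (not the piecewise model the paper proposes) to a synthetic learning curve and finds the dataset size at which the estimated marginal improvement falls to a chosen threshold. The function names, the synthetic error values, and the threshold are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_power_law(n, err):
    """Fit err ~ a * n**(-b) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
    return float(np.exp(intercept)), float(-slope)


def marginal_gain(n, a, b):
    """Magnitude of d(err)/dn for the fitted curve err = a * n**(-b)."""
    return a * b * n ** (-b - 1.0)


def stopping_size(a, b, threshold):
    """Dataset size at which the marginal gain equals `threshold`."""
    return (a * b / threshold) ** (1.0 / (b + 1.0))


# Synthetic learning curve (hypothetical numbers, noise-free for clarity).
sizes = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
errors = 2.0 * sizes ** -0.5

a, b = fit_power_law(sizes, errors)

# Assumed threshold: stop once one more sample improves error by < 1e-5.
n_stop = stopping_size(a, b, threshold=1e-5)
print(f"a={a:.3f}, b={b:.3f}, stop collecting near n={n_stop:.0f}")
```

In this simplified setting, collection would stop once the fitted curve flattens below the threshold. A piecewise variant would fit separate power laws to distinct stages of the curve and apply the same derivative test to the most recent segment.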
Related papers
- The trade-off between data minimization and fairness in collaborative filtering [1.8936798735951967]
General Data Protection Regulations aim to safeguard individuals' personal information from harm.
While full compliance is mandatory in the EU, it is not in other places.
This paper studies the relationship between principles of data minimization and fairness in recommender systems.
arXiv Detail & Related papers (2024-09-21T02:32:26Z)
- The Data Minimization Principle in Machine Learning [61.17813282782266]
Data minimization aims to reduce the amount of data collected, processed or retained.
It has been endorsed by various global data protection regulations.
However, its practical implementation remains a challenge due to the lack of a rigorous formulation.
arXiv Detail & Related papers (2024-05-29T19:40:27Z)
- Reviving Purpose Limitation and Data Minimisation in Personalisation,
Profiling and Decision-Making Systems [0.0]
This paper determines, through an interdisciplinary law and computer science lens, whether data minimisation and purpose limitation can be meaningfully implemented in data-driven systems.
Our analysis reveals that the two legal principles continue to play an important role in mitigating the risks of personal data processing.
We highlight that even though these principles are important safeguards in the systems under consideration, there are important limits to their practical implementation.
arXiv Detail & Related papers (2021-01-15T16:36:29Z)
- Provably Efficient Causal Reinforcement Learning with Confounded
Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Operationalizing the Legal Principle of Data Minimization for
Personalization [64.0027026050706]
We identify a lack of a homogeneous interpretation of the data minimization principle and explore two operational definitions applicable in the context of personalization.
We find that the performance decrease incurred by data minimization might not be substantial, but it might disparately impact different users.
arXiv Detail & Related papers (2020-05-28T00:43:06Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.