Data Minimization for GDPR Compliance in Machine Learning Models
- URL: http://arxiv.org/abs/2008.04113v1
- Date: Thu, 6 Aug 2020 08:21:15 GMT
- Title: Data Minimization for GDPR Compliance in Machine Learning Models
- Authors: Abigail Goldsteen, Gilad Ezov, Ron Shmelkin, Micha Moffie, Ariel
Farkash
- Abstract summary: EU General Data Protection Regulation requires only data necessary to fulfill a certain purpose to be collected.
We present a first-of-a-kind method to reduce the amount of personal data needed to perform predictions.
Our method makes use of knowledge encoded within the model to produce a generalization that has little impact on its accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The EU General Data Protection Regulation (GDPR) mandates the principle of
data minimization, which requires that only data necessary to fulfill a certain
purpose be collected. However, it can often be difficult to determine the
minimal amount of data required, especially in complex machine learning models
such as neural networks. We present a first-of-a-kind method to reduce the
amount of personal data needed to perform predictions with a machine learning
model, by removing or generalizing some of the input features. Our method makes
use of the knowledge encoded within the model to produce a generalization that
has little to no impact on its accuracy. This enables the creators and users of
machine learning models to acheive data minimization, in a provable manner.
Related papers
- Partially Blinded Unlearning: Class Unlearning for Deep Networks a Bayesian Perspective [4.31734012105466]
Machine Unlearning is the process of selectively discarding information designated to specific sets or classes of data from a pre-trained model.
We propose a methodology tailored for the purposeful elimination of information linked to a specific class of data from a pre-trained classification network.
Our novel approach, termed textbfPartially-Blinded Unlearning (PBU), surpasses existing state-of-the-art class unlearning methods, demonstrating superior effectiveness.
arXiv Detail & Related papers (2024-03-24T17:33:22Z) - Certain and Approximately Certain Models for Statistical Learning [4.318959672085627]
We show that it is possible to learn accurate models directly from data with missing values for certain training data and target models.
We build efficient algorithms with theoretical guarantees to check this necessity and return accurate models in cases where imputation is unnecessary.
arXiv Detail & Related papers (2024-02-27T22:49:33Z) - An Information Theoretic Approach to Machine Unlearning [45.600917449314444]
Key challenge in unlearning is forgetting the necessary data in a timely manner, while preserving model performance.
In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.
We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z) - STAR: Boosting Low-Resource Information Extraction by Structure-to-Text
Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z) - AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z) - Synthetic Model Combination: An Instance-wise Approach to Unsupervised
Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Give access to a set of expert models and their predictions alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z) - Learnware: Small Models Do Big [69.88234743773113]
The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed those issues, whereas becoming a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which attempts to enable users not need to build machine learning models from scratch, with the hope of reusing small models to do things even beyond their original purposes.
arXiv Detail & Related papers (2022-10-07T15:55:52Z) - Sample-Efficient Personalization: Modeling User Parameters as Low Rank
Plus Sparse Components [30.32486162748558]
Personalization of machine learning (ML) predictions for individual users/domains/enterprises is critical for practical recommendation systems.
We propose a novel meta-learning style approach that models network weights as a sum of low-rank and sparse components.
We show that AMHT-LRS solves the problem efficiently with nearly optimal sample complexity.
arXiv Detail & Related papers (2022-10-07T12:50:34Z) - A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z) - Zero-shot meta-learning for small-scale data from human subjects [10.320654885121346]
We develop a framework to rapidly adapt to a new prediction task with limited training data for out-of-sample test data.
Our model learns the latent treatment effects of each intervention and, by design, can naturally handle multi-task predictions.
Our model has implications for improved generalization of small-size human studies to the wider population.
arXiv Detail & Related papers (2022-03-29T17:42:04Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.