Certified Data Removal in Sum-Product Networks
- URL: http://arxiv.org/abs/2210.01451v1
- Date: Tue, 4 Oct 2022 08:22:37 GMT
- Title: Certified Data Removal in Sum-Product Networks
- Authors: Alexander Becker and Thomas Liebig
- Abstract summary: Deleting the collected data is often insufficient to guarantee data privacy.
UnlearnSPN is an algorithm that removes the influence of single data points from a trained sum-product network.
- Score: 78.27542864367821
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Data protection regulations like the GDPR or the California Consumer Privacy
Act give users more control over the data that is collected about them.
Deleting the collected data is often insufficient to guarantee data privacy
since it is often used to train machine learning models, which can expose
information about the training data. Thus, a guarantee that a trained model
does not expose information about its training data is additionally needed. In
this paper, we present UnlearnSPN -- an algorithm that removes the influence of
single data points from a trained sum-product network and thereby allows
fulfilling data privacy requirements on demand.
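The abstract does not detail UnlearnSPN itself, but the key intuition is that sum-node weights in an SPN learned from data (e.g. via LearnSPN-style clustering) are typically proportional to data counts, so removing one record amounts to decrementing counts along the nodes it induced. The following is a minimal illustrative sketch under that assumption, not the certified UnlearnSPN algorithm; all structure, class names, and numbers are invented for illustration.

```python
# Minimal sum-product network (SPN) sketch. NOT the UnlearnSPN algorithm:
# it only illustrates how count-parameterised sum and leaf nodes make
# removal of a single record's influence conceivable. All numbers invented.

class Leaf:
    """Bernoulli leaf over one binary variable, parameterised by counts."""
    def __init__(self, var, pos, total):
        self.var, self.pos, self.total = var, pos, total

    def prob(self, assignment):
        p = self.pos / self.total
        return p if assignment[self.var] == 1 else 1.0 - p

class Product:
    """Product node: children over disjoint variable scopes."""
    def __init__(self, children):
        self.children = children

    def prob(self, assignment):
        result = 1.0
        for child in self.children:
            result *= child.prob(assignment)
        return result

class Sum:
    """Sum node whose mixture weights are proportional to cluster counts."""
    def __init__(self, children, counts):
        self.children, self.counts = children, counts

    def prob(self, assignment):
        total = sum(self.counts)
        return sum((n / total) * child.prob(assignment)
                   for n, child in zip(self.counts, self.children))

# Two binary variables X0, X1; two clusters of 30 and 70 records.
spn = Sum(
    children=[
        Product([Leaf(0, pos=25, total=30), Leaf(1, pos=5, total=30)]),
        Product([Leaf(0, pos=10, total=70), Leaf(1, pos=60, total=70)]),
    ],
    counts=[30, 70],
)

x = {0: 1, 1: 0}
print(spn.prob(x))

# "Removing" one cluster-0 record with X0=1, X1=0 decrements the counts
# along its induced path. Certified removal adds guarantees on top of
# this count-based intuition.
spn.counts[0] -= 1
spn.children[0].children[0].pos -= 1
spn.children[0].children[0].total -= 1
spn.children[0].children[1].total -= 1
print(spn.prob(x))
```

Because every parameter is a ratio of counts, the updated network is exactly the one that would have been learned without the removed record, which is the property a certified-removal guarantee formalizes.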
Related papers
- Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z) - Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks [70.39633252935445]
Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.
For closed models, the training data becomes a trade secret, and even for open models, it is not trivial to detect contamination.
We propose three strategies that can make a difference: (1) test data made public should be encrypted with a public key and licensed to disallow derivative distribution; (2) demand training exclusion controls from closed API holders, and protect your test data by refusing to evaluate without them; and (3) avoid data which appears with its solution on the internet, and release the web-page context of internet-derived data.
arXiv Detail & Related papers (2023-05-17T12:23:38Z) - Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
In real-world industry settings, Machine Learning models are built on user data. Data-deletion mandates require effort both in terms of data removal and model retraining, and continuous removal of data followed by retraining does not scale. We propose Machine Unlearning to tackle this challenge.
arXiv Detail & Related papers (2022-12-19T16:06:45Z) - A Privacy-Preserving Outsourced Data Model in Cloud Environment [8.176020822058586]
Data security and privacy problems are among the critical hindrances to using machine learning tools.
A privacy-preserving model is proposed, which protects the privacy of the data without compromising machine learning efficiency.
Fog nodes collect the noise-added data from the data owners, then shift it to the cloud platform for storage, computation, and performing the classification tasks.
arXiv Detail & Related papers (2022-11-24T11:27:30Z) - Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z) - Transferable Unlearnable Examples [63.64357484690254]
Unlearnable strategies have been introduced to prevent third parties from training on data without permission.
They add perturbations to users' data before publishing, aiming to invalidate models trained on the published dataset.
We propose a novel unlearnable strategy based on Classwise Separability Discriminant (CSD), which aims to better transfer the unlearnable effects to other training settings and datasets.
arXiv Detail & Related papers (2022-10-18T19:23:52Z) - Machine unlearning via GAN [2.406359246841227]
Machine learning models, especially deep models, may unintentionally remember information about their training data.
We present a GAN-based algorithm to delete data in deep models, which significantly improves deleting speed compared to retraining from scratch.
arXiv Detail & Related papers (2021-11-22T05:28:57Z) - Amnesiac Machine Learning [15.680008735220785]
The recently enacted General Data Protection Regulation (GDPR) affects any data holder that has data on European Union residents.
Models are vulnerable to information leaking attacks such as model inversion attacks.
We present two data removal methods, namely Unlearning and Amnesiac Unlearning, that enable model owners to protect themselves against such attacks while being compliant with regulations.
arXiv Detail & Related papers (2020-10-21T13:14:17Z) - ML Privacy Meter: Aiding Regulatory Compliance by Quantifying the Privacy Risks of Machine Learning [10.190911271176201]
Machine learning models pose an additional privacy risk to the data by indirectly revealing information about it through model predictions and parameters.
There is an immediate need for a tool that can quantify the privacy risk to data from models.
We present ML Privacy Meter, a tool that can quantify the privacy risk to data from models through state-of-the-art membership inference attack techniques.
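The summary does not describe how such auditing works internally; a common baseline behind membership inference is a simple loss threshold: training members tend to receive lower loss than non-members. The sketch below illustrates only that idea, not ML Privacy Meter's implementation; the confidences and threshold are invented.

```python
# Toy illustration of threshold-based membership inference, the baseline
# idea behind privacy-risk auditing tools. NOT ML Privacy Meter's actual
# implementation; all confidences and the threshold are invented.

import math

def nll(p_true):
    """Negative log-likelihood the model assigns to the true label."""
    return -math.log(max(p_true, 1e-12))

# Hypothetical per-example confidences: members (seen during training)
# tend to get higher confidence, hence lower loss, than non-members.
member_conf = [0.99, 0.97, 0.95, 0.90]
non_member_conf = [0.60, 0.55, 0.40, 0.70]

THRESHOLD = 0.2  # attack predicts "member" when loss < THRESHOLD

def predict_member(conf):
    return nll(conf) < THRESHOLD

tp = sum(predict_member(c) for c in member_conf)
fp = sum(predict_member(c) for c in non_member_conf)

# Attack advantage = true-positive rate minus false-positive rate;
# values near 1 indicate severe leakage, near 0 indicate little.
advantage = tp / len(member_conf) - fp / len(non_member_conf)
print(advantage)
```

A model that leaks nothing would give the attack an advantage near zero; quantifying this gap is what turns membership inference into a privacy-risk metric.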
arXiv Detail & Related papers (2020-07-18T06:21:35Z) - Security and Privacy Preserving Deep Learning [2.322461721824713]
Massive data collection required for deep learning presents obvious privacy issues.
Users' personal, highly sensitive data, such as photos and voice recordings, are kept indefinitely by the companies that collect them.
Deep neural networks are susceptible to various inference attacks as they remember information about their training data.
arXiv Detail & Related papers (2020-06-23T01:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.