Certified Data Removal in Sum-Product Networks
- URL: http://arxiv.org/abs/2210.01451v1
- Date: Tue, 4 Oct 2022 08:22:37 GMT
- Title: Certified Data Removal in Sum-Product Networks
- Authors: Alexander Becker and Thomas Liebig
- Abstract summary: Deleting the collected data is often insufficient to guarantee data privacy.
UnlearnSPN is an algorithm that removes the influence of single data points from a trained sum-product network.
- Score: 78.27542864367821
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Data protection regulations like the GDPR or the California Consumer Privacy
Act give users more control over the data that is collected about them.
Deleting the collected data is often insufficient to guarantee data privacy
since it is often used to train machine learning models, which can expose
information about the training data. Thus, a guarantee that a trained model
does not expose information about its training data is additionally needed. In
this paper, we present UnlearnSPN -- an algorithm that removes the influence of
single data points from a trained sum-product network and thereby allows
fulfilling data privacy requirements on demand.
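The abstract does not detail UnlearnSPN itself, but the key intuition is that sum-node weights in an SPN learned from data (e.g. via LearnSPN-style clustering) are typically proportional to data counts, so removing one record amounts to decrementing counts along the nodes it induced. The following is a minimal illustrative sketch under that assumption, not the certified UnlearnSPN algorithm; all structure, class names, and numbers are invented for illustration.

```python
# Minimal sum-product network (SPN) sketch. NOT the UnlearnSPN algorithm:
# it only illustrates how count-parameterised sum and leaf nodes make
# removal of a single record's influence conceivable. All numbers invented.

class Leaf:
    """Bernoulli leaf over one binary variable, parameterised by counts."""
    def __init__(self, var, pos, total):
        self.var, self.pos, self.total = var, pos, total

    def prob(self, assignment):
        p = self.pos / self.total
        return p if assignment[self.var] == 1 else 1.0 - p

class Product:
    """Product node: children over disjoint variable scopes."""
    def __init__(self, children):
        self.children = children

    def prob(self, assignment):
        result = 1.0
        for child in self.children:
            result *= child.prob(assignment)
        return result

class Sum:
    """Sum node whose mixture weights are proportional to cluster counts."""
    def __init__(self, children, counts):
        self.children, self.counts = children, counts

    def prob(self, assignment):
        total = sum(self.counts)
        return sum((n / total) * child.prob(assignment)
                   for n, child in zip(self.counts, self.children))

# Two binary variables X0, X1; two clusters of 30 and 70 records.
spn = Sum(
    children=[
        Product([Leaf(0, pos=25, total=30), Leaf(1, pos=5, total=30)]),
        Product([Leaf(0, pos=10, total=70), Leaf(1, pos=60, total=70)]),
    ],
    counts=[30, 70],
)

x = {0: 1, 1: 0}
print(spn.prob(x))

# "Removing" one cluster-0 record with X0=1, X1=0 decrements the counts
# along its induced path. Certified removal adds guarantees on top of
# this count-based intuition.
spn.counts[0] -= 1
spn.children[0].children[0].pos -= 1
spn.children[0].children[0].total -= 1
spn.children[0].children[1].total -= 1
print(spn.prob(x))
```

Because every parameter is a ratio of counts, the updated network is exactly the one that would have been learned without the removed record, which is the property a certified-removal guarantee formalizes.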
Related papers
- Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z) - Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks [70.39633252935445]
Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.
For closed models, the training data becomes a trade secret, and even for open models, it is not trivial to detect contamination.
We propose three strategies that can make a difference: (1) test data made public should be encrypted with a public key and licensed to disallow derivative distribution; (2) demand training exclusion controls from closed API holders, and protect your test data by refusing to evaluate without them; and (3) avoid data which appears with its solution on the internet, and release the web-page context of internet-derived data.
arXiv Detail & Related papers (2023-05-17T12:23:38Z) - Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
In real-world industry settings, Machine Learning models are built on user data. Data-deletion mandates require effort both in terms of data removal and model retraining, and continuous removal of data followed by retraining does not scale. We propose Machine Unlearning to tackle this challenge.
arXiv Detail & Related papers (2022-12-19T16:06:45Z) - A Privacy-Preserving Outsourced Data Model in Cloud Environment [8.176020822058586]
Data security and privacy problems are among the critical hindrances to using machine learning tools.
A privacy-preserving model is proposed, which protects the privacy of the data without compromising machine learning efficiency.
Fog nodes collect the noise-added data from the data owners, then shift it to the cloud platform for storage, computation, and performing the classification tasks.
arXiv Detail & Related papers (2022-11-24T11:27:30Z) - Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z) - Transferable Unlearnable Examples [63.64357484690254]
Unlearnable strategies have been introduced to prevent third parties from training on data without permission.
They add perturbations to users' data before publishing, aiming to invalidate models trained on the published dataset.
We propose a novel unlearnable strategy based on Classwise Separability Discriminant (CSD), which aims to better transfer the unlearnable effects to other training settings and datasets.
arXiv Detail & Related papers (2022-10-18T19:23:52Z) - Machine unlearning via GAN [2.406359246841227]
Machine learning models, especially deep models, may unintentionally remember information about their training data.
We present a GAN-based algorithm to delete data in deep models, which significantly improves deleting speed compared to retraining from scratch.
arXiv Detail & Related papers (2021-11-22T05:28:57Z) - Amnesiac Machine Learning [15.680008735220785]
The recently enacted General Data Protection Regulation (GDPR) affects any data holder that has data on European Union residents.
Models are vulnerable to information leaking attacks such as model inversion attacks.
We present two data removal methods, namely Unlearning and Amnesiac Unlearning, that enable model owners to protect themselves against such attacks while being compliant with regulations.
arXiv Detail & Related papers (2020-10-21T13:14:17Z) - ML Privacy Meter: Aiding Regulatory Compliance by Quantifying the Privacy Risks of Machine Learning [10.190911271176201]
Machine learning models pose an additional privacy risk to the data by indirectly revealing information about it through model predictions and parameters.
There is an immediate need for a tool that can quantify the privacy risk to data from models.
We present ML Privacy Meter, a tool that can quantify the privacy risk to data from models through state-of-the-art membership inference attack techniques.
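The summary does not describe how such auditing works internally; a common baseline behind membership inference is a simple loss threshold: training members tend to receive lower loss than non-members. The sketch below illustrates only that idea, not ML Privacy Meter's implementation; the confidences and threshold are invented.

```python
# Toy illustration of threshold-based membership inference, the baseline
# idea behind privacy-risk auditing tools. NOT ML Privacy Meter's actual
# implementation; all confidences and the threshold are invented.

import math

def nll(p_true):
    """Negative log-likelihood the model assigns to the true label."""
    return -math.log(max(p_true, 1e-12))

# Hypothetical per-example confidences: members (seen during training)
# tend to get higher confidence, hence lower loss, than non-members.
member_conf = [0.99, 0.97, 0.95, 0.90]
non_member_conf = [0.60, 0.55, 0.40, 0.70]

THRESHOLD = 0.2  # attack predicts "member" when loss < THRESHOLD

def predict_member(conf):
    return nll(conf) < THRESHOLD

tp = sum(predict_member(c) for c in member_conf)
fp = sum(predict_member(c) for c in non_member_conf)

# Attack advantage = true-positive rate minus false-positive rate;
# values near 1 indicate severe leakage, near 0 indicate little.
advantage = tp / len(member_conf) - fp / len(non_member_conf)
print(advantage)
```

A model that leaks nothing would give the attack an advantage near zero; quantifying this gap is what turns membership inference into a privacy-risk metric.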
arXiv Detail & Related papers (2020-07-18T06:21:35Z) - Security and Privacy Preserving Deep Learning [2.322461721824713]
Massive data collection required for deep learning presents obvious privacy issues.
Users' personal, highly sensitive data, such as photos and voice recordings, are kept indefinitely by the companies that collect them.
Deep neural networks are susceptible to various inference attacks as they remember information about their training data.
arXiv Detail & Related papers (2020-06-23T01:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.