Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure Dataset Release
- URL: http://arxiv.org/abs/2103.02683v2
- Date: Fri, 5 Mar 2021 04:55:01 GMT
- Title: Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure Dataset Release
- Authors: Liam Fowl, Ping-yeh Chiang, Micah Goldblum, Jonas Geiping, Arpit Bansal, Wojtek Czaja, Tom Goldstein
- Abstract summary: We develop a data poisoning method by which publicly released data can be minimally modified to prevent others from training models on it.
We demonstrate the success of our approach on ImageNet classification and on facial recognition.
- Score: 52.504589728136615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large organizations such as social media companies continually release data,
for example user images. At the same time, these organizations leverage their
massive corpora of released data to train proprietary models that give them an
edge over their competitors. These two behaviors can be in conflict as an
organization wants to prevent competitors from using their own data to
replicate the performance of their proprietary models. We solve this problem by
developing a data poisoning method by which publicly released data can be
minimally modified to prevent others from training models on it. Moreover, our
method can be used in an online fashion so that companies can protect their
data in real time as they release it. We demonstrate the success of our approach
on ImageNet classification and on facial recognition.
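For concreteness, the snippet below sketches one well-known availability-poisoning recipe, bounded error-minimizing perturbations, in PyTorch. The paper itself optimizes a different gradient-based objective; every name and hyperparameter here (craft_poison, eps, alpha, steps) is an illustrative assumption, not the authors' code.

```python
import torch
import torch.nn.functional as F

def craft_poison(model, x, y, eps=8/255, alpha=2/255, steps=50):
    # Minimal sketch (not the paper's exact objective): shift each image
    # within an L-inf ball of radius eps so that it *minimizes* training
    # loss, leaving little useful learning signal for downstream models.
    model.eval()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= alpha * grad.sign()               # descend the loss
            delta.clamp_(-eps, eps)                    # keep the change minimal
            delta.copy_((x + delta).clamp(0, 1) - x)   # stay a valid image
    return (x + delta).detach()
```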
Related papers
- Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage [12.892449128678516]
Fine-tuning language models on private data for downstream applications poses significant privacy risks.
Several popular community platforms now offer convenient distribution of a large variety of pre-trained models.
We introduce a novel poisoning technique that uses model-unlearning as an attack tool.
arXiv Detail & Related papers (2024-08-30T15:35:09Z)
- No Vandalism: Privacy-Preserving and Byzantine-Robust Federated Learning [18.1129191782913]
Federated learning allows several clients to train one machine learning model jointly without sharing private data, providing privacy protection.
Traditional federated learning is vulnerable to poisoning attacks, which can not only degrade model performance but also implant malicious backdoors.
In this paper, we aim to build a privacy-preserving and Byzantine-robust federated learning scheme to provide an environment with no vandalism (NoV) against attacks from malicious participants.
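The summary does not spell out NoV's aggregation rule; as a stand-in for the Byzantine-robust half, a coordinate-wise median aggregator (a standard building block, assumed here purely for illustration) looks like:

```python
import torch

def median_aggregate(client_updates):
    # Coordinate-wise median over flattened client updates: unlike plain
    # averaging, a minority of malicious clients cannot pull any single
    # coordinate arbitrarily far.
    return torch.stack(client_updates).median(dim=0).values

updates = [torch.randn(10) for _ in range(5)]  # one flat update per client
global_step = median_aggregate(updates)
```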
arXiv Detail & Related papers (2024-06-03T07:59:10Z)
- Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks [70.39633252935445]
Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.
For closed models, the training data becomes a trade secret, and even for open models, it is not trivial to detect contamination.
We propose three strategies that can make a difference: (1) test data made public should be encrypted with a public key and licensed to disallow derivative distribution; (2) demand training-exclusion controls from closed API holders, and protect your test data by refusing to evaluate without them; and (3) avoid data which appears with its solution on the internet, and release the web-page context of internet-derived data.
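Strategy (1) can be as simple as hybrid public-key encryption of the released file; the sketch below uses the `cryptography` package and is a generic illustration, not code from the paper.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def encrypt_test_set(plaintext: bytes, public_key_pem: bytes):
    # Hybrid scheme: a fresh symmetric key encrypts the data, and the
    # recipient's RSA public key wraps that symmetric key. Crawlers
    # ingesting the raw web page only ever see ciphertext.
    public_key = serialization.load_pem_public_key(public_key_pem)
    data_key = Fernet.generate_key()
    ciphertext = Fernet(data_key).encrypt(plaintext)
    wrapped_key = public_key.encrypt(
        data_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    return wrapped_key, ciphertext
```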
arXiv Detail & Related papers (2023-05-17T12:23:38Z)
- Protecting User Privacy in Online Settings via Supervised Learning [69.38374877559423]
We design an intelligent approach to online privacy protection that leverages supervised learning.
By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user.
arXiv Detail & Related papers (2023-04-06T05:20:16Z)
- The Devil's Advocate: Shattering the Illusion of Unexploitable Data using Diffusion Models [14.018862290487617]
We show that a carefully designed denoising process can counteract the data-protecting perturbations.
Our approach, called AVATAR, delivers state-of-the-art performance against a suite of recent availability attacks.
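Diffusion-based purification of this kind typically diffuses the protected image part-way along the forward process and denoises it back. The sketch below assumes a hypothetical pretrained noise predictor `eps_model` and DDIM-style deterministic reverse steps; AVATAR's exact procedure may differ.

```python
import torch

@torch.no_grad()
def purify(x, eps_model, alphas_cumprod, t_star=200):
    # Diffuse the image to step t_star, then run deterministic reverse
    # steps back to t = 0. Small protective perturbations are washed out
    # by the injected noise and not reconstructed by the denoiser.
    a = alphas_cumprod  # cumulative products of (1 - beta_t); a[0] ~ 1
    x_t = a[t_star].sqrt() * x + (1 - a[t_star]).sqrt() * torch.randn_like(x)
    for t in range(t_star, 0, -1):
        eps = eps_model(x_t, t)                              # predicted noise
        x0_hat = (x_t - (1 - a[t]).sqrt() * eps) / a[t].sqrt()
        x_t = a[t - 1].sqrt() * x0_hat + (1 - a[t - 1]).sqrt() * eps
    return x_t
```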
arXiv Detail & Related papers (2023-03-15T10:20:49Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
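A density-ratio membership score of this flavor can be sketched with two kernel density estimates; DOMIAS's actual estimator may differ, and the function name and bandwidth below are arbitrary assumptions.

```python
from sklearn.neighbors import KernelDensity

def membership_score(x_query, synth_data, ref_data, bandwidth=0.5):
    # log(p_synth(x) / p_ref(x)): where the generator assigns more mass
    # than the reference population, it has likely overfit to (memorized)
    # nearby training records, so high scores suggest membership.
    kde_synth = KernelDensity(bandwidth=bandwidth).fit(synth_data)
    kde_ref = KernelDensity(bandwidth=bandwidth).fit(ref_data)
    return kde_synth.score_samples(x_query) - kde_ref.score_samples(x_query)
```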
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted in six image classification datasets, consisting of three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z)
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop a unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z) - Amnesiac Machine Learning [15.680008735220785]
The recently enacted General Data Protection Regulation (GDPR) affects any data holder with data on European Union residents.
Models are vulnerable to information-leaking attacks such as model inversion.
We present two data removal methods, namely Unlearning and Amnesiac Unlearning, that enable model owners to protect themselves against such attacks while being compliant with regulations.
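As described, Amnesiac Unlearning records the parameter update contributed by each batch and later subtracts the updates of batches that contained the data to be removed. A minimal sketch, with implementation details assumed:

```python
import torch

class AmnesiacTrainer:
    # Stores one per-parameter delta per batch; in practice this costs
    # memory proportional to (number of batches x model size).
    def __init__(self, model, optimizer):
        self.model, self.optimizer = model, optimizer
        self.batch_updates = {}  # batch_id -> list of per-parameter deltas

    def train_step(self, batch_id, loss):
        before = [p.detach().clone() for p in self.model.parameters()]
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.batch_updates[batch_id] = [
            p.detach() - b for p, b in zip(self.model.parameters(), before)
        ]

    def forget(self, batch_ids):
        # Undo the contribution of every batch containing sensitive data.
        with torch.no_grad():
            for bid in batch_ids:
                for p, delta in zip(self.model.parameters(),
                                    self.batch_updates.pop(bid)):
                    p -= delta
```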
arXiv Detail & Related papers (2020-10-21T13:14:17Z)
- Anonymizing Machine Learning Models [0.0]
Anonymized data is exempt from obligations set out in regulations such as the EU General Data Protection Regulation.
We propose a method that achieves better model accuracy by using the knowledge encoded within the trained model.
We also demonstrate that our approach has a similar, and sometimes even better, ability to prevent membership attacks than approaches based on differential privacy.
arXiv Detail & Related papers (2020-07-26T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.