Dataset Inference: Ownership Resolution in Machine Learning
- URL: http://arxiv.org/abs/2104.10706v1
- Date: Wed, 21 Apr 2021 18:12:18 GMT
- Title: Dataset Inference: Ownership Resolution in Machine Learning
- Authors: Pratyush Maini and Mohammad Yaghini and Nicolas Papernot
- Abstract summary: Knowledge contained in the stolen model's training set is what is common to all stolen copies.
We introduce dataset inference, the process of identifying whether a suspected model copy has private knowledge from the original model's dataset.
Experiments on CIFAR10, SVHN, CIFAR100 and ImageNet show that model owners can claim with confidence greater than 99% that their model (or dataset as a matter of fact) was stolen.
- Score: 18.248121977353506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With increasingly more data and computation involved in their training,
machine learning models constitute valuable intellectual property. This has
spurred interest in model stealing, which is made more practical by advances in
learning with partial, little, or no supervision. Existing defenses focus on
inserting unique watermarks in a model's decision surface, but this is
insufficient: the watermarks are not sampled from the training distribution and
thus are not always preserved during model stealing. In this paper, we make the
key observation that knowledge contained in the stolen model's training set is
what is common to all stolen copies. The adversary's goal, irrespective of the
attack employed, is always to extract this knowledge or its by-products. This
gives the original model's owner a strong advantage over the adversary: model
owners have access to the original training data. We thus introduce dataset
inference, the process of identifying whether a suspected model copy has
private knowledge from the original model's dataset, as a defense against model
stealing. We develop an approach for dataset inference that combines
statistical testing with the ability to estimate the distance of multiple data
points to the decision boundary. Our experiments on CIFAR10, SVHN, CIFAR100 and
ImageNet show that model owners can claim with confidence greater than 99% that
their model (or dataset as a matter of fact) was stolen, despite only exposing
50 of the stolen model's training points. Dataset inference defends against
state-of-the-art attacks even when the adversary is adaptive. Unlike prior
work, it does not require retraining or overfitting the defended model.
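The abstract's core mechanism, estimating how far individual data points lie from a suspect model's decision boundary and aggregating those distances into a statistical test, can be illustrated with a minimal sketch. The snippet below is an illustrative assumption, not the authors' released implementation: the function names (`estimate_margin`, `dataset_inference`), the random-direction margin estimate, and the Welch t-test aggregation are hypothetical stand-ins for the paper's richer distance-embedding and confidence-regressor pipeline.

```python
# Minimal sketch of dataset inference (hypothetical, not the paper's exact method):
# estimate each point's distance to the suspect model's decision boundary, then test
# whether the victim's private training points sit farther from the boundary than
# public points do. A small one-sided p-value supports an ownership claim.
import torch
from scipy import stats


def estimate_margin(model, x, y, n_dirs=10, step=0.01, max_steps=50):
    """Rough label-only distance-to-boundary estimate: walk along random signed
    directions until the predicted label flips, and average the flip distances."""
    model.eval()
    dists = []
    for _ in range(n_dirs):
        delta = torch.sign(torch.randn_like(x))  # random signed direction
        d = max_steps * step                     # cap if no flip occurs
        for k in range(1, max_steps + 1):
            with torch.no_grad():
                pred = model(x + k * step * delta).argmax(dim=1)
            if pred.item() != y.item():
                d = k * step
                break
        dists.append(d)
    return sum(dists) / len(dists)


def dataset_inference(suspect_model, private_points, public_points):
    """private_points / public_points: iterables of (image_tensor, label_tensor).
    Returns the one-sided p-value that private (victim training) points have a
    larger margin under the suspect model than public points."""
    priv = [estimate_margin(suspect_model, x.unsqueeze(0), y.unsqueeze(0))
            for x, y in private_points]
    pub = [estimate_margin(suspect_model, x.unsqueeze(0), y.unsqueeze(0))
           for x, y in public_points]
    # Welch two-sample t-test, made one-sided for H1: mean(priv) > mean(pub),
    # i.e. the suspect model carries knowledge of the victim's training set.
    t, p_two_sided = stats.ttest_ind(priv, pub, equal_var=False)
    return p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
```

Under these assumptions, a model owner would run the test against the suspect model using a small sample of private training points (the abstract reports exposing only 50) and against an independently trained reference model as a control; only the stolen copy should yield a significant result.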
Related papers
- Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions, which harms benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods typically leverage transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
We are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train each of them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Dataset Inference for Self-Supervised Models [21.119579812529395]
Self-supervised models are increasingly prevalent in machine learning (ML).
They are vulnerable to model stealing attacks due to the high dimensionality of vector representations they output.
We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing.
arXiv Detail & Related papers (2022-09-16T15:39:06Z)
- MOVE: Effective and Harmless Ownership Verification via Embedded External Features [109.19238806106426]
We propose MOVE, an effective and harmless model ownership verification method, to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z)
- Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
Adversaries can 'steal' deployed models even when they have no training samples and cannot access the model parameters or structures.
We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features.
Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z)
- Entangled Watermarks as a Defense against Model Extraction [42.74645868767025]
Entangled Watermarking Embeddings (EWE) are used to protect machine learning models from extraction attacks.
EWE learns features for classifying data that is sampled from the task distribution and data that encodes watermarks.
Experiments on MNIST, Fashion-MNIST, CIFAR-10, and Speech Commands validate that the defender can claim model ownership with 95% confidence with less than 100 queries to the stolen copy.
arXiv Detail & Related papers (2020-02-27T15:47:00Z)