Beyond Labeling Oracles: What does it mean to steal ML models?
- URL: http://arxiv.org/abs/2310.01959v3
- Date: Thu, 13 Jun 2024 09:22:57 GMT
- Title: Beyond Labeling Oracles: What does it mean to steal ML models?
- Authors: Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot
- Abstract summary: Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
- Score: 52.63413852460003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model extraction attacks are designed to steal trained models with only query access, as is often provided through APIs that ML-as-a-Service providers offer. Machine Learning (ML) models are expensive to train, in part because data is hard to obtain, and a primary incentive for model extraction is to acquire a model while incurring less cost than training from scratch. Literature on model extraction commonly claims or presumes that the attacker is able to save on both data acquisition and labeling costs. We thoroughly evaluate this assumption and find that the attacker often does not save on these costs. This is because current attacks implicitly rely on the adversary being able to sample from the victim model's data distribution. We thoroughly investigate the factors influencing the success of model extraction. We discover that prior knowledge of the attacker, i.e., access to in-distribution data, dominates other factors like the attack policy the adversary follows to choose which queries to make to the victim model API. Our findings urge the community to redefine the adversarial goals of ME attacks, as current evaluation methods misinterpret ME attack performance.
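To make the labeling-oracle setting concrete, the sketch below shows the generic extraction loop the abstract refers to: the attacker spends a query budget on inputs drawn from some pool, uses the victim API as a labeling oracle, and fits a surrogate model to the returned labels. This is a minimal illustration, not the paper's experimental setup; `query_victim_api`, the query pool, and the surrogate architecture are hypothetical placeholders, and how well the loop works depends chiefly on whether the pool is in-distribution, which is the paper's central finding.
```python
# Minimal sketch of the labeling-oracle extraction loop (illustrative only).
# `query_victim_api`, `query_pool`, and `surrogate` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def extract_surrogate(query_pool, query_victim_api, surrogate, budget=10_000, epochs=10):
    """Label attacker-chosen inputs through the victim API, then fit a surrogate to them."""
    queries = query_pool[:budget]                    # attack policy: what to spend the budget on
    with torch.no_grad():
        victim_labels = query_victim_api(queries)    # hard labels returned by the victim API
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = F.cross_entropy(surrogate(queries), victim_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return surrogate
```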
Related papers
- Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection [10.513955887214497]
In Model Stealing Attacks (MSA), a machine learning model is queried repeatedly to build a labelled dataset.
In this work, we explore using an ensemble of deep learning models as the thief model.
We achieve a 21% higher adversarial sample transferability than previous work for models trained on the CIFAR-10 dataset.
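As a hedged illustration of ensemble-based sample selection (the summary does not spell out the exact criterion), the sketch below queries the victim only on candidates where the thief ensemble is most uncertain, using the entropy of the averaged prediction as a simple proxy for disagreement.
```python
# Hedged sketch: entropy of the ensemble-averaged prediction as a query-selection score.
import torch
import torch.nn.functional as F

def select_queries(candidates, ensemble, k):
    """Pick the k candidates on which the thief ensemble is most uncertain."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(candidates), dim=-1) for m in ensemble])  # (M, N, C)
    mean_probs = probs.mean(dim=0)                                                  # (N, C)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)         # (N,)
    return candidates[entropy.topk(k).indices]
```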
arXiv Detail & Related papers (2023-11-08T10:31:29Z) - Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
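A minimal sketch of a density-based membership score in the spirit of DOMIAS: fit one density estimate on the generator's synthetic output and one on reference data the attacker holds from the underlying distribution, and score candidates by the log-ratio, which is large where the generative model is locally overfit. The kernel density estimators and bandwidth are illustrative assumptions, not the paper's estimator.
```python
# Hedged sketch of a density-ratio membership score; estimators are illustrative assumptions.
from sklearn.neighbors import KernelDensity

def membership_scores(candidates, synthetic_data, reference_data, bandwidth=0.5):
    """Higher score = the generative model is locally overfit near the candidate."""
    kde_syn = KernelDensity(bandwidth=bandwidth).fit(synthetic_data)  # proxy for the model's density
    kde_ref = KernelDensity(bandwidth=bandwidth).fit(reference_data)  # attacker's distribution knowledge
    # Log-ratio of densities; threshold this to decide membership.
    return kde_syn.score_samples(candidates) - kde_ref.score_samples(candidates)
```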
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods always leverage the transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, including those that involve adversarial training or transfer learning.
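To illustrate the sample-correlation idea behind SAC, a hedged sketch: compute the pairwise correlation structure of a model's outputs on a fixed probe set and compare that structure between the victim and a suspect model. The probe set and the distance used for the comparison are assumptions made for illustration.
```python
# Hedged sketch of correlation-structure fingerprinting on a shared probe set.
import torch

def correlation_matrix(outputs):
    """Pairwise Pearson correlation between per-sample output vectors, shape (N, N)."""
    centered = outputs - outputs.mean(dim=1, keepdim=True)
    normed = centered / centered.norm(dim=1, keepdim=True).clamp_min(1e-12)
    return normed @ normed.t()

def fingerprint_distance(victim_outputs, suspect_outputs):
    """Small distance suggests the suspect model was derived from the victim."""
    return (correlation_matrix(victim_outputs) - correlation_matrix(suspect_outputs)).abs().mean()
```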
arXiv Detail & Related papers (2022-10-21T02:07:50Z) - Dataset Inference for Self-Supervised Models [21.119579812529395]
Self-supervised models are increasingly prevalent in machine learning (ML).
They are vulnerable to model stealing attacks due to the high dimensionality of vector representations they output.
We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing.
arXiv Detail & Related papers (2022-09-16T15:39:06Z) - MOVE: Effective and Harmless Ownership Verification via Embedded External Features [109.19238806106426]
We propose MOVE, an effective and harmless model ownership verification method, to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
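A rough sketch of the external-feature idea, under loudly stated assumptions: the fixed color tint used as the "external feature" and the output-sensitivity test below are illustrative stand-ins (MOVE itself embeds style-transferred samples and trains a meta-classifier on model gradients). The defender embeds a chosen transformation into a small fraction of its training data, and verification asks whether a suspect model reacts to that transformation the way the victim does while an independently trained model does not.
```python
# Rough sketch only: the tint "external feature" and output-sensitivity test are illustrative
# stand-ins; MOVE itself embeds style-transferred samples and trains a meta-classifier on gradients.
import torch

def external_feature(x, tint=0.1):
    """Add a fixed red-channel tint to a batch of images, shape (B, C, H, W)."""
    mark = torch.zeros_like(x)
    mark[:, 0] = tint
    return (x + mark).clamp(0, 1)

def feature_sensitivity(model, probes):
    """How strongly the model's outputs react when the external feature is added."""
    with torch.no_grad():
        delta = model(external_feature(probes)) - model(probes)
    return delta.norm(dim=-1).mean()

# Verification idea: a stolen model's sensitivity should track the victim's, while an
# independently trained model's should not.
```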
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - Careful What You Wish For: on the Extraction of Adversarially Trained Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z) - Increasing the Cost of Model Extraction with Calibrated Proof of Work [25.096196576476885]
In model extraction attacks, adversaries can steal a machine learning model exposed via a public API.
We propose requiring users to complete a proof-of-work before they can read the model's predictions.
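As an illustration of charging compute per query, here is a minimal hashcash-style proof-of-work gate; calibrating the difficulty to how suspicious a user's query pattern looks, and the API wiring around it, are assumptions rather than the paper's exact scheme.
```python
# Minimal hashcash-style proof-of-work gate (illustrative; difficulty calibration is assumed).
import hashlib
import itertools

def solve_pow(challenge: str, difficulty: int) -> int:
    """Client side: find a nonce whose SHA-256 with the challenge has `difficulty` leading zero hex digits."""
    target = "0" * difficulty
    for nonce in itertools.count():
        if hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith(target):
            return nonce

def verify_pow(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: cheap check before releasing a prediction; raise `difficulty` for suspicious users."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith("0" * difficulty)
```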
arXiv Detail & Related papers (2022-01-23T12:21:28Z) - Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
Adversaries can 'steal' deployed models even when they have no training samples and cannot access the model parameters or structures.
We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features.
Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z) - MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI [1.693045612956149]
Deep neural networks deployed in Machine Learning as a Service (MLaaS) face the threat of model extraction attacks.
A model extraction attack violates intellectual property and privacy: an adversary steals a trained model hosted in the cloud using only its predictions.
In this paper, we propose MEGEX, a data-free model extraction attack against a gradient-based explainable AI.
arXiv Detail & Related papers (2021-07-19T14:25:06Z) - Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)