I Stolenly Swear That I Am Up to (No) Good: Design and Evaluation of Model Stealing Attacks
- URL: http://arxiv.org/abs/2508.21654v1
- Date: Fri, 29 Aug 2025 14:16:19 GMT
- Title: I Stolenly Swear That I Am Up to (No) Good: Design and Evaluation of Model Stealing Attacks
- Authors: Daryna Oliynyk, Rudolf Mayer, Kathrin Grosse, Andreas Rauber
- Abstract summary: Model stealing attacks endanger the confidentiality of machine learning models offered as a service. Malicious parties can query a model to label data samples and train their own substitute model, violating intellectual property. This paper is the first to address this gap by providing recommendations for designing and evaluating model stealing attacks.
- Score: 6.201492569689928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model stealing attacks endanger the confidentiality of machine learning models offered as a service. Although these models are kept secret, a malicious party can query a model to label data samples and train their own substitute model, violating intellectual property. While novel attacks in the field are continually being published, their design and evaluations are not standardised, making it challenging to compare prior works and assess progress in the field. This paper is the first to address this gap by providing recommendations for designing and evaluating model stealing attacks. To this end, we study the largest group of attacks that rely on training a substitute model -- those attacking image classification models. We propose the first comprehensive threat model and develop a framework for attack comparison. Further, we analyse attack setups from related works to understand which tasks and models have been studied the most. Based on our findings, we present best practices for attack development before, during, and beyond experiments and derive an extensive list of open research questions regarding the evaluation of model stealing attacks. Our findings and recommendations also transfer to other problem domains, hence establishing the first generic evaluation methodology for model stealing attacks.
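The attack the abstract describes follows a simple query-then-train loop: label attacker-owned samples through the victim's prediction API and fit a substitute model on the resulting transfer set. Below is a minimal sketch of that label-only loop; the endpoint `query_victim`, the query budget, and the substitute architecture are illustrative assumptions, not the paper's concrete setup.

```python
# Minimal sketch of a label-only model stealing attack.
# `query_victim` is a hypothetical stand-in for the victim's MLaaS prediction API.
import numpy as np
from sklearn.neural_network import MLPClassifier


def query_victim(images: np.ndarray) -> np.ndarray:
    """Placeholder: return top-1 class labels from the deployed (secret) model."""
    raise NotImplementedError("replace with calls to the victim's prediction endpoint")


def steal_model(attacker_images: np.ndarray, query_budget: int = 10_000) -> MLPClassifier:
    # 1. Pick a transfer set bounded by the attacker's query budget
    #    (can be unlabelled or even out-of-distribution data).
    queries = attacker_images[:query_budget]

    # 2. Label the transfer set by querying the victim model.
    stolen_labels = query_victim(queries)

    # 3. Train a substitute model on (queries, stolen labels);
    #    the architecture here is an arbitrary illustrative choice.
    substitute = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=50)
    substitute.fit(queries.reshape(len(queries), -1), stolen_labels)
    return substitute
```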
Related papers
- Model Stealing Attack against Recommender System [85.1927483219819]
Some adversarial attacks have achieved model stealing against recommender systems.
In this paper, we constrain the volume of available target data and queries and utilize auxiliary data, which shares the item set with the target data, to promote model stealing attacks.
arXiv Detail & Related papers (2023-12-18T05:28:02Z) - SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models.
Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z) - Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods always leverage the transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC)
SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z) - A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit OpenBackdoor to foster the implementations and evaluations of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z) - Get a Model! Model Hijacking Attack Against Machine Learning Models [30.346469782056406]
We propose a new training-time attack against computer vision-based machine learning models, namely the model hijacking attack.
The adversary aims to hijack a target model to execute a different task without the model owner noticing.
Our evaluation shows that both of our model hijacking attacks achieve a high attack success rate, with a negligible drop in model utility.
arXiv Detail & Related papers (2021-11-08T11:30:50Z) - Teacher Model Fingerprinting Attacks Against Transfer Learning [23.224444604615123]
We present the first comprehensive investigation of the teacher model exposure threat in the transfer learning context.
We propose a teacher model fingerprinting attack to infer the teacher model from which a student model transfers.
We show that our attack can accurately identify the model origin with few probing queries.
arXiv Detail & Related papers (2021-06-23T15:52:35Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z) - Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)