HoneypotNet: Backdoor Attacks Against Model Extraction
- URL: http://arxiv.org/abs/2501.01090v1
- Date: Thu, 02 Jan 2025 06:23:51 GMT
- Title: HoneypotNet: Backdoor Attacks Against Model Extraction
- Authors: Yixu Wang, Tianle Gu, Yan Teng, Yingchun Wang, Xingjun Ma,
- Abstract summary: Model extraction attacks pose severe security threats to production models and ML platforms.
We introduce a new defense paradigm called attack as defense which modifies the model's output to be poisonous.
HoneypotNet can inject backdoors into substitute models with a high success rate.
- Score: 24.603590328055027
- License:
- Abstract: Model extraction attacks are one type of inference-time attacks that approximate the functionality and performance of a black-box victim model by launching a certain number of queries to the model and then leveraging the model's predictions to train a substitute model. These attacks pose severe security threats to production models and MLaaS platforms and could cause significant monetary losses to the model owners. A body of work has proposed to defend machine learning models against model extraction attacks, including both active defense methods that modify the model's outputs or increase the query overhead to avoid extraction and passive defense methods that detect malicious queries or leverage watermarks to perform post-verification. In this work, we introduce a new defense paradigm called attack as defense which modifies the model's output to be poisonous such that any malicious users that attempt to use the output to train a substitute model will be poisoned. To this end, we propose a novel lightweight backdoor attack method dubbed HoneypotNet that replaces the classification layer of the victim model with a honeypot layer and then fine-tunes the honeypot layer with a shadow model (to simulate model extraction) via bi-level optimization to modify its output to be poisonous while remaining the original performance. We empirically demonstrate on four commonly used benchmark datasets that HoneypotNet can inject backdoors into substitute models with a high success rate. The injected backdoor not only facilitates ownership verification but also disrupts the functionality of substitute models, serving as a significant deterrent to model extraction attacks.
Related papers
- Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors [1.1510009152620668]
This work focuses on object detection (OD) models.
Existing backdoor attacks on OD models are not applicable for model watermarking as the defense against MEAs on a realistic threat model.
Our proposed approach involves inserting a backdoor into extracted models via APIs by stealthily modifying the bounding-boxes (BBs) of objects detected in queries while keeping the OD capability.
arXiv Detail & Related papers (2024-11-20T05:40:20Z) - Towards Scalable and Robust Model Versioning [30.249607205048125]
Malicious incursions aimed at gaining access to deep learning models are on the rise.
We show how to generate multiple versions of a model that possess different attack properties.
We show theoretically that this can be accomplished by incorporating parameterized hidden distributions into the model training data.
arXiv Detail & Related papers (2024-01-17T19:55:49Z) - MEAOD: Model Extraction Attack against Object Detectors [45.817537875368956]
Model extraction attacks allow attackers to replicate a substitute model with comparable functionality to the victim model.
We propose an effective attack method called MEAOD for object detection models.
We achieve an extraction performance of over 70% under the given condition of a 10k query budget.
arXiv Detail & Related papers (2023-12-22T13:28:50Z) - Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z) - SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models.
Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Isolation and Induction: Training Robust Deep Neural Networks against
Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z) - MOVE: Effective and Harmless Ownership Verification via Embedded
External Features [109.19238806106426]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - Careful What You Wish For: on the Extraction of Adversarially Trained
Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z) - DeepSight: Mitigating Backdoor Attacks in Federated Learning Through
Deep Model Inspection [26.593268413299228]
Federated Learning (FL) allows multiple clients to collaboratively train a Neural Network (NN) model on their private data without revealing the data.
DeepSight is a novel model filtering approach for mitigating backdoor attacks.
We show that it can mitigate state-of-the-art backdoor attacks with a negligible impact on the model's performance on benign data.
arXiv Detail & Related papers (2022-01-03T17:10:07Z) - Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised
Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.