Get a Model! Model Hijacking Attack Against Machine Learning Models
- URL: http://arxiv.org/abs/2111.04394v1
- Date: Mon, 8 Nov 2021 11:30:50 GMT
- Title: Get a Model! Model Hijacking Attack Against Machine Learning Models
- Authors: Ahmed Salem, Michael Backes, Yang Zhang
- Abstract summary: We propose a new training time attack against computer vision-based machine learning models, namely the model hijacking attack.
The adversary aims to hijack a target model to execute a different task without the model owner noticing.
Our evaluation shows that both of our model hijacking attacks achieve a high attack success rate, with a negligible drop in model utility.
- Score: 30.346469782056406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) has established itself as a cornerstone for various
critical applications ranging from autonomous driving to authentication
systems. However, with the increasing adoption of machine learning models,
multiple attacks have emerged. One class of such attacks is the training time
attack, whereby an adversary executes their attack before or during the
training of the machine learning model. In this work, we propose a new training
time attack against computer vision-based machine learning models, namely the
model hijacking attack. The adversary aims to hijack a target model to execute
a different task than its original one without the model owner noticing. Model
hijacking poses accountability and security risks, since the owner of a
hijacked model can be framed for offering illegal or unethical services. Model
hijacking attacks are launched in the same way as existing data
poisoning attacks. However, one requirement of the model hijacking attack is to
be stealthy, i.e., the data samples used to hijack the target model should look
similar to the model's original training dataset. To this end, we propose two
different model hijacking attacks, namely Chameleon and Adverse Chameleon,
based on a novel encoder-decoder style ML model, namely the Camouflager. Our
evaluation shows that both of our model hijacking attacks achieve a high attack
success rate, with a negligible drop in model utility.
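To make the idea concrete, the following is a minimal, illustrative PyTorch sketch of a Camouflager-style encoder-decoder that disguises a hijacking-task sample as a benign sample from the original training distribution. The architecture, layer sizes, loss weighting, and the fixed feature extractor `phi` are assumptions made for illustration; they are not taken from the paper.
```python
# Illustrative sketch (not the paper's exact Camouflager): an encoder-decoder
# that blends a hijacking-task sample with a benign original-task sample so the
# output visually resembles the benign one while retaining hijacking features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Camouflager(nn.Module):
    def __init__(self, channels=3, hidden=32):
        super().__init__()
        self.enc_hijack = nn.Sequential(  # encodes the hijacking-task sample
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU())
        self.enc_benign = nn.Sequential(  # encodes a benign original-task sample
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(     # fuses both encodings into one image
            nn.Conv2d(2 * hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, x_hijack, x_benign):
        z = torch.cat([self.enc_hijack(x_hijack), self.enc_benign(x_benign)], dim=1)
        return self.decoder(z)

def camouflager_loss(x_cam, x_benign, x_hijack, phi, lam=1.0):
    """Visual loss pulls the camouflaged sample toward the benign one; a
    feature-space loss (under a fixed, hypothetical extractor `phi`) keeps it
    close to the hijacking sample so the hijacking task remains learnable."""
    visual = F.mse_loss(x_cam, x_benign)
    semantic = F.mse_loss(phi(x_cam), phi(x_hijack))
    return visual + lam * semantic
```
Under this sketch, the adversary would train the Camouflager offline, poison the target model's training set with camouflaged samples labeled according to a fixed mapping between hijacking and original labels, and later query the hijacked model with camouflaged inputs to execute the hijacking task.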
Related papers
- Vera Verto: Multimodal Hijacking Attack [22.69532868255637]
A recent attack in this domain is the model hijacking attack, whereby an adversary hijacks a victim model to implement their own hijacking tasks.
We transform the model hijacking attack into a more general multimodal setting, where the hijacking and original tasks are performed on data of different modalities.
Our attack achieves 94%, 94%, and 95% attack success rates when using the Sogou news dataset to hijack STL10, CIFAR-10, and MNIST.
arXiv Detail & Related papers (2024-07-31T19:37:06Z)
- Model for Peanuts: Hijacking ML Models without Training Access is Possible [5.005171792255858]
Model hijacking is an attack where an adversary aims to hijack a victim model to execute a different task than its original one.
We propose a simple approach for model hijacking at inference time named SnatchML to classify unknown input samples.
We also propose a novel approach we call meta-unlearning, designed to help the model unlearn a potentially malicious task while training on the original dataset.
arXiv Detail & Related papers (2024-06-03T18:04:37Z)
- SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models.
Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods typically leverage transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z)
- Model Inversion Attack against Transfer Learning: Inverting a Model without Accessing It [41.39995986856193]
Transfer learning is an important approach that produces pre-trained teacher models.
Recent research on transfer learning has found that it is vulnerable to various attacks.
It is still not clear whether transfer learning is vulnerable to model inversion attacks.
arXiv Detail & Related papers (2022-03-13T05:07:02Z)
- Manipulating SGD with Data Ordering Attacks [23.639512087220137]
We present a class of training-time attacks that require no changes to the underlying model, dataset, or architecture.
In particular, an attacker can disrupt the integrity and availability of a model by simply reordering training batches; a minimal sketch of this idea appears after this list of related papers.
Attacks have a long-term impact in that they decrease model performance hundreds of epochs after the attack took place.
arXiv Detail & Related papers (2021-04-19T22:17:27Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked models as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
- DaST: Data-free Substitute Training for Adversarial Attacks [55.76371274622313]
We propose a data-free substitute training method (DaST) to obtain substitute models for adversarial black-box attacks.
To achieve this, DaST utilizes specially designed generative adversarial networks (GANs) to train the substitute models.
Experiments demonstrate the substitute models can achieve competitive performance compared with the baseline models.
arXiv Detail & Related papers (2020-03-28T04:28:13Z)
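As referenced in the data ordering attack entry above, here is a minimal, hedged sketch of a batch-reordering attack: instead of randomly shuffling, the attacker feeds training batches in an order chosen by a surrogate criterion. Ordering by per-sample loss under a surrogate model is an illustrative policy, not necessarily the exact one used in that paper.
```python
# Illustrative batch-reordering attack on SGD: no data point is modified;
# only the order in which batches are presented to the trainer changes.
import torch

def reorder_by_surrogate_loss(dataset, surrogate_model, loss_fn,
                              batch_size=64, descending=True):
    """Return index batches sorted by per-sample loss under a surrogate model,
    replacing the usual random shuffle. `dataset[i]` is assumed to yield an
    (input tensor, integer label) pair."""
    surrogate_model.eval()
    losses = []
    with torch.no_grad():
        for i in range(len(dataset)):
            x, y = dataset[i]
            logits = surrogate_model(x.unsqueeze(0))
            losses.append(loss_fn(logits, torch.tensor([y])).item())
    order = sorted(range(len(dataset)), key=lambda i: losses[i], reverse=descending)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```
The returned index batches can be passed to a torch.utils.data.DataLoader through its batch_sampler argument, so every epoch replays the same skewed ordering instead of a fresh shuffle.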
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.