Vera Verto: Multimodal Hijacking Attack
- URL: http://arxiv.org/abs/2408.00129v1
- Date: Wed, 31 Jul 2024 19:37:06 GMT
- Title: Vera Verto: Multimodal Hijacking Attack
- Authors: Minxing Zhang, Ahmed Salem, Michael Backes, Yang Zhang
- Abstract summary: A recent attack in this domain is the model hijacking attack, whereby an adversary hijacks a victim model to implement their own hijacking tasks.
We transform the model hijacking attack into a more general multimodal setting, where the hijacking and original tasks are performed on data of different modalities.
Our attack achieves 94%, 94%, and 95% attack success rate when using the Sogou news dataset to hijack STL10, CIFAR-10, and MNIST classifiers.
- Score: 22.69532868255637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing cost of training machine learning (ML) models has led to the inclusion of new parties to the training pipeline, such as users who contribute training data and companies that provide computing resources. This involvement of such new parties in the ML training process has introduced new attack surfaces for an adversary to exploit. A recent attack in this domain is the model hijacking attack, whereby an adversary hijacks a victim model to implement their own -- possibly malicious -- hijacking tasks. However, the scope of the model hijacking attack is so far limited to the homogeneous-modality tasks. In this paper, we transform the model hijacking attack into a more general multimodal setting, where the hijacking and original tasks are performed on data of different modalities. Specifically, we focus on the setting where an adversary implements a natural language processing (NLP) hijacking task into an image classification model. To mount the attack, we propose a novel encoder-decoder based framework, namely the Blender, which relies on advanced image and language models. Experimental results show that our modal hijacking attack achieves strong performances in different settings. For instance, our attack achieves 94%, 94%, and 95% attack success rate when using the Sogou news dataset to hijack STL10, CIFAR-10, and MNIST classifiers.
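To make the multimodal setting concrete, below is a minimal sketch of the disguise-and-map idea, assuming a toy text-to-image encoder and an arbitrary label mapping. It is not the authors' Blender framework; every module, vocabulary size, shape, and label mapping is an illustrative placeholder.

```python
# Minimal sketch of the multimodal hijacking idea (not the authors' Blender
# implementation). All modules, sizes, and label mappings are illustrative
# assumptions; the real attack relies on advanced image and language models.
import torch
import torch.nn as nn

class TextToImageEncoder(nn.Module):
    """Disguises a tokenized text sample as a CIFAR-10-shaped image tensor."""
    def __init__(self, vocab_size=30522, embed_dim=128, img_shape=(3, 32, 32)):
        super().__init__()
        self.img_shape = img_shape
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Sequential(
            nn.Linear(embed_dim, 512),
            nn.ReLU(),
            nn.Linear(512, img_shape[0] * img_shape[1] * img_shape[2]),
            nn.Tanh(),  # keep "pixel" values in a normalized range
        )

    def forward(self, token_ids):                    # (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)   # (batch, embed_dim)
        return self.proj(pooled).view(-1, *self.img_shape)

# Hypothetical mapping from hijacking-task labels (e.g. Sogou news topics)
# to original-task labels (e.g. CIFAR-10 classes).
HIJACK_TO_ORIGINAL = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
ORIGINAL_TO_HIJACK = {v: k for k, v in HIJACK_TO_ORIGINAL.items()}

encoder = TextToImageEncoder()

# Poisoning phase: disguised text samples would be mixed into the image
# classifier's training set with remapped labels.
fake_token_ids = torch.randint(0, 30522, (8, 64))     # stand-in for real text
fake_news_labels = torch.randint(0, 5, (8,))
poison_images = encoder(fake_token_ids)
poison_labels = torch.tensor([HIJACK_TO_ORIGINAL[int(y)] for y in fake_news_labels])

# Inference phase: the adversary queries the hijacked image classifier with a
# disguised text sample and maps the predicted class back to the NLP label.
victim = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # toy stand-in
pred = victim(encoder(fake_token_ids[:1])).argmax(dim=1).item()
news_topic = ORIGINAL_TO_HIJACK.get(pred)  # None if the class maps to no hijack label
```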
Related papers
- Model Hijacking Attack in Federated Learning [19.304332176437363]
HijackFL is the first-of-its-kind hijacking attack against the global model in federated learning.
It aims to force the global model to perform a different task from its original task without the server or benign client noticing.
We conduct extensive experiments on four benchmark datasets and three popular models.
arXiv Detail & Related papers (2024-08-04T20:02:07Z)
- Model for Peanuts: Hijacking ML Models without Training Access is Possible [5.005171792255858]
Model hijacking is an attack where an adversary aims to hijack a victim model to execute a different task than its original one.
We propose SnatchML, a simple approach for model hijacking at inference time that classifies unknown input samples.
We also propose a novel approach we call meta-unlearning, designed to help the model unlearn a potentially malicious task while training on the original dataset.
arXiv Detail & Related papers (2024-06-03T18:04:37Z)
- One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training [54.622474306336635]
A new weight modification attack called the bit-flip attack (BFA) was proposed, which exploits memory fault injection techniques.
We propose a training-assisted bit flip attack, in which the adversary is involved in the training stage to build a high-risk model to release.
arXiv Detail & Related papers (2023-08-12T09:34:43Z)
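As background on why a single flipped bit is so damaging, the short example below flips one bit in the raw float32 representation of a weight. It illustrates the general bit-flip-attack premise only, not the training-assisted method proposed in the entry above; the weight value and bit positions are arbitrary.

```python
# Why one bit flip in a stored float32 weight can be devastating
# (general bit-flip-attack premise; values and bit positions are illustrative).
import numpy as np

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value via its raw 32-bit integer view."""
    arr = np.array([value], dtype=np.float32)
    arr.view(np.uint32)[0] ^= np.uint32(1 << bit)  # view shares memory with arr
    return float(arr[0])

w = 0.05                       # a typical small network weight
print(flip_bit(w, 30))         # top exponent bit: ~1.7e+37, model-breaking
print(flip_bit(w, 2))          # low mantissa bit: still ~0.05, barely noticeable
```

In practice, bit-flip attacks search for the few bits (typically exponent bits of sensitive weights) whose flips degrade the model the most.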
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
arXiv Detail & Related papers (2023-03-13T21:21:49Z)
- Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box adversarial attack that aims to craft adversarial perturbations on the surrogate model and then apply such perturbations to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on the widely-used dataset demonstrate the effectiveness of our attack method with a 12.85% higher success rate of transfer attack compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z)
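For readers unfamiliar with the transfer setting described in the entry above, the sketch below crafts one-step FGSM perturbations on a surrogate model and measures how often they fool a separate victim. The models and data are toy placeholders, and LLTA's data and model augmentation is not reproduced.

```python
# Minimal sketch of the transfer-attack setting: perturbations are crafted on a
# surrogate with FGSM and then applied to a separate victim model. This shows
# the black-box transfer premise only, not the LLTA method itself; all models
# and inputs are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_on_surrogate(surrogate, x, y, eps=8 / 255):
    """One-step FGSM perturbation computed with the surrogate's gradients."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

surrogate = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
victim = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # inaccessible in practice

x = torch.rand(16, 3, 32, 32)              # placeholder images in [0, 1]
y = torch.randint(0, 10, (16,))
x_adv = fgsm_on_surrogate(surrogate, x, y)

# Transfer success: fraction of adversarial inputs the victim misclassifies.
with torch.no_grad():
    fooled = (victim(x_adv).argmax(dim=1) != y).float().mean().item()
print(f"transfer fooling rate on the victim: {fooled:.2%}")
```

Stronger transfer attacks replace the single FGSM step with iterative optimization plus the kind of data and model augmentation the entry describes.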
- Get a Model! Model Hijacking Attack Against Machine Learning Models [30.346469782056406]
We propose a new training time attack against computer vision based machine learning models, namely the model hijacking attack.
The adversary aims to hijack a target model to execute a different task without the model owner noticing.
Our evaluation shows that both of our model hijacking attacks achieve a high attack success rate, with a negligible drop in model utility.
arXiv Detail & Related papers (2021-11-08T11:30:50Z)
- Adversarial Transfer Attacks With Unknown Data and Class Overlap [19.901933940805684]
Current transfer attack research has an unrealistic advantage for the attacker.
We present the first study of transferring adversarial attacks that focuses on the data available to the attacker and the victim under imperfect settings.
This threat model is relevant to applications in medicine, malware, and others.
arXiv Detail & Related papers (2021-09-23T03:41:34Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked models as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
- DaST: Data-free Substitute Training for Adversarial Attacks [55.76371274622313]
We propose a data-free substitute training method (DaST) to obtain substitute models for adversarial black-box attacks.
To achieve this, DaST utilizes specially designed generative adversarial networks (GANs) to train the substitute models.
Experiments demonstrate that the substitute models can achieve competitive performance compared with the baseline models.
arXiv Detail & Related papers (2020-03-28T04:28:13Z)
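The loop below is a simplified sketch of the data-free substitute training idea from the DaST entry above, assuming a black-box victim that returns hard labels; DaST's specially designed generator and exact loss terms are omitted, and all models are toy stand-ins.

```python
# Simplified sketch of data-free substitute training in the spirit of DaST:
# a generator synthesizes inputs, the black-box victim labels them, the
# substitute imitates those labels, and the generator is pushed toward inputs
# where substitute and victim disagree. DaST's label-controllable multi-branch
# generator and exact losses are omitted; all models here are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

victim = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))      # black-box stand-in
substitute = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
generator = nn.Sequential(nn.Linear(100, 784), nn.Sigmoid(), nn.Unflatten(1, (1, 28, 28)))

opt_s = torch.optim.Adam(substitute.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(200):                   # a few hundred steps, for illustration only
    z = torch.randn(64, 100)

    # Substitute step: imitate the victim's hard labels on synthetic data.
    x = generator(z).detach()
    with torch.no_grad():
        victim_labels = victim(x).argmax(dim=1)
    loss_s = F.cross_entropy(substitute(x), victim_labels)
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()

    # Generator step: seek inputs where the substitute still disagrees with
    # the victim, so training keeps exploring the victim's decision boundary.
    x = generator(z)
    with torch.no_grad():
        victim_labels = victim(x).argmax(dim=1)
    loss_g = -F.cross_entropy(substitute(x), victim_labels)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Once trained this way, the substitute can stand in for the victim when crafting adversarial examples, as in the transfer sketch earlier.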