Related papers: Model for Peanuts: Hijacking ML Models without Training Access is Possible

Model for Peanuts: Hijacking ML Models without Training Access is Possible

URL: http://arxiv.org/abs/2406.01708v1
Date: Mon, 3 Jun 2024 18:04:37 GMT
Title: Model for Peanuts: Hijacking ML Models without Training Access is Possible
Authors: Mahmoud Ghorbel, Halima Bouzidi, Ioan Marius Bilasco, Ihsen Alouani,
Abstract summary: Model hijacking is an attack where an adversary aims to hijack a victim model to execute a different task than its original one. We propose a simple approach for model hijacking at inference time named SnatchML to classify unknown input samples. We first propose a novel approach we call meta-unlearning, designed to help the model unlearn a potentially malicious task while training on the original dataset.
Score: 5.005171792255858
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The massive deployment of Machine Learning (ML) models has been accompanied by the emergence of several attacks that threaten their trustworthiness and raise ethical and societal concerns such as invasion of privacy, discrimination risks, and lack of accountability. Model hijacking is one of these attacks, where the adversary aims to hijack a victim model to execute a different task than its original one. Model hijacking can cause accountability and security risks since a hijacked model owner can be framed for having their model offering illegal or unethical services. Prior state-of-the-art works consider model hijacking as a training time attack, whereby an adversary requires access to the ML model training to execute their attack. In this paper, we consider a stronger threat model where the attacker has no access to the training phase of the victim model. Our intuition is that ML models, typically over-parameterized, might (unintentionally) learn more than the intended task for they are trained. We propose a simple approach for model hijacking at inference time named SnatchML to classify unknown input samples using distance measures in the latent space of the victim model to previously known samples associated with the hijacking task classes. SnatchML empirically shows that benign pre-trained models can execute tasks that are semantically related to the initial task. Surprisingly, this can be true even for hijacking tasks unrelated to the original task. We also explore different methods to mitigate this risk. We first propose a novel approach we call meta-unlearning, designed to help the model unlearn a potentially malicious task while training on the original task dataset. We also provide insights on over-parameterization as one possible inherent factor that makes model hijacking easier, and we accordingly propose a compression-based countermeasure against this attack.

Related papers

Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning [0.6117371161379209]
We propose a novel privacy attack that infers whether a data sample has been unlearned, following a strict threat model.<n>Our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.
arXiv Detail & Related papers (2025-06-11T16:43:36Z)
Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models [48.36985844329255]
Model merging for Large Language Models (LLMs) directly fuses the parameters of different models finetuned on various tasks.<n>Due to potential vulnerabilities in models available on open-source platforms, model merging is susceptible to backdoor attacks.<n>We propose Merge Hijacking, the first backdoor attack targeting model merging in LLMs.
arXiv Detail & Related papers (2025-05-29T15:37:23Z)
LoBAM: LoRA-Based Backdoor Attack on Model Merging [27.57659381949931]
Model merging is an emerging technique that integrates multiple models fine-tuned on different tasks to create a versatile model that excels in multiple domains. Existing works try to demonstrate the risk of such attacks by assuming substantial computational resources. We propose LoBAM, a method that yields high attack success rate with minimal training resources.
arXiv Detail & Related papers (2024-11-23T20:41:24Z)
Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing [21.52641337754884]
A type of adversarial attack can manipulate the behavior of machine learning models through contaminating their training dataset. We introduce our EDT model, an textbfEfficient, textbfData-free, textbfTraining-free backdoor attack method. Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models.
arXiv Detail & Related papers (2024-10-23T20:32:14Z)
Model Hijacking Attack in Federated Learning [19.304332176437363]
HijackFL is the first-of-its-kind hijacking attack against the global model in federated learning. It aims to force the global model to perform a different task from its original task without the server or benign client noticing. We conduct extensive experiments on four benchmark datasets and three popular models.
arXiv Detail & Related papers (2024-08-04T20:02:07Z)
Vera Verto: Multimodal Hijacking Attack [22.69532868255637]
A recent attack in this domain is the model hijacking attack, whereby an adversary hijacks a victim model to implement their own hijacking tasks. We transform the model hijacking attack into a more general multimodal setting, where the hijacking and original tasks are performed on data of different modalities. Our attack achieves 94%, 94%, and 95% attack success rate when using the Sogou news dataset to hijack STL10, CIFAR-10, and MNISTs.
arXiv Detail & Related papers (2024-07-31T19:37:06Z)
Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features. backdoor attacks subtly embed malicious behaviors within the model during training. We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models. Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z)
Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access. We investigate factors influencing the success of model extraction attacks. Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers. This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses. In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments. We propose a novel method, CANIFE, that uses carefully crafted samples by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z)
MOVE: Effective and Harmless Ownership Verification via Embedded External Features [104.97541464349581]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously. We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features. We then train a meta-classifier to determine whether a model is stolen from the victim.
arXiv Detail & Related papers (2022-08-04T02:22:29Z)
Careful What You Wish For: on the Extraction of Adversarially Trained Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats. We propose a framework to assess extraction attacks on adversarially trained models. We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z)
Get a Model! Model Hijacking Attack Against Machine Learning Models [30.346469782056406]
We propose a new training time attack against computer vision based machine learning models, namely model hijacking attack. adversary aims to hijack a target model to execute a different task without the model owner noticing. Our evaluation shows that both of our model hijacking attacks achieve a high attack success rate, with a negligible drop in model utility.
arXiv Detail & Related papers (2021-11-08T11:30:50Z)
Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks to a black-box model when no queries are allowed. In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model. We show we can obtain a Meta-Surrogate Model (MSM) such that attacks to this model can be easier transferred to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.