SnatchML: Hijacking ML models without Training Access
- URL: http://arxiv.org/abs/2406.01708v2
- Date: Mon, 14 Apr 2025 09:56:26 GMT
- Title: SnatchML: Hijacking ML models without Training Access
- Authors: Mahmoud Ghorbel, Halima Bouzidi, Ioan Marius Bilasco, Ihsen Alouani,
- Abstract summary: We consider a stronger threat model for an inference-time hijacking attack, where the adversary has no access to the training phase of the victim model. We propose SnatchML, a new training-free model hijacking attack. Our results on models deployed on AWS SageMaker showed that SnatchML can deliver high accuracy on hijacking tasks.
- Score: 5.005171792255858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model hijacking can cause significant accountability and security risks, since the owner of a hijacked model can be framed for having their model offer illegal or unethical services. Prior works consider model hijacking as a training-time attack, whereby an adversary requires full access to the ML model training. In this paper, we consider a stronger threat model for an inference-time hijacking attack, where the adversary has no access to the training phase of the victim model. Our intuition is that ML models, which are typically over-parameterized, might have the capacity to (unintentionally) learn more than the intended task they are trained for. We propose SnatchML, a new training-free model hijacking attack that leverages the extra capacity learnt by the victim model to infer different tasks that can be semantically related or unrelated to the original one. Our results on models deployed on AWS SageMaker showed that SnatchML can deliver high accuracy on hijacking tasks. Interestingly, while all previous approaches are limited by the number of classes in the benign task, SnatchML can hijack models for tasks that contain more classes than the original. We explore different methods to mitigate this risk: we propose meta-unlearning, which is designed to help the model unlearn a potentially malicious task while training for the original task. We also provide insights on over-parameterization as a possible inherent factor that facilitates model hijacking, and accordingly, we propose a compression-based countermeasure to counteract this attack. We believe this work offers a previously overlooked perspective on model hijacking attacks, presenting a stronger threat model and higher applicability in real-world contexts.
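The abstract does not spell out the attack mechanics, so the following is only a minimal sketch of how a training-free hijack in this spirit could look: hijack-task samples are pushed through the frozen victim model and its embeddings are classified with a simple nearest-centroid rule, so the victim is never retrained and the hijack label space is not tied to the original one. The identifiers (`victim.features`, `hijack_train_loader`, `hijack_test_loader`) are hypothetical placeholders, not names from the paper or from AWS SageMaker.

```python
# Illustrative sketch only -- not the authors' implementation. It assumes the
# adversary can query the frozen victim model and read back an embedding
# (here via a hypothetical `victim.features` hook); no training access is used.
import torch

@torch.no_grad()
def embed(victim, loader, device="cpu"):
    """Collect the victim's penultimate-layer embeddings for a dataset."""
    victim.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        z = victim.features(x.to(device))   # assumed embedding access point
        feats.append(z.flatten(1).cpu())
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

@torch.no_grad()
def hijack_accuracy(victim, hijack_train_loader, hijack_test_loader):
    # 1. Build one centroid per hijack-task class from adversary-owned samples;
    #    the victim's weights are never touched.
    z_tr, y_tr = embed(victim, hijack_train_loader)
    classes = y_tr.unique()
    centroids = torch.stack([z_tr[y_tr == c].mean(0) for c in classes])

    # 2. Assign each hijack-task query to the nearest centroid. Because the
    #    label space is defined by the adversary, it may contain more classes
    #    than the victim's original task.
    z_te, y_te = embed(victim, hijack_test_loader)
    preds = classes[torch.cdist(z_te, centroids).argmin(dim=1)]
    return (preds == y_te).float().mean().item()
```

Since only forward queries are required, the same procedure would in principle apply to a model served behind an inference endpoint (for example on AWS SageMaker), provided the endpoint exposes embeddings or logits rather than only a final label.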
Related papers
- LoBAM: LoRA-Based Backdoor Attack on Model Merging [27.57659381949931]
Model merging is an emerging technique that integrates multiple models fine-tuned on different tasks to create a versatile model that excels in multiple domains.
Existing works demonstrate the risk of such attacks under the assumption that the attacker has substantial computational resources.
We propose LoBAM, a method that yields high attack success rate with minimal training resources.
arXiv Detail & Related papers (2024-11-23T20:41:24Z) - Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing [21.52641337754884]
One type of adversarial attack manipulates the behavior of machine learning models by contaminating their training dataset.
We introduce our EDT model, an Efficient, Data-free, Training-free backdoor attack method.
Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models.
arXiv Detail & Related papers (2024-10-23T20:32:14Z) - Model Hijacking Attack in Federated Learning [19.304332176437363]
HijackFL is the first-of-its-kind hijacking attack against the global model in federated learning.
It aims to force the global model to perform a different task from its original task without the server or benign client noticing.
We conduct extensive experiments on four benchmark datasets and three popular models.
arXiv Detail & Related papers (2024-08-04T20:02:07Z) - Vera Verto: Multimodal Hijacking Attack [22.69532868255637]
A recent attack in this domain is the model hijacking attack, whereby an adversary hijacks a victim model to implement their own hijacking tasks.
We transform the model hijacking attack into a more general multimodal setting, where the hijacking and original tasks are performed on data of different modalities.
Our attack achieves 94%, 94%, and 95% attack success rates when using the Sogou news dataset to hijack STL10, CIFAR-10, and MNIST.
arXiv Detail & Related papers (2024-07-31T19:37:06Z) - Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models.
Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions, which harms benign accuracy, InI trains models to produce uninformative outputs against stealing queries (a generic sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-08-02T05:54:01Z) - CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments.
We propose a novel method, CANIFE, that uses carefully crafted samples by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z) - MOVE: Effective and Harmless Ownership Verification via Embedded External Features [104.97541464349581]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously.
Ownership verification is conducted by checking whether a suspicious model contains the knowledge of defender-specified external features.
We then train a meta-classifier to determine whether a model is stolen from the victim.
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - Careful What You Wish For: on the Extraction of Adversarially Trained Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z) - Get a Model! Model Hijacking Attack Against Machine Learning Models [30.346469782056406]
We propose a new training-time attack against computer vision-based machine learning models, namely the model hijacking attack.
The adversary aims to hijack a target model to execute a different task without the model owner noticing.
Our evaluation shows that both of our model hijacking attacks achieve a high attack success rate, with a negligible drop in model utility.
arXiv Detail & Related papers (2021-11-08T11:30:50Z) - Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks on a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show that we can obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z)
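As a counterpoint on the defense side, below is a generic sketch of the "uninformative outputs" idea referenced in the Isolation and Induction (InI) entry above; it is not that paper's actual training framework. The protected model keeps its standard task loss on benign data while a KL term drives its posteriors on a proxy for stealing-style queries towards the uniform distribution. `model`, `benign_batch`, `ood_batch`, and `optimizer` are hypothetical placeholders.

```python
# Generic sketch of an "uninformative outputs" defense -- not the InI paper's
# training framework. A proxy set of out-of-distribution queries stands in for
# an attacker's stealing queries.
import torch
import torch.nn.functional as F

def defense_step(model, benign_batch, ood_batch, optimizer, lam=1.0):
    x_b, y_b = benign_batch          # labelled in-distribution data
    x_o, _ = ood_batch               # unlabelled proxy for stealing queries

    logits_b = model(x_b)
    logits_o = model(x_o)

    # Standard task loss on benign data keeps clean accuracy intact.
    task_loss = F.cross_entropy(logits_b, y_b)

    # Push posteriors on suspicious queries towards the uniform distribution,
    # so labels returned to a would-be thief carry little information.
    log_p_o = F.log_softmax(logits_o, dim=1)
    uniform = torch.full_like(log_p_o, 1.0 / log_p_o.size(1))
    uninformative_loss = F.kl_div(log_p_o, uniform, reduction="batchmean")

    loss = task_loss + lam * uninformative_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```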