Related papers: MisGUIDE : Defense Against Data-Free Deep Learning Model Extraction

MisGUIDE : Defense Against Data-Free Deep Learning Model Extraction

URL: http://arxiv.org/abs/2403.18580v1
Date: Wed, 27 Mar 2024 13:59:21 GMT
Title: MisGUIDE : Defense Against Data-Free Deep Learning Model Extraction
Authors: Mahendra Gurve, Sankar Behera, Satyadev Ahlawat, Yamuna Prasad,
Abstract summary: "MisGUIDE" is a two-step defense framework for Deep Learning models that disrupts the adversarial sample generation process. The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries.
Score: 0.8437187555622164
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rise of Machine Learning as a Service (MLaaS) has led to the widespread deployment of machine learning models trained on diverse datasets. These models are employed for predictive services through APIs, raising concerns about the security and confidentiality of the models due to emerging vulnerabilities in prediction APIs. Of particular concern are model cloning attacks, where individuals with limited data and no knowledge of the training dataset manage to replicate a victim model's functionality through black-box query access. This commonly entails generating adversarial queries to query the victim model, thereby creating a labeled dataset. This paper proposes "MisGUIDE", a two-step defense framework for Deep Learning models that disrupts the adversarial sample generation process by providing a probabilistic response when the query is deemed OOD. The first step employs a Vision Transformer-based framework to identify OOD queries, while the second step perturbs the response for such queries, introducing a probabilistic loss function to MisGUIDE the attackers. The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries. Extensive experiments conducted on two benchmark datasets demonstrate that the proposed framework significantly enhances the resistance against state-of-the-art data-free model extraction in black-box settings.

Related papers

MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning [4.275908952997288]
Federated Learning (FL) is a collaborative learning framework designed to protect client data.<n>Despite FL's privacy-preserving goals, its distributed nature makes it particularly susceptible to model extraction attacks.<n>This paper examines the vulnerability of FL-based victim models to two types of model extraction attacks.
arXiv Detail & Related papers (2025-05-25T22:40:10Z)
Exploring Query Efficient Data Generation towards Data-free Model Stealing in Hard Label Setting [38.755154033324374]
Data-free model stealing involves replicating the functionality of a target model into a substitute model without accessing the target model's structure, parameters, or training data. This paper presents a new data-free model stealing approach called Query Efficient Data Generation (textbfQEDG) We introduce two distinct loss functions to ensure the generation of sufficient samples that closely and uniformly align with the target model's decision boundary.
arXiv Detail & Related papers (2024-12-18T03:03:15Z)
Towards Trustworthy Web Attack Detection: An Uncertainty-Aware Ensemble Deep Kernel Learning Model [4.791983040541727]
We propose an Uncertainty-aware Ensemble Deep Kernel Learning (UEDKL) model to detect web attacks. The proposed UEDKL utilizes a deep kernel learning model to distinguish normal HTTP requests from different types of web attacks. Experiments on BDCI and SRBH datasets demonstrated that the proposed UEDKL framework yields significant improvement in both web attack detection performance and uncertainty estimation quality.
arXiv Detail & Related papers (2024-10-10T08:47:55Z)
MEAOD: Model Extraction Attack against Object Detectors [45.817537875368956]
Model extraction attacks allow attackers to replicate a substitute model with comparable functionality to the victim model. We propose an effective attack method called MEAOD for object detection models. We achieve an extraction performance of over 70% under the given condition of a 10k query budget.
arXiv Detail & Related papers (2023-12-22T13:28:50Z)
Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack. We exploit text similarity and the model's resistance to document modifications as potential MI signals. We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z)
IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection [3.3772986620114387]
We present an approach for modelling IoT network attacks that focuses on generalizability, yet also leads to better detection and performance. First, we present an improved rolling window approach for feature extraction, and introduce a multi-step feature selection process that reduces overfitting. Second, we build and test models using isolated train and test datasets, thereby avoiding common data leaks. Third, we rigorously evaluate our methodology using a diverse portfolio of machine learning models, evaluation metrics and datasets.
arXiv Detail & Related papers (2023-10-17T21:46:43Z)
Data-Free Model Extraction Attacks in the Context of Object Detection [0.6719751155411076]
A significant number of machine learning models are vulnerable to model extraction attacks. We propose an adversary black box attack extending to a regression problem for predicting bounding box coordinates in object detection. We find that the proposed model extraction method achieves significant results by using reasonable queries.
arXiv Detail & Related papers (2023-08-09T06:23:54Z)
Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers. This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses. In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
AUTOLYCUS: Exploiting Explainable AI (XAI) for Model Extraction Attacks against Interpretable Models [1.8752655643513647]
XAI tools can increase the vulnerability of model extraction attacks, which is a concern when model owners prefer black-box access. We propose a novel retraining (learning) based model extraction attack framework against interpretable models under black-box settings. We show that AUTOLYCUS is highly effective, requiring significantly fewer queries compared to state-of-the-art attacks.
arXiv Detail & Related papers (2023-02-04T13:23:39Z)
A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit OpenBackdoor to foster the implementations and evaluations of textual backdoor learning. We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents. One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis. We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce sampling attack, a novel membership inference technique that unlike other standard membership adversaries is able to work under severe restriction of no access to scores of the victim model. We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance. For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs) We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases. Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.