Knowledge Distillation-Based Model Extraction Attack using Private Counterfactual Explanations
- URL: http://arxiv.org/abs/2404.03348v1
- Date: Thu, 4 Apr 2024 10:28:55 GMT
- Title: Knowledge Distillation-Based Model Extraction Attack using Private Counterfactual Explanations
- Authors: Fatima Ezzeddine, Omran Ayoub, Silvia Giordano
- Abstract summary: We investigate how model explanations, particularly Generative Adversarial Network (GAN)-based counterfactual explanations (CFs), can be exploited to perform model extraction attacks (MEA).
We propose a novel MEA methodology based on Knowledge Distillation (KD) to enhance the efficiency of extracting a substitute model of a target model by exploiting CFs.
Our findings reveal that the inclusion of a privacy layer impacts the performance of the explainer and the quality of CFs, and results in a reduction in MEA performance.
- Score: 1.6576983459630268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been a notable increase in the deployment of machine learning (ML) models as services (MLaaS) across diverse production software applications. In parallel, explainable AI (XAI) continues to evolve, addressing the need for transparency and trustworthiness in ML models. XAI techniques aim to enhance the transparency of ML models by providing insights, in the form of model explanations, into their decision-making process. Simultaneously, some MLaaS platforms now offer explanations alongside the ML prediction outputs. This setup has elevated concerns regarding vulnerabilities in MLaaS, particularly in relation to privacy leakage attacks such as model extraction attacks (MEA), because explanations can unveil insights about the inner workings of the model which could be exploited by malicious users. In this work, we focus on investigating how model explanations, particularly Generative Adversarial Network (GAN)-based counterfactual explanations (CFs), can be exploited for performing MEA within MLaaS platforms. We also assess the effectiveness of incorporating differential privacy (DP) as a mitigation strategy. To this end, we first propose a novel MEA methodology based on Knowledge Distillation (KD) to enhance the efficiency of extracting a substitute model of a target model by exploiting CFs. Then, we devise an approach for training CF generators that incorporates DP to generate private CFs. We conduct thorough experimental evaluations on real-world datasets and demonstrate that our proposed KD-based MEA can yield a high-fidelity substitute model with fewer queries than baseline approaches. Furthermore, our findings reveal that the inclusion of a privacy layer impacts the performance of the explainer and the quality of CFs, and reduces MEA performance.
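To make the proposed pipeline easier to picture, below is a minimal, hypothetical sketch of a KD-based extraction loop: the attacker queries the target model for soft predictions, augments the query set with the counterfactuals returned by the explainer, and distills a substitute model against the target's soft outputs. The `query_target` and `query_cf` callables, the two-layer substitute architecture, and the temperature are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of KD-based model extraction using counterfactual explanations.
# query_target / query_cf stand in for MLaaS endpoints and are NOT a real API.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_substitute(query_target, query_cf, seed_inputs,
                       n_features, n_classes, temperature=2.0,
                       epochs=20, lr=1e-3):
    """Train a substitute model on (input, soft label) pairs obtained from the
    target model, augmenting the query set with the returned counterfactuals."""
    # 1. Build the attack dataset: original queries plus their counterfactuals.
    cf_inputs = query_cf(seed_inputs)               # CFs returned by the explainer
    attack_x = torch.cat([seed_inputs, cf_inputs])  # augmented query set
    soft_labels = query_target(attack_x)            # target's probability outputs

    # 2. A simple substitute architecture (an assumption, not the paper's choice).
    substitute = nn.Sequential(
        nn.Linear(n_features, 64), nn.ReLU(),
        nn.Linear(64, n_classes),
    )
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)

    # 3. Knowledge distillation: match the substitute's softened predictions to
    #    the target's soft labels (used directly as the teacher distribution).
    for _ in range(epochs):
        opt.zero_grad()
        student_log_probs = F.log_softmax(substitute(attack_x) / temperature, dim=1)
        loss = F.kl_div(student_log_probs, soft_labels,
                        reduction="batchmean") * temperature ** 2
        loss.backward()
        opt.step()
    return substitute
```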
Related papers
- Revisiting Catastrophic Forgetting in Large Language Model Tuning [79.70722658190097]
Catastrophic Forgetting (CF) refers to models forgetting previously acquired knowledge when learning new data.
This paper takes the first step to reveal the direct link between the flatness of the model loss landscape and the extent of CF in the field of large language models.
Experiments on three widely-used fine-tuning datasets, spanning different model scales, demonstrate the effectiveness of our method in alleviating CF.
arXiv Detail & Related papers (2024-06-07T11:09:13Z)
- MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning [6.510488168434277]
The membership inference attack (MIA) is a popular paradigm for compromising the privacy of a machine learning (ML) model.
We propose an enhanced Membership Inference Attack with a Batch-wise generated Attack Dataset (MIA-BAD).
We show that training an ML model through FL has some distinct advantages and investigate how the threat introduced by the proposed MIA-BAD approach can be mitigated with FL approaches.
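The summary does not describe how the batch-wise attack dataset is generated; as background only, the sketch below shows the classic confidence-thresholding form of membership inference that such attacks refine. The `predict_proba` interface and the fixed threshold are assumptions; in practice the threshold would be calibrated, for example with shadow models.

```python
import numpy as np

def confidence_mia(predict_proba, x, threshold=0.9):
    """Baseline membership inference: flag a sample as a training-set member
    when the model's top-class confidence exceeds a threshold, since members
    tend to be predicted with higher confidence than non-members. The 0.9
    threshold is an illustrative assumption."""
    probs = np.asarray(predict_proba(x))       # shape: (n_samples, n_classes)
    return probs.max(axis=1) > threshold       # boolean membership guesses
```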
arXiv Detail & Related papers (2023-11-28T06:51:26Z)
- Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as general task solvers, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
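The alignment procedure itself is not detailed in this summary; the snippet below is a generic sketch, assuming Python source code, of mapping a (line, column) position, such as the location of a predicted token, to the innermost enclosing AST node with the standard `ast` module. It illustrates the idea only and is not ASTxplainer's implementation.

```python
import ast

def innermost_node(source: str, line: int, col: int):
    """Return the innermost AST node whose source span contains (line, col).
    Generic alignment sketch; not ASTxplainer's actual procedure."""
    best = None
    for node in ast.walk(ast.parse(source)):
        if getattr(node, "end_lineno", None) is None:
            continue  # skip nodes without position information
        within = ((node.lineno, node.col_offset) <= (line, col)
                  <= (node.end_lineno, node.end_col_offset))
        if within:
            # Prefer the node with the smallest span that still covers the position.
            span = (node.end_lineno - node.lineno,
                    node.end_col_offset - node.col_offset)
            if best is None or span <= best[0]:
                best = (span, node)
    return best[1] if best else None

# Example: the position of `len` on line 2 maps to a Name node.
snippet = "def f(xs):\n    return len(xs)\n"
print(type(innermost_node(snippet, line=2, col=12)).__name__)  # -> 'Name'
```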
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- DualCF: Efficient Model Extraction Attack from Counterfactual Explanations [57.46134660974256]
Cloud service providers have launched Machine-Learning-as-a-Service platforms to allow users to access large-scale cloud-based models via APIs.
Such extra information (here, counterfactual explanations returned alongside predictions) inevitably makes the cloud models more vulnerable to extraction attacks.
We propose a novel, simple yet efficient querying strategy that greatly improves the query efficiency of stealing a classification model.
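The querying strategy itself is not spelled out here; the sketch below only illustrates the general pattern such attacks exploit: each call to a CF-enabled endpoint yields two labeled points, the query and its counterfactual, which by construction fall on opposite sides of the decision boundary, so the substitute's training set grows twice as fast per query. The `mlaas_predict_with_cf` endpoint is a hypothetical stand-in, not DualCF's actual interface.

```python
import numpy as np

def harvest_extraction_data(mlaas_predict_with_cf, queries):
    """Collect (input, label) pairs for training a substitute model. Each API
    call is assumed to return the prediction for the query plus a counterfactual
    example with its flipped prediction: two labeled points per query,
    straddling the decision boundary."""
    xs, ys = [], []
    for x in queries:
        y, x_cf, y_cf = mlaas_predict_with_cf(x)   # hypothetical endpoint
        xs.extend([x, x_cf])
        ys.extend([y, y_cf])
    return np.stack(xs), np.array(ys)
```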
arXiv Detail & Related papers (2022-05-13T08:24:43Z)
- ReLACE: Reinforcement Learning Agent for Counterfactual Explanations of Arbitrary Predictive Models [6.939617874336667]
We introduce a model-agnostic algorithm to generate optimal counterfactual explanations.
Our method is easily applied to any black-box model, since the black-box model itself serves as the environment that the DRL agent interacts with.
In addition, we develop an algorithm to extract explainable decision rules from the DRL agent's policy, so as to make the process of generating CFs itself transparent.
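ReLACE's DRL formulation is not detailed above; purely to illustrate what a counterfactual search optimizes, the sketch below shows the kind of reward such an agent could be trained on: a bonus for flipping the model's prediction minus a proximity penalty that keeps the counterfactual close to the original instance. The L1 distance and the weighting `lam` are assumptions, not the paper's actual reward.

```python
import numpy as np

def cf_reward(predict, x_original, x_candidate, lam=0.1):
    """Illustrative reward for a counterfactual-search agent: +1 when the
    candidate flips the model's prediction, minus a proximity penalty so that
    the counterfactual stays close to the original instance."""
    flipped = float(predict(x_candidate) != predict(x_original))
    proximity_penalty = lam * np.abs(x_candidate - x_original).sum()
    return flipped - proximity_penalty
```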
arXiv Detail & Related papers (2021-10-22T17:08:49Z)
- EG-Booster: Explanation-Guided Booster of ML Evasion Attacks [3.822543555265593]
We present a novel approach called EG-Booster that leverages techniques from explainable ML to guide adversarial example crafting.
EG-Booster is agnostic to model architecture and threat model, and supports the diverse distance metrics used previously in the literature.
Our findings suggest that EG-Booster significantly improves the evasion rate of state-of-the-art attacks while requiring fewer perturbations.
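As a rough, hypothetical illustration of explanation-guided crafting, the sketch below perturbs only the k features with the largest attribution magnitude (e.g. scores from SHAP or a surrogate explainer) instead of all features; the attribution source, k, and step size are assumptions, and this is not EG-Booster's actual algorithm.

```python
import numpy as np

def explanation_guided_perturbation(x, attributions, k=3, epsilon=0.1):
    """Perturb only the k most influential features according to a feature
    attribution vector, pushing each against the direction that supports the
    current prediction. A toy sketch of explanation-guided evasion."""
    x_adv = np.asarray(x, dtype=float).copy()
    top_k = np.argsort(-np.abs(attributions))[:k]   # highest-attribution features
    for i in top_k:
        x_adv[i] -= epsilon * np.sign(attributions[i])
    return x_adv
```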
arXiv Detail & Related papers (2021-08-31T15:36:16Z)
- Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
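The summary credits zeroth-order optimization for working from input-output responses alone; the snippet below is a minimal sketch of such an estimator, approximating a gradient (up to a dimension-dependent scale) by averaging symmetric finite differences along random directions. The query budget and smoothing radius are illustrative choices, not BAR's exact estimator.

```python
import numpy as np

def zeroth_order_gradient(loss_fn, x, n_directions=20, mu=0.01):
    """Estimate the gradient of loss_fn at x using only function evaluations,
    by averaging symmetric finite differences along random unit directions
    (proportional to the true gradient, up to a dimension-dependent scale)."""
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_directions):
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u)                       # random unit direction
        directional = (loss_fn(x + mu * u) - loss_fn(x - mu * u)) / (2 * mu)
        grad += directional * u
    return grad / n_directions
```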
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
- Data and Model Dependencies of Membership Inference Attack [13.951470844348899]
We provide an empirical analysis of the impact of both the data and ML model properties on the vulnerability of ML techniques to MIA.
Our results reveal the relationship between MIA accuracy and properties of the dataset and training model in use.
We propose using those data and model properties as regularizers to protect ML models against MIA.
arXiv Detail & Related papers (2020-02-17T09:35:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.