XRand: Differentially Private Defense against Explanation-Guided Attacks
- URL: http://arxiv.org/abs/2212.04454v2
- Date: Sat, 10 Dec 2022 05:38:36 GMT
- Title: XRand: Differentially Private Defense against Explanation-Guided Attacks
- Authors: Truc Nguyen, Phung Lai, NhatHai Phan, My T. Thai
- Abstract summary: We introduce a new concept of achieving local differential privacy (LDP) in the explanations.
We show that our mechanism restricts the information that the adversary can learn about the top important features, while maintaining the faithfulness of the explanations.
- Score: 19.682368614810756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent development in the field of explainable artificial intelligence (XAI)
has helped improve trust in Machine-Learning-as-a-Service (MLaaS) systems, in
which an explanation is provided together with the model prediction in response
to each query. However, XAI also opens a door for adversaries to gain insights
into the black-box models in MLaaS, thereby making the models more vulnerable
to several attacks. For example, feature-based explanations (e.g., SHAP) could
expose the top important features that a black-box model focuses on. Such
disclosure has been exploited to craft effective backdoor triggers against
malware classifiers. To address this trade-off, we introduce a new concept of
achieving local differential privacy (LDP) in the explanations, and from that
we establish a defense, called XRand, against such attacks. We show that our
mechanism restricts the information that the adversary can learn about the top
important features, while maintaining the faithfulness of the explanations.
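As a rough, hedged illustration of what local differential privacy over a released explanation could look like, the Python sketch below applies k-ary randomized response to the indices of the top features before they are returned to the client. This is a generic LDP mechanism, not the XRand construction itself; the feature count, epsilon, and example indices are assumptions for illustration.

```python
import math
import random

def ldp_randomize_top_features(top_idx, num_features, epsilon, rng=random):
    """Perturb each reported top-feature index with k-ary randomized response.

    Each true index is kept with probability e^eps / (e^eps + d - 1) and is
    otherwise replaced by a uniformly random other feature index, which
    satisfies epsilon-LDP per reported index. This is a generic mechanism
    for illustration, not the XRand mechanism from the paper.
    """
    d = num_features
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    randomized = []
    for idx in top_idx:
        if rng.random() < p_keep:
            randomized.append(idx)
        else:
            # Report a uniformly random index different from the true one.
            other = rng.randrange(d - 1)
            randomized.append(other if other < idx else other + 1)
    return randomized

# Example (hypothetical): hide which 3 of 20 features the model relied on most.
print(ldp_randomize_top_features([4, 11, 17], num_features=20, epsilon=1.0))
```

XRand additionally constrains the randomization so that the released explanation stays faithful to the model; the sketch only covers the generic privatized-release step.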
Related papers
- When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations [58.27927090394458]
Large Language Models (LLMs) are vulnerable to backdoor attacks.
In this paper, we investigate backdoor functionality through the novel lens of natural language explanations.
arXiv Detail & Related papers (2024-11-19T18:11:36Z)
- Privacy Implications of Explainable AI in Data-Driven Systems [0.0]
Machine learning (ML) models suffer from a lack of interpretability.
The absence of transparency, often referred to as the black box nature of ML models, undermines trust.
XAI techniques address this challenge by providing frameworks and methods to explain the internal decision-making processes.
arXiv Detail & Related papers (2024-06-22T08:51:58Z)
- Revealing Vulnerabilities of Neural Networks in Parameter Learning and Defense Against Explanation-Aware Backdoors [2.1165011830664673]
Blinding attacks can drastically alter a machine learning algorithm's prediction and explanation.
We leverage statistical analysis to highlight the changes in a CNN's weights following blinding attacks.
We introduce a method specifically designed to limit the effectiveness of such attacks during the evaluation phase.
arXiv Detail & Related papers (2024-03-25T09:36:10Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- The Dark Side of AutoML: Towards Architectural Backdoor Search [49.16544351888333]
EVAS is a new attack that leverages NAS to find neural architectures with inherent backdoors and exploits such vulnerability using input-aware triggers.
EVAS features high evasiveness, transferability, and robustness, thereby expanding the adversary's design spectrum.
This work raises concerns about the current practice of NAS and points to potential directions to develop effective countermeasures.
arXiv Detail & Related papers (2022-10-21T18:13:23Z)
- The privacy issue of counterfactual explanations: explanation linkage attacks [0.0]
We introduce the explanation linkage attack, which can occur when deploying instance-based strategies to find counterfactual explanations.
To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a new metric to evaluate the validity of these k-anonymous counterfactual explanations.
Our results show that making the explanations, rather than the whole dataset, k-anonymous is beneficial for the quality of the explanations.
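A minimal sketch of the k-anonymity idea applied to a single explanation (the quasi-identifier columns, data, and threshold are hypothetical, and this is not the paper's exact procedure):

```python
from collections import Counter

def is_k_anonymous_explanation(counterfactual, dataset, quasi_identifiers, k):
    """Return True if the counterfactual's quasi-identifier values are shared
    by at least k records in the reference dataset, so that releasing the
    explanation cannot be linked to fewer than k individuals."""
    key = tuple(counterfactual[q] for q in quasi_identifiers)
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in dataset)
    return counts[key] >= k

# Hypothetical example: age band and zip code act as quasi-identifiers.
data = [
    {"age": "30-40", "zip": "07102", "income": 52},
    {"age": "30-40", "zip": "07102", "income": 61},
    {"age": "40-50", "zip": "07103", "income": 75},
]
cf = {"age": "30-40", "zip": "07102", "income": 70}
print(is_k_anonymous_explanation(cf, data, ["age", "zip"], k=2))  # True
```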
arXiv Detail & Related papers (2022-10-21T15:44:19Z)
- Differentially Private Counterfactuals via Functional Mechanism [47.606474009932825]
We propose a novel framework to generate differentially private counterfactual (DPC) without touching the deployed model or explanation set.
In particular, we train an autoencoder with the functional mechanism to construct noisy class prototypes, and then derive the DPC from the latent prototypes.
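The functional mechanism perturbs the autoencoder's training objective itself; the sketch below only illustrates the last step described above, deriving a counterfactual from a noise-perturbed latent class prototype. The encode/decode hooks, the Laplace noise on the prototype, and the interpolation weight are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def dp_counterfactual_from_prototype(x, target_class, encode, decode,
                                     latent_bank, noise_scale, step=0.5):
    """Illustrative only: move the latent code of x toward a perturbed
    prototype of the target class and decode the result as a counterfactual.

    `encode`/`decode` are placeholder hooks for a trained autoencoder and
    `latent_bank` maps each class to the latent codes of its training points.
    """
    prototype = np.mean(latent_bank[target_class], axis=0)
    noisy_prototype = prototype + np.random.laplace(0.0, noise_scale,
                                                    size=prototype.shape)
    z = encode(x)
    z_cf = (1.0 - step) * z + step * noisy_prototype  # pull z toward the prototype
    return decode(z_cf)
```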
arXiv Detail & Related papers (2022-08-04T20:31:22Z)
- A Framework for Understanding Model Extraction Attack and Defense [48.421636548746704]
We study tradeoffs between model utility from a benign user's view and privacy from an adversary's view.
We develop new metrics to quantify such tradeoffs, analyze their theoretical properties, and develop an optimization problem to understand the optimal adversarial attack and defense strategies.
arXiv Detail & Related papers (2022-06-23T05:24:52Z)
- Backdooring Explainable Machine Learning [0.8180960351554997]
We demonstrate blinding attacks that can fully disguise an ongoing attack against the machine learning model.
Similar to neural backdoors, we modify the model's prediction upon trigger presence while simultaneously fooling the provided explanation.
arXiv Detail & Related papers (2022-04-20T14:40:09Z)
- Exploiting Explanations for Model Inversion Attacks [19.91586648726519]
We study the risk for image-based model inversion attacks with increasing performance to reconstruct private image data from model explanations.
We have developed several multi-modal transposed CNN architectures that achieve significantly higher inversion performance than using the target model prediction only.
These threats highlight the urgent and significant privacy risks of explanations and call for new privacy preservation techniques.
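The architectures themselves are not described in this summary; as a loose PyTorch sketch of the multi-modal idea, the decoder below fuses the target model's prediction vector with a flattened explanation map and upsamples the fused code with transposed convolutions (all layer sizes and shapes are assumptions):

```python
import torch
import torch.nn as nn

class MultiModalInverter(nn.Module):
    """Two-branch inversion decoder sketch: one branch embeds the prediction
    vector, the other embeds a flattened explanation map, and the fused code
    is decoded into a 32x32 image with transposed convolutions."""

    def __init__(self, num_classes, expl_size=32 * 32, img_channels=1):
        super().__init__()
        self.pred_branch = nn.Sequential(nn.Linear(num_classes, 256), nn.ReLU())
        self.expl_branch = nn.Sequential(nn.Linear(expl_size, 256), nn.ReLU())
        self.to_map = nn.Linear(512, 128 * 4 * 4)
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 4x4 -> 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),    # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(32, img_channels, 4, stride=2, padding=1),  # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, pred, expl):
        fused = torch.cat([self.pred_branch(pred), self.expl_branch(expl)], dim=1)
        return self.decode(self.to_map(fused).view(-1, 128, 4, 4))
```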
arXiv Detail & Related papers (2021-04-26T15:53:57Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
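A hedged sketch of the generic attack pattern (not the paper's specific method): every query that returns a counterfactual effectively yields two labeled points near the decision boundary, which can be used to train a surrogate. `query_api`, the seed set, and the binary-label assumption are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_surrogate(query_api, seeds):
    """Fit a surrogate from (input, counterfactual) pairs returned by the
    target MLaaS endpoint; the counterfactual receives the opposite label
    by construction, so each call yields two boundary-adjacent samples."""
    X, y = [], []
    for x in seeds:
        label, counterfactual = query_api(x)  # placeholder for the target API
        X.extend([x, counterfactual])
        y.extend([label, 1 - label])          # binary task assumed
    return LogisticRegression().fit(np.array(X), np.array(y))
```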
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.