Exploiting Explanations for Model Inversion Attacks
- URL: http://arxiv.org/abs/2104.12669v1
- Date: Mon, 26 Apr 2021 15:53:57 GMT
- Title: Exploiting Explanations for Model Inversion Attacks
- Authors: Xuejun Zhao, Wencan Zhang, Xiaokui Xiao, Brian Y. Lim
- Abstract summary: We study the risk of image-based model inversion attacks, identifying several attack architectures with increasing performance in reconstructing private image data from model explanations.
We have developed several multi-modal transposed CNN architectures that achieve significantly higher inversion performance than using the target model prediction only.
These threats highlight the urgent and significant privacy risks of explanations and call for new privacy preservation techniques.
- Score: 19.91586648726519
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The successful deployment of artificial intelligence (AI) in many domains, from healthcare to hiring, requires its responsible use, particularly with respect to model explanations and privacy. Explainable artificial intelligence (XAI) provides more information to help users understand model decisions, yet this additional knowledge exposes additional risks for privacy attacks. Hence, providing explanations harms privacy. We study this risk for image-based model inversion attacks and identify several attack architectures with increasing performance to reconstruct private image data from model explanations. We have
developed several multi-modal transposed CNN architectures that achieve
significantly higher inversion performance than using the target model
prediction only. These XAI-aware inversion models were designed to exploit the
spatial knowledge in image explanations. To understand which explanations have
higher privacy risk, we analyzed how various explanation types and factors
influence inversion performance. Even for target models that do not provide explanations, we further demonstrate increased inversion performance by exploiting explanations of surrogate models through attention transfer. This method first inverts an explanation from the target prediction, then reconstructs the target image. These threats highlight the urgent and significant privacy risks of explanations and call for new privacy preservation techniques that balance the dual requirements of AI explainability and privacy.
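To make the described attack concrete, below is a minimal, hypothetical PyTorch sketch of such an XAI-aware inversion model. The class name, the 32x32 input size, and the layer widths are illustrative assumptions, not the authors' released architecture: one branch embeds the target model's prediction vector, a second convolutional branch preserves the spatial structure of the explanation map, and a transposed-CNN decoder reconstructs the private image from the fused features.

```python
# Hypothetical sketch (not the authors' released code): a multi-modal inversion
# model that fuses the target model's prediction vector with a spatial
# explanation map (e.g., a saliency map) and reconstructs the private input
# image with transposed convolutions.
import torch
import torch.nn as nn

class XAIAwareInversionModel(nn.Module):
    def __init__(self, num_classes: int, out_channels: int = 1):
        super().__init__()
        # Branch 1: lift the 1-D prediction vector to a coarse spatial feature map.
        self.pred_branch = nn.Sequential(
            nn.Linear(num_classes, 128 * 4 * 4),
            nn.ReLU(inplace=True),
        )
        # Branch 2: encode the 2-D explanation map, preserving spatial structure.
        self.expl_branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), # 8x8 -> 4x4
            nn.ReLU(inplace=True),
        )
        # Decoder: transposed CNN that upsamples the fused features to an image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 4 -> 8
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 8 -> 16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, out_channels, kernel_size=4, stride=2, padding=1),  # 16 -> 32
            nn.Sigmoid(),
        )

    def forward(self, prediction: torch.Tensor, explanation: torch.Tensor) -> torch.Tensor:
        b = prediction.size(0)
        p = self.pred_branch(prediction).view(b, 128, 4, 4)
        e = self.expl_branch(explanation)
        fused = torch.cat([p, e], dim=1)  # channel-wise fusion of the two modalities
        return self.decoder(fused)

# Example: reconstruct 32x32 grayscale images from a 10-class prediction and a
# 32x32 saliency map (random stand-ins here). Training would minimize a
# reconstruction loss, e.g. MSE, against the corresponding private images.
model = XAIAwareInversionModel(num_classes=10)
x_hat = model(torch.softmax(torch.randn(8, 10), dim=1), torch.rand(8, 1, 32, 32))
```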
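For the attention-transfer setting, where the target model exposes no explanations, one minimal sketch (again an illustrative assumption, not the paper's exact pipeline) is a small "explanation inverter" that predicts a saliency-style map from the target prediction alone, trained to match explanations produced by a surrogate model on attacker-side data. The predicted map can then be fed, together with the prediction, into a multi-modal inversion model such as the one sketched above.

```python
# Hypothetical sketch of the attention-transfer stage for non-explainable targets:
# a transposed CNN learns to map the target model's prediction vector to a
# saliency-style explanation map, supervised by a surrogate model's explanations.
import torch
import torch.nn as nn

class ExplanationInverter(nn.Module):
    """Maps a class-probability vector to a 32x32 explanation map (sizes assumed)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(num_classes, 128 * 4 * 4), nn.ReLU(inplace=True))
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 4 -> 8
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),   # 8 -> 16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),    # 16 -> 32
            nn.Sigmoid(),
        )

    def forward(self, prediction: torch.Tensor) -> torch.Tensor:
        b = prediction.size(0)
        return self.up(self.fc(prediction).view(b, 128, 4, 4))

# Training sketch: on attacker-side data, match the surrogate model's explanation
# maps (e.g., saliency or Grad-CAM of a surrogate classifier) with an MSE loss.
inverter = ExplanationInverter(num_classes=10)
opt = torch.optim.Adam(inverter.parameters(), lr=1e-3)
preds = torch.softmax(torch.randn(8, 10), dim=1)   # target model outputs (stand-in)
surrogate_expl = torch.rand(8, 1, 32, 32)          # surrogate explanations (stand-in)
loss = nn.functional.mse_loss(inverter(preds), surrogate_expl)
loss.backward()
opt.step()
```

The output of this stage substitutes for the missing target explanation in the two-stage attack: first invert an explanation from the prediction, then reconstruct the image.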
Related papers
- Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference [26.596877194118278]
We present two new membership inference attacks based on feature attribution explanations.
We find that optimized differentially private fine-tuning substantially diminishes the success of the aforementioned attacks.
arXiv Detail & Related papers (2024-07-24T22:16:37Z) - Privacy Implications of Explainable AI in Data-Driven Systems [0.0]
Machine learning (ML) models suffer from a lack of interpretability.
The absence of transparency, often referred to as the black box nature of ML models, undermines trust.
XAI techniques address this challenge by providing frameworks and methods to explain the internal decision-making processes.
arXiv Detail & Related papers (2024-06-22T08:51:58Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Reconciling AI Performance and Data Reconstruction Resilience for Medical Imaging [52.578054703818125]
Artificial Intelligence (AI) models are vulnerable to information leakage of their training data, which can be highly sensitive.
Differential Privacy (DP) aims to circumvent these susceptibilities by setting a quantifiable privacy budget.
We show that using very large privacy budgets can render reconstruction attacks impossible, while drops in performance are negligible.
arXiv Detail & Related papers (2023-12-05T12:21:30Z) - Black-box Attacks on Image Activity Prediction and its Natural Language Explanations [27.301741710016223]
Explainable AI (XAI) methods aim to describe the decision process of deep neural networks.
Visual XAI methods have been shown to be vulnerable to white-box and gray-box adversarial attacks.
We show that we can create adversarial images that manipulate the explanations of an activity recognition model by having access only to its final output.
arXiv Detail & Related papers (2023-09-30T21:56:43Z) - Data Forensics in Diffusion Models: A Systematic Analysis of Membership Privacy [62.16582309504159]
We develop a systematic analysis of membership inference attacks on diffusion models and propose novel attack methods tailored to each attack scenario.
Our approach exploits easily obtainable quantities and is highly effective, achieving near-perfect attack performance (>0.9 AUCROC) in realistic scenarios.
arXiv Detail & Related papers (2023-02-15T17:37:49Z) - XRand: Differentially Private Defense against Explanation-Guided Attacks [19.682368614810756]
We introduce a new concept of achieving local differential privacy (LDP) in the explanations.
We show that our mechanism restricts the information that the adversary can learn about the top important features, while maintaining the faithfulness of the explanations.
arXiv Detail & Related papers (2022-12-08T18:23:59Z) - The privacy issue of counterfactual explanations: explanation linkage attacks [0.0]
We introduce the explanation linkage attack, which can occur when deploying instance-based strategies to find counterfactual explanations.
To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a new metric to evaluate the validity of these k-anonymous counterfactual explanations.
Our results show that making the explanations, rather than the whole dataset, k-anonymous is beneficial for the quality of the explanations.
arXiv Detail & Related papers (2022-10-21T15:44:19Z) - Towards Understanding and Boosting Adversarial Transferability from a Distribution Perspective [80.02256726279451]
Adversarial attacks against deep neural networks (DNNs) have received broad attention in recent years.
We propose a novel method that crafts adversarial examples by manipulating the distribution of the image.
Our method can significantly improve the transferability of the crafted attacks and achieves state-of-the-art performance in both untargeted and targeted scenarios.
arXiv Detail & Related papers (2022-10-09T09:58:51Z) - Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z) - Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)