Explanation-Guided Diagnosis of Machine Learning Evasion Attacks
- URL: http://arxiv.org/abs/2106.15820v1
- Date: Wed, 30 Jun 2021 05:47:12 GMT
- Title: Explanation-Guided Diagnosis of Machine Learning Evasion Attacks
- Authors: Abderrahmen Amich, Birhanu Eshete
- Abstract summary: We introduce a novel framework that harnesses explainable ML methods to guide high-fidelity assessment of ML evasion attacks.
Our framework enables explanation-guided correlation analysis between pre-evasion perturbations and post-evasion explanations.
- Score: 3.822543555265593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) models are susceptible to evasion attacks. Evasion
accuracy is typically assessed using aggregate evasion rate, and it is an open
question whether aggregate evasion rate enables feature-level diagnosis on the
effect of adversarial perturbations on evasive predictions. In this paper, we
introduce a novel framework that harnesses explainable ML methods to guide
high-fidelity assessment of ML evasion attacks. Our framework enables
explanation-guided correlation analysis between pre-evasion perturbations and
post-evasion explanations. Towards systematic assessment of ML evasion attacks,
we propose and evaluate a novel suite of model-agnostic metrics for
sample-level and dataset-level correlation analysis. Using malware and image
classifiers, we conduct comprehensive evaluations across diverse model
architectures and complementary feature representations. Our explanation-guided
correlation analysis reveals correlation gaps between adversarial samples and
the corresponding perturbations performed on them. Using a case study on
explanation-guided evasion, we show the broader usage of our methodology for
assessing robustness of ML models.
Related papers
- Revisiting Spurious Correlation in Domain Generalization [12.745076668687748]
We build a structural causal model (SCM) to describe the causality within data generation process.
We further conduct a thorough analysis of the mechanisms underlying spurious correlation.
In this regard, we propose to control confounding bias in OOD generalization by introducing a propensity score weighted estimator.
arXiv Detail & Related papers (2024-06-17T13:22:00Z)
- Towards a Game-theoretic Understanding of Explanation-based Membership Inference Attacks [8.06071340190569]
Black-box machine learning (ML) models can be exploited to carry out privacy threats such as membership inference attacks (MIA).
Existing works have only analyzed MIA in a single "what if" interaction scenario between an adversary and the target ML model.
We propose a sound mathematical formulation to prove that such an optimal threshold exists, which can be used to launch MIA.
arXiv Detail & Related papers (2024-04-10T16:14:05Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
- Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
- Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data.
Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples.
Our generated adversarial examples can evade potential detection models, which makes the attack indeed insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z)
- Scalable Intervention Target Estimation in Linear Models [52.60799340056917]
Current approaches to causal structure learning either work with known intervention targets or use hypothesis testing to discover the unknown intervention targets.
This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets.
The proposed algorithm can also be used to update a given observational Markov equivalence class into the interventional Markov equivalence class.
arXiv Detail & Related papers (2021-11-15T03:16:56Z)
- Balancing detectability and performance of attacks on the control channel of Markov Decision Processes [77.66954176188426]
We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs).
This research is motivated by the research community's recent interest in adversarial and poisoning attacks applied to MDPs and reinforcement learning (RL) methods.
arXiv Detail & Related papers (2021-09-15T09:13:10Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- EG-Booster: Explanation-Guided Booster of ML Evasion Attacks [3.822543555265593]
We present a novel approach called EG-Booster that leverages techniques from explainable ML to guide adversarial example crafting.
EG-Booster is agnostic to model architecture, threat model, and supports diverse distance metrics used previously in the literature.
Our findings suggest that EG-Booster significantly improves the evasion rate of state-of-the-art attacks while performing fewer perturbations.
arXiv Detail & Related papers (2021-08-31T15:36:16Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on modular, reusable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.