PhilaeX: Explaining the Failure and Success of AI Models in Malware Detection
- URL: http://arxiv.org/abs/2207.00740v1
- Date: Sat, 2 Jul 2022 05:06:24 GMT
- Title: PhilaeX: Explaining the Failure and Success of AI Models in Malware Detection
- Authors: Zhi Lu, Vrizlynn L. L. Thing
- Abstract summary: An explanation of an AI model's prediction, used to support decision making in cyber security, is of critical importance.
Most existing AI models lack the ability to provide explanations of their predictions, despite their strong performance in most scenarios.
We propose a novel explainable AI method, called PhilaeX, that identifies an optimized subset of features to form a complete explanation of an AI model's prediction.
- Score: 6.264663726458324
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The explanation of an AI model's prediction, used to support decision making
in cyber security, is of critical importance. This is especially so when the
model's incorrect prediction can lead to severe damage or even the loss of lives
and critical assets. However, most existing AI models lack the ability to
provide explanations of their prediction results, despite their strong
performance in most scenarios. In this work, we propose a novel explainable AI
method, called PhilaeX, that provides a heuristic means to identify an
optimized subset of features forming a complete explanation of an AI model's
prediction. It identifies the features that lead to the model's borderline
prediction and extracts those with positive individual contributions. The
feature attributions are then quantified by optimizing a Ridge regression
model. We verify the explanation fidelity through two experiments. First, we
assess the method's ability to correctly identify the features activated in
adversarial Android malware samples, using the feature attribution values
produced by PhilaeX. Second, deduction and augmentation tests are used to
assess the fidelity of the explanations. The results show that PhilaeX
correctly explains different types of classifiers and produces higher-fidelity
explanations than state-of-the-art methods such as LIME and SHAP.
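The abstract describes the pipeline only at a high level: find the features behind the model's borderline prediction, keep those with positive individual contributions, quantify their attributions by optimizing a Ridge regression, and check fidelity with deduction and augmentation tests. The sketch below is a minimal, hypothetical Python illustration of that general shape using scikit-learn; the toy random-forest detector, the perturbation scheme, and the function names (individual_contribution, ridge_attribution, deduction_fidelity) are assumptions for illustration, not PhilaeX's actual implementation.

# Hedged sketch (not the authors' code): illustrates the general shape of a
# PhilaeX-style attribution pipeline on a toy detector with binary features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy stand-in for a malware detector over 20 binary features
# (e.g., permission/API flags extracted from an Android app).
X_train = rng.integers(0, 2, size=(500, 20))
y_train = ((X_train[:, 0] & X_train[:, 3]) | X_train[:, 7]).astype(int)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def individual_contribution(model, x):
    """Score change when each active feature is added alone to an all-zero baseline."""
    baseline = np.zeros_like(x)
    base_score = model.predict_proba(baseline[None, :])[0, 1]
    contrib = np.zeros(len(x))
    for j in np.flatnonzero(x):
        probe = baseline.copy()
        probe[j] = x[j]
        contrib[j] = model.predict_proba(probe[None, :])[0, 1] - base_score
    return contrib

def ridge_attribution(model, x, candidates, n_samples=500, alpha=1.0):
    """Quantify attributions for the candidate features with a local Ridge surrogate."""
    Z = np.tile(x, (n_samples, 1))
    mask = rng.integers(0, 2, size=(n_samples, len(candidates)))
    Z[:, candidates] = Z[:, candidates] * mask      # randomly switch candidates off
    scores = model.predict_proba(Z)[:, 1]
    surrogate = Ridge(alpha=alpha).fit(mask, scores)
    return dict(zip(candidates.tolist(), surrogate.coef_))

def deduction_fidelity(model, x, attributions, k=3):
    """Drop the k highest-attributed features and measure the prediction-score drop."""
    top = sorted(attributions, key=attributions.get, reverse=True)[:k]
    x_ded = x.copy()
    x_ded[top] = 0
    return (model.predict_proba(x[None, :])[0, 1]
            - model.predict_proba(x_ded[None, :])[0, 1])

x = X_train[0]                                      # sample to be explained
contrib = individual_contribution(model, x)
candidates = np.flatnonzero(contrib > 0)            # keep positive contributors only
if candidates.size == 0:                            # fallback for the toy example
    candidates = np.flatnonzero(x)
attributions = ridge_attribution(model, x, candidates)
print(sorted(attributions.items(), key=lambda kv: -kv[1]))
print("deduction score drop:", deduction_fidelity(model, x, attributions))

In the same spirit, an augmentation-style check could add the top-attributed features to a near-empty baseline sample and verify that the prediction score rises; larger score changes under both tests indicate a higher-fidelity explanation.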
Related papers
- F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI [15.314388210699443]
F-Fidelity (Fine-tuned Fidelity) is a robust evaluation framework for XAI.
We show that F-Fidelity significantly improves upon prior evaluation metrics in recovering the ground-truth ranking of explainers.
We also show that, given a faithful explainer, the F-Fidelity metric can be used to compute the sparsity of influential input components.
arXiv Detail & Related papers (2024-10-03T20:23:06Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness of saliency-based explanations and their potential for misunderstanding.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- AUTOLYCUS: Exploiting Explainable AI (XAI) for Model Extraction Attacks against Interpretable Models [1.8752655643513647]
XAI tools can increase models' vulnerability to extraction attacks, which is a concern when model owners prefer black-box access.
We propose a novel retraining (learning)-based model extraction attack framework against interpretable models under black-box settings.
We show that AUTOLYCUS is highly effective, requiring significantly fewer queries compared to state-of-the-art attacks.
arXiv Detail & Related papers (2023-02-04T13:23:39Z)
- Rationalizing Predictions by Adversarial Information Calibration [65.19407304154177]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features.
arXiv Detail & Related papers (2023-01-15T03:13:09Z)
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- "How Does It Detect A Malicious App?" Explaining the Predictions of AI-based Android Malware Detector [6.027885037254337]
We present a novel model-agnostic explanation method for AI models applied for Android malware detection.
Our proposed method identifies and quantifies the data features' relevance to the predictions in two steps.
We first demonstrate that the proposed model explanation method can quantitatively aid in discovering how AI models are evaded by adversarial samples.
arXiv Detail & Related papers (2021-11-06T11:25:24Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration [39.685626118667074]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models.
For natural language tasks, we propose to use a language-model-based regularizer to encourage the extraction of fluent rationales.
arXiv Detail & Related papers (2020-12-16T11:54:15Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.