Towards interpreting ML-based automated malware detection models: a
survey
- URL: http://arxiv.org/abs/2101.06232v1
- Date: Fri, 15 Jan 2021 17:34:40 GMT
- Title: Towards interpreting ML-based automated malware detection models: a
survey
- Authors: Yuzhou Lin, Xiaolin Chang
- Abstract summary: Most existing machine learning models are black-box, which makes their prediction results undependable.
This paper aims to examine and categorize the existing research on ML-based malware detector interpretability.
- Score: 4.721069729610892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Malware is an increasingly serious threat, and malware detectors based on
traditional signature-based analysis are no longer suitable for current malware
detection. Recently, models based on machine learning (ML) have been developed
to predict unknown malware variants and reduce human effort. However, most
existing ML models are black-box, which makes their prediction results
undependable; they therefore need further interpretation in order to be
effectively deployed in the wild. This paper aims to examine and categorize the
existing research on ML-based malware detector interpretability. We first
give a detailed comparison of previous work on common ML model
interpretability, grouped after introducing the principles, attributes,
evaluation indicators, and taxonomy of common ML interpretability. Then we
investigate interpretation methods for malware detection, addressing
the importance of interpreting malware detectors, the challenges faced by this
field, solutions for mitigating these challenges, and a new taxonomy for
classifying the state-of-the-art malware detection interpretability work of
recent years. The highlight of our survey is a new taxonomy of
malware detection interpretation methods based on the common taxonomy
summarized by previous research in the field. In addition, we are the
first to evaluate the state-of-the-art approaches by interpretation method
attributes to generate a final score, giving insight into quantifying
interpretability. By summarizing the results of recent research, we hope
our work can provide suggestions for researchers who are interested in the
interpretability of ML-based malware detection models.
Related papers
- Explainable Malware Detection with Tailored Logic Explained Networks [9.506820624395447]
Malware detection is a constant challenge in cybersecurity due to the rapid development of new attack techniques.
Traditional signature-based approaches struggle to keep pace with the sheer volume of malware samples.
Machine learning offers a promising solution, but faces issues of generalization to unseen samples and a lack of explanation for the instances identified as malware.
arXiv Detail & Related papers (2024-05-05T17:36:02Z) - Unraveling the Key of Machine Learning Solutions for Android Malware
Detection [33.63795751798441]
This paper presents a comprehensive investigation into machine learning-based Android malware detection.
We first survey the literature, categorizing contributions into a taxonomy based on the Android feature engineering and ML modeling pipeline.
Then, we design a general-purpose framework for ML-based Android malware detection, re-implement 12 representative approaches from different research communities, and evaluate them along three primary dimensions: effectiveness, robustness, and efficiency.
arXiv Detail & Related papers (2024-02-05T12:31:19Z) - Vulnerability of Machine Learning Approaches Applied in IoT-based Smart Grid: A Review [51.31851488650698]
Machine learning (ML) is increasingly used in the internet-of-things (IoT)-based smart grid.
Adversarial distortion injected into the power signal can greatly affect the system's normal control and operation.
It is imperative to conduct vulnerability assessment for MLsgAPPs applied in the context of safety-critical power systems.
arXiv Detail & Related papers (2023-08-30T03:29:26Z) - Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection
Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method utilizes a mask to identify memorized atypical samples, and then fine-tunes or prunes the model with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z) - A Survey on Malware Detection with Graph Representation Learning [0.0]
Malware detection has become a major concern due to the increasing number and complexity of malware.
In recent years, Machine Learning (ML) and notably Deep Learning (DL) achieved impressive results in malware detection by learning useful representations from data.
This paper provides an in-depth literature review to summarize and unify existing works under common approaches and architectures.
arXiv Detail & Related papers (2023-03-28T14:27:08Z) - Towards a Fair Comparison and Realistic Design and Evaluation Framework
of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which explains the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z) - Active Surrogate Estimators: An Active Learning Approach to
Label-Efficient Model Evaluation [59.7305309038676]
We propose Active Surrogate Estimators (ASEs) for model evaluation.
We find that ASEs offer greater label-efficiency than the current state-of-the-art.
arXiv Detail & Related papers (2022-02-14T17:15:18Z) - Inspect, Understand, Overcome: A Survey of Practical Methods for AI
Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z) - Explainable Matrix -- Visualization for Global and Local
Interpretability of Random Forest Classification Ensembles [78.6363825307044]
We propose Explainable Matrix (ExMatrix), a novel visualization method for Random Forest (RF) interpretability.
It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates.
ExMatrix's applicability is confirmed via different examples, showing how it can be used in practice to promote the interpretability of RF models.
arXiv Detail & Related papers (2020-05-08T21:03:48Z) - Why an Android App is Classified as Malware? Towards Malware
Classification Interpretation [34.59397128785141]
We propose a novel and interpretable ML-based approach (named XMal) to classify malware with high accuracy and explain the classification result.
XMal hinges on a multi-layer perceptron (MLP) and an attention mechanism, and pinpoints the key features most related to the classification result.
Our study peeks into interpretable ML through research on Android malware detection and analysis.
arXiv Detail & Related papers (2020-04-24T03:05:09Z) - Interpreting Machine Learning Malware Detectors Which Leverage N-gram
Analysis [2.6397379133308214]
Cybersecurity analysts always prefer solutions that are as interpretable and understandable as rule-based or signature-based detection.
The objective of this paper is to evaluate the current state-of-the-art interpretability techniques when applied to ML-based malware detectors.
arXiv Detail & Related papers (2020-01-27T19:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.