Why an Android App is Classified as Malware? Towards Malware
Classification Interpretation
- URL: http://arxiv.org/abs/2004.11516v2
- Date: Fri, 4 Sep 2020 13:27:46 GMT
- Title: Why an Android App is Classified as Malware? Towards Malware
Classification Interpretation
- Authors: Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen,
Michael R. Lyu
- Abstract summary: We propose a novel and interpretable ML-based approach (named XMal) to classify malware with high accuracy and explain the classification result.
XMal hinges on a multi-layer perceptron (MLP) and an attention mechanism, and also pinpoints the key features most related to the classification result.
Our study offers a look into interpretable ML through research on Android malware detection and analysis.
- Score: 34.59397128785141
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) based approaches are considered among the most
promising techniques for Android malware detection and have achieved high
accuracy by leveraging commonly used features. In practice, most ML
classifiers provide only a binary label to mobile users and app security
analysts. However, stakeholders in both academia and industry are more
interested in the reason why apps are classified as malicious. This belongs to the
research area of interpretable ML, but in a specific research domain (i.e.,
mobile malware detection). Although several interpretable ML methods have been
proposed to explain the final classification results in many cutting-edge
Artificial Intelligence (AI) based research fields, until now there has been no study
interpreting why an app is classified as malware or unveiling the
domain-specific challenges.
In this paper, to fill this gap, we propose a novel and interpretable
ML-based approach (named XMal) to classify malware with high accuracy and
explain the classification result at the same time. (1) The first classification phase
of XMal hinges on a multi-layer perceptron (MLP) and an attention mechanism, and also
pinpoints the key features most related to the classification result. (2) The
second interpreting phase aims at automatically producing natural language
descriptions to interpret the core malicious behaviors within apps. We evaluate
the behavior description results by comparing them with those of existing interpretable
ML-based methods (i.e., Drebin and LIME) to demonstrate the effectiveness of
XMal. We find that XMal is able to reveal the malicious behaviors more
accurately. Additionally, our experiments show that XMal can also interpret the
reason why some samples are misclassified by ML classifiers. Our study offers a look
into interpretable ML through research on Android malware detection and
analysis.
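As a rough illustration of the first phase described in the abstract, the sketch below places an attention layer in front of a small MLP and reads the attention weights back as per-feature importance. It is not the authors' implementation: the feature names, the hand-set weights, the single hidden layer, and the helper names (`attention_weights`, `classify`, `top_features`) are all assumptions made for this example, and a real model would learn its parameters from labeled apps.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(x, W_att, b_att):
    # One weight per input feature: softmax(W_att @ x + b_att).
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(W_att, b_att)]
    return softmax(scores)

def classify(x, W_att, b_att, W_h, b_h, w_out, b_out):
    # Attention re-weights the input, then a one-hidden-layer MLP scores it.
    a = attention_weights(x, W_att, b_att)
    weighted = [ai * xi for ai, xi in zip(a, x)]
    hidden = [max(0.0, sum(w * v for w, v in zip(row, weighted)) + b)
              for row, b in zip(W_h, b_h)]  # ReLU activation
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid: P(malware)
    return prob, a

def top_features(names, x, weights, k):
    # Rank only the features that are present in the app (x_i != 0).
    present = [(w, n) for w, n, xi in zip(weights, names, x) if xi]
    present.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in present[:k]]

# Hypothetical binary feature vector (permissions and an API call).
names = ["SEND_SMS", "READ_CONTACTS", "INTERNET", "getDeviceId"]
x = [1, 0, 1, 1]

# Hand-set toy parameters, standing in for trained weights.
W_att = [[2.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.1, 0.0],
         [0.0, 0.0, 0.0, 1.5]]
b_att = [0.0, 0.0, 0.0, 0.0]
W_h = [[3.0, 0.0, 0.0, 3.0],
       [0.0, 1.0, 1.0, 0.0]]
b_h = [0.0, 0.0]
w_out = [2.0, -1.0]
b_out = -1.0

prob, weights = classify(x, W_att, b_att, W_h, b_h, w_out, b_out)
key_features = top_features(names, x, weights, k=2)
print(f"P(malware) = {prob:.3f}, key features = {key_features}")
```

With these toy weights the two highest-ranked features present in the app are `SEND_SMS` and `getDeviceId`. A second phase, as the abstract describes, would then map such key features to a natural language description of the behavior.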
Related papers
- Multi-label Classification for Android Malware Based on Active Learning [7.599125552187342]
We propose MLCDroid, an ML-based multi-label classification approach that can directly indicate the existence of pre-defined malicious behaviors.
We compare the results of 70 algorithm combinations to evaluate the effectiveness (best at 73.3%).
This is the first multi-label Android malware classification approach intending to provide more information on fine-grained malicious behaviors.
arXiv Detail & Related papers (2024-10-09T01:09:24Z)
- MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries? [70.77691645678804]
Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli.
This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies.
We identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive.
arXiv Detail & Related papers (2024-06-22T23:26:07Z)
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks [52.61917615039112]
We use CausalGym to benchmark the ability of interpretability methods to causally affect model behaviour.
We study the pythia models (14M--6.9B) and assess the causal efficacy of a wide range of interpretability methods.
We find that DAS outperforms the other methods, and so we use it to study the learning trajectory of two difficult linguistic phenomena.
arXiv Detail & Related papers (2024-02-19T21:35:56Z)
- Unraveling the Key of Machine Learning Solutions for Android Malware Detection [33.63795751798441]
This paper presents a comprehensive investigation into machine learning-based Android malware detection.
We first survey the literature, categorizing contributions into a taxonomy based on the Android feature engineering and ML modeling pipeline.
Then, we design a general-purpose framework for ML-based Android malware detection, re-implement 12 representative approaches from different research communities, and evaluate them along three primary dimensions, i.e., effectiveness, robustness, and efficiency.
arXiv Detail & Related papers (2024-02-05T12:31:19Z)
- Vulnerability of Machine Learning Approaches Applied in IoT-based Smart Grid: A Review [51.31851488650698]
Machine learning (ML) is increasingly used in the internet-of-things (IoT)-based smart grid.
Adversarial distortion injected into the power signal will greatly affect the system's normal control and operation.
It is imperative to conduct vulnerability assessment for MLsgAPPs applied in the context of safety-critical power systems.
arXiv Detail & Related papers (2023-08-30T03:29:26Z)
- Quantum Machine Learning for Malware Classification [0.0]
In a context of malicious software detection, machine learning is widely used to generalize to new malware.
It has been demonstrated that ML models can be fooled or may have generalization problems on malware that has never been seen.
We implement two models of Quantum Machine Learning algorithms, and we compare them to classical models for the classification of a dataset composed of malicious and benign executable files.
arXiv Detail & Related papers (2023-05-09T09:21:48Z)
- Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
- MERLIN -- Malware Evasion with Reinforcement LearnINg [26.500149465292246]
We propose a method using reinforcement learning with DQN and REINFORCE algorithms to challenge two state-of-the-art malware detection engines.
Our method combines several actions that modify a Windows Portable Executable (PE) file without breaking its functionality.
We demonstrate that REINFORCE achieves very good evasion rates even on a commercial AV with limited available information.
arXiv Detail & Related papers (2022-03-24T10:58:47Z)
- Towards interpreting ML-based automated malware detection models: a survey [4.721069729610892]
Most existing machine learning models are black boxes, which makes their prediction results undependable.
This paper aims to examine and categorize the existing research on ML-based malware detector interpretability.
arXiv Detail & Related papers (2021-01-15T17:34:40Z)
- Maat: Automatically Analyzing VirusTotal for Accurate Labeling and Effective Malware Detection [71.84087757644708]
The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 scanners.
There are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling strategies.
We implemented a method, Maat, that tackles these issues of standardization and sustainability by automatically generating a Machine Learning (ML)-based labeling scheme.
arXiv Detail & Related papers (2020-07-01T14:15:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.