Fast & Furious: Modelling Malware Detection as Evolving Data Streams
- URL: http://arxiv.org/abs/2205.12311v1
- Date: Tue, 24 May 2022 18:43:40 GMT
- Title: Fast & Furious: Modelling Malware Detection as Evolving Data Streams
- Authors: Fabrício Ceschin, Marcus Botacin, Heitor Murilo Gomes, Felipe
Pinagé, Luiz S. Oliveira, André Grégio
- Abstract summary: Malware is a major threat to computer systems and imposes many challenges to cyber security.
In this work, we evaluate the impact of concept drift on malware classifiers for two Android datasets.
- Score: 6.6892028759947175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Malware is a major threat to computer systems and imposes many challenges to
cyber security. Targeted threats, such as ransomware, cause millions of dollars
in losses every year. The constant increase of malware infections has been
motivating popular antiviruses (AVs) to develop dedicated detection strategies,
which include meticulously crafted machine learning (ML) pipelines. However,
malware developers unceasingly change their samples' features to bypass
detection. This constant evolution of malware samples causes changes to the
data distribution (i.e., concept drifts) that directly affect ML model
detection rates. In this work, we evaluate the impact of concept drift on
malware classifiers for two Android datasets: DREBIN (~130K apps) and AndroZoo
(~350K apps). Android is a ubiquitous operating system for smartphones, which
stimulates attackers to regularly create and update malware for the platform. We
conducted a longitudinal evaluation by (i) classifying malware samples
collected over nine years (2009-2018), (ii) reviewing concept drift detection
algorithms to attest to its pervasiveness, (iii) comparing distinct ML approaches
to mitigate the issue, and (iv) proposing an ML data stream pipeline that
outperformed literature approaches. As a result, we observed that updating
every component of the pipeline in response to concept drifts allows the
classification model to achieve increasing detection rates as the data
representation (extracted features) is updated. Furthermore, we discuss the
impact of the changes on the classification models by comparing the variations
in the extracted features.
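The idea of updating every component of the pipeline when drift is detected can be illustrated with a small sketch. The abstract does not name the detector, feature extractor, or classifier, so everything below is an assumption made for illustration: `DDM` is a simplified DDM-style error-rate detector (no warning level), `Pipeline` is a hypothetical token-vocabulary extractor with a token-vote classifier, and `make_stream` builds a synthetic stream with made-up feature names. It is not the authors' implementation.

```python
import math
from collections import Counter, deque


class DDM:
    """Simplified DDM-style detector over a stream of 0/1 prediction errors."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n, self.p = 0, 0.0  # samples seen, running error rate
        self.p_min, self.s_min = math.inf, math.inf

    def update(self, error):
        """Feed one outcome (1 = misclassified); return True when drift is flagged."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(max(self.p * (1 - self.p), 0.0) / self.n)
        if self.n < self.min_samples:
            return False
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        return self.p + s > self.p_min + self.drift_level * self.s_min


class Pipeline:
    """Feature extractor (token vocabulary) plus a simple token-vote classifier."""

    def __init__(self, samples, vocab=None):
        # Rebuild the vocabulary from the samples unless an old one is reused.
        self.vocab = vocab if vocab is not None else {t for toks, _ in samples for t in toks}
        self.score = Counter()
        for toks, label in samples:
            self.learn(toks, label)

    def learn(self, toks, label):
        for tok in toks:
            if tok in self.vocab:  # tokens outside the vocabulary are invisible
                self.score[tok] += 1 if label == 1 else -1

    def predict(self, toks):
        return 1 if sum(self.score[t] for t in toks if t in self.vocab) > 0 else 0


def make_stream():
    """Synthetic stream (hypothetical features): malware changes after sample 100."""
    old = [(("send_sms", "read_contacts"), 1), (("internet", "camera"), 0)]
    new = [(("load_dex", "reflection"), 1), (("internet", "camera"), 0)]
    return old * 50 + new * 50


def run_stream(stream, warmup=40, window=50, update_features=True):
    """Test-then-train loop; returns (accuracy after warm-up, drift positions)."""
    buffer = deque(stream[:warmup], maxlen=window)
    model = Pipeline(list(buffer))
    detector, drifts, correct = DDM(), [], 0
    for i, (toks, label) in enumerate(stream[warmup:], start=warmup):
        pred = model.predict(toks)
        correct += pred == label
        buffer.append((toks, label))
        model.learn(toks, label)  # incremental update with the current vocabulary
        if detector.update(int(pred != label)):
            drifts.append(i)
            detector.reset()
            # Full update rebuilds the vocabulary; the baseline keeps the old one.
            model = Pipeline(list(buffer), vocab=None if update_features else model.vocab)
    return correct / (len(stream) - warmup), drifts


stream = make_stream()
full_acc, full_drifts = run_stream(stream, update_features=True)
base_acc, base_drifts = run_stream(stream, update_features=False)
print("full update:", full_drifts, full_acc)
print("classifier only:", base_drifts, base_acc)
```

On this toy stream both variants flag drift at the first post-change malware sample, but only the variant that also rebuilds the vocabulary recovers; the classifier-only baseline keeps missing the new malware because its features never enter the representation.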
Related papers
- MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware.
We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph.
This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z)
- MalPurifier: Enhancing Android Malware Detection with Adversarial Purification against Evasion Attacks [19.68134775248897]
MalPurifier exploits adversarial purification to eliminate perturbations independently, resulting in attack mitigation in a light and flexible way.
Experimental results on two Android malware datasets demonstrate that MalPurifier outperforms the state-of-the-art defenses.
arXiv Detail & Related papers (2023-12-11T14:48:43Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- Behavioural Reports of Multi-Stage Malware [3.64414368529873]
This dataset provides API call sequences for thousands of malware samples executed in Windows 10 virtual machines.
A tutorial on how to create and expand this dataset is provided along with a benchmark demonstrating how to use this dataset to classify malware.
arXiv Detail & Related papers (2023-01-30T11:51:02Z)
- Flexible Android Malware Detection Model based on Generative Adversarial Networks with Code Tensor [7.417407987122394]
Existing malware detection methods target only known malicious samples.
In this paper, we propose a novel scheme that detects malware and its variants efficiently.
arXiv Detail & Related papers (2022-10-25T03:20:34Z)
- Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
- MaMaDroid2.0 -- The Holes of Control Flow Graphs [5.838266102141281]
This paper fully inspects a well-known Android malware detection system, MaMaDroid, which analyzes the control flow graph of the application.
The changes in the ratio between benign and malicious samples have a clear effect on each one of the models, resulting in a decrease of more than 40% in their detection rate.
Three novel attacks that manipulate the CFG and their detection rates are described for each one of the targeted models.
The attacks decrease the detection rate of most of the models to 0%, with regard to different ratios of benign to malicious apps.
arXiv Detail & Related papers (2022-02-28T16:18:15Z)
- Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
- MDEA: Malware Detection with Evolutionary Adversarial Learning [16.8615211682877]
MDEA, an adversarial malware detection model, uses evolutionary optimization to create attack samples that make the network robust against evasion attacks.
By retraining the model with the evolved malware samples, its performance improves by a significant margin.
arXiv Detail & Related papers (2020-02-09T09:59:56Z)