Maat: Automatically Analyzing VirusTotal for Accurate Labeling and
Effective Malware Detection
- URL: http://arxiv.org/abs/2007.00510v1
- Date: Wed, 1 Jul 2020 14:15:03 GMT
- Title: Maat: Automatically Analyzing VirusTotal for Accurate Labeling and
Effective Malware Detection
- Authors: Aleieldin Salem, Sebastian Banescu, Alexander Pretschner
- Abstract summary: The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 scanners.
There are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling strategies.
We implemented a method, Maat, that tackles these issues of standardization and sustainability by automatically generating a Machine Learning (ML)-based labeling scheme.
- Score: 71.84087757644708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The malware analysis and detection research community relies on the online
platform VirusTotal to label Android apps based on the scan results of around
60 antiviral scanners. Unfortunately, there are no standards on how to best
interpret the scan results acquired from VirusTotal, which leads to the
utilization of different threshold-based labeling strategies (e.g., if ten or
more scanners deem an app malicious, it is considered malicious). While some of
the utilized thresholds may be able to accurately approximate the ground truths
of apps, the fact that VirusTotal changes the set and versions of the scanners
it uses makes such thresholds unsustainable over time. We implemented a method,
Maat, that tackles these issues of standardization and sustainability by
automatically generating a Machine Learning (ML)-based labeling scheme, which
outperforms threshold-based labeling strategies. Using the VirusTotal scan
reports of 53K Android apps that span one year, we evaluated the applicability
of Maat's ML-based labeling strategies by comparing their performance against
threshold-based strategies. We found that such ML-based strategies (a) can
accurately and consistently label apps based on their VirusTotal scan reports,
and (b) contribute to training ML-based detection methods that are more
effective at classifying out-of-sample apps than their threshold-based
counterparts.
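To make the sustainability concern concrete, here is a minimal sketch of a threshold-based labeling strategy of the kind the abstract describes ("if ten or more scanners deem an app malicious"). It is not Maat's implementation. The report layout (a dict mapping scanner name to a boolean verdict), the scanner names, and the function name `threshold_label` are all assumptions made for illustration; real VirusTotal reports have a richer structure.

```python
from typing import Dict


def threshold_label(verdicts: Dict[str, bool], threshold: int = 10) -> str:
    """Label an app 'malicious' if at least `threshold` scanners flag it."""
    positives = sum(1 for flagged in verdicts.values() if flagged)
    return "malicious" if positives >= threshold else "benign"


# Hypothetical report: 60 scanners, the first 12 of which flag the app.
report = {f"scanner_{i}": i < 12 for i in range(60)}
label_now = threshold_label(report)

# The same app after the platform drops four of the flagging scanners
# (an assumed change, to mimic a shifting scanner set).
dropped = {"scanner_0", "scanner_1", "scanner_2", "scanner_3"}
later_report = {name: v for name, v in report.items() if name not in dropped}
label_later = threshold_label(later_report)

print(label_now, label_later)
```

In this toy setup the app's label flips from malicious to benign although the app itself is unchanged, because only 8 of the remaining scanners flag it. That is the kind of drift that makes a fixed threshold hard to sustain when the set of scanners changes.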
Related papers
- Multi-label Classification for Android Malware Based on Active Learning [7.599125552187342]
We propose MLCDroid, an ML-based multi-label classification approach that can directly indicate the existence of pre-defined malicious behaviors.
We compare the results of 70 algorithm combinations to evaluate the effectiveness (best at 73.3%).
This is the first multi-label Android malware classification approach intending to provide more information on fine-grained malicious behaviors.
arXiv Detail & Related papers (2024-10-09T01:09:24Z)
- DetectBERT: Towards Full App-Level Representation Learning to Detect Android Malware [7.818978727292627]
This paper introduces DetectBERT, which integrates correlated Multiple Instance Learning (c-MIL) with DexBERT to handle the high dimensionality and variability of Android malware.
Our evaluation demonstrates that DetectBERT not only surpasses existing state-of-the-art detection methods but also adapts to evolving malware threats.
arXiv Detail & Related papers (2024-08-29T08:47:25Z)
- Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for detecting LLM-generated code.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
- Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
- A two-steps approach to improve the performance of Android malware detectors [4.440024971751226]
We propose GUIDED RETRAINING, a supervised representation learning-based method that boosts the performance of a malware detector.
We validate our method on four state-of-the-art Android malware detection approaches using over 265k malware and benign apps.
Our method is generic and designed to enhance the classification performance on a binary classification task.
arXiv Detail & Related papers (2022-05-17T12:04:17Z)
- New Datasets for Dynamic Malware Classification [0.0]
We introduce two new, updated datasets of malicious software, VirusSamples and VirusShare.
This paper analyzes multi-class malware classification performance of the balanced and imbalanced version of these two datasets.
Results show that Support Vector Machine achieves the highest score of 94% in the imbalanced VirusSample dataset.
XGBoost, one of the most common gradient boosting-based models, achieves the highest score of 90% and 80% in both versions of the VirusShare dataset.
arXiv Detail & Related papers (2021-11-30T08:31:16Z)
- Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z)
- Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
- Towards Accurate Labeling of Android Apps for Reliable Malware Detection [0.0]
Researchers rely on threshold-based labeling strategies that interpret the scan reports provided by online platforms, such as VirusTotal.
The dynamicity of this platform renders those labeling strategies unsustainable over prolonged periods, which leads to inaccurate labels.
The infeasibility of generating accurate labels via manual analysis and the lack of reliable alternatives force researchers to utilize VirusTotal to label apps.
arXiv Detail & Related papers (2020-07-01T13:02:19Z)
- Robust Spammer Detection by Nash Reinforcement Learning [64.80986064630025]
We develop a minimax game where the spammers and spam detectors compete with each other on their practical goals.
We show that an optimization algorithm can reliably find an equilibrial detector that can robustly prevent spammers with any mixed spamming strategies from attaining their practical goal.
arXiv Detail & Related papers (2020-06-10T21:18:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.