Related papers: Towards Accurate Labeling of Android Apps for Reliable Malware Detection

Towards Accurate Labeling of Android Apps for Reliable Malware Detection

URL: http://arxiv.org/abs/2007.00464v1
Date: Wed, 1 Jul 2020 13:02:19 GMT
Title: Towards Accurate Labeling of Android Apps for Reliable Malware Detection
Authors: Aleieldin Salem
Abstract summary: Researchers rely on threshold-based labeling strategies that interpret the scan reports provided by online platforms, such as VirusTotal. The dynamicity of this platform renders those labeling strategies unsustainable over prolonged periods, which leads to inaccurate labels. The infeasibility of generating accurate labels via manual analysis and the lack of reliable alternatives force researchers to utilize VirusTotal to label apps.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In training their newly-developed malware detection methods, researchers rely on threshold-based labeling strategies that interpret the scan reports provided by online platforms, such as VirusTotal. The dynamicity of this platform renders those labeling strategies unsustainable over prolonged periods, which leads to inaccurate labels. Using inaccurately labeled apps to train and evaluate malware detection methods significantly undermines the reliability of their results, leading to either dismissing otherwise promising detection approaches or adopting intrinsically inadequate ones. The infeasibility of generating accurate labels via manual analysis and the lack of reliable alternatives force researchers to utilize VirusTotal to label apps. In the paper, we tackle this issue in two manners. Firstly, we reveal the aspects of VirusTotal's dynamicity and how they impact threshold-based labeling strategies and provide actionable insights on how to use these labeling strategies given VirusTotal's dynamicity reliably. Secondly, we motivate the implementation of alternative platforms by (a) identifying VirusTotal limitations that such platforms should avoid, and (b) proposing an architecture of how such platforms can be constructed to mitigate VirusTotal's limitations.

Related papers

Measuring and Explaining the Effects of Android App Transformations in Online Malware Detection [19.35985745898256]
We propose a data-driven approach to measure the effect of app transformations to malware detection.<n>Six app transformation techniques are implemented in order to generate a large number of Android apps with traceable changes.<n>Last, we conduct a comprehensive analysis of antivirus engines based on the perspectives of signature-based, static analysis-based, and dynamic analysis-based detection techniques.
arXiv Detail & Related papers (2025-07-27T17:26:50Z)
Trust Under Siege: Label Spoofing Attacks against Machine Learning for Android Malware Detection [11.53708766953391]
We introduce label spoofing attacks, a new threat that contaminates crowd-sourced datasets by embedding minimal and undetectable malicious patterns. We demonstrate this scenario by developing AndroVenom, a methodology for polluting realistic data sources. Experiments show that not only state-of-the-art feature extractors are unable to filter such injection, but also various ML models experience Denial of Service.
arXiv Detail & Related papers (2025-03-14T20:05:56Z)
Unveiling Malware Patterns: A Self-analysis Perspective [15.517313565392852]
VisUnpack is a static analysis-based data visualization framework for bolstering attack prevention and aiding recovery post-attack. Our method includes unpacking packed malware programs, calculating local similarity descriptors based on basic blocks, enhancing correlations between descriptors, and refining them by minimizing noises. Our comprehensive evaluation of VisUnpack based on a freshly gathered dataset with over 27,106 samples confirms its capability in accurately classifying malware programs with a precision of 99.7%.
arXiv Detail & Related papers (2025-01-10T16:04:13Z)
Multi-label Classification for Android Malware Based on Active Learning [7.599125552187342]
We propose MLCDroid, an ML-based multi-label classification approach that can directly indicate the existence of pre-defined malicious behaviors. We compare the results of 70 algorithm combinations to evaluate the effectiveness (best at 73.3%). This is the first multi-label Android malware classification approach intending to provide more information on fine-grained malicious behaviors.
arXiv Detail & Related papers (2024-10-09T01:09:24Z)
Towards Novel Malicious Packet Recognition: A Few-Shot Learning Approach [0.0]
Deep Packet Inspection (DPI) has emerged as a key technology in strengthening network security. This study proposes a novel approach that leverages a large language model (LLM) and few-shot learning. Our approach shows promising results with an average accuracy of 86.35% and F1-Score of 86.40% on different malware types.
arXiv Detail & Related papers (2024-09-17T15:02:32Z)
Android Malware Detection with Unbiased Confidence Guarantees [1.6432632226868131]
We propose a machine learning dynamic analysis approach that provides provably valid confidence guarantees in each malware detection. The proposed approach is based on a novel machine learning framework, called Conformal Prediction, combined with a random forests classifier. We examine its performance on a large-scale dataset collected by installing 1866 malicious and 4816 benign applications on a real android device.
arXiv Detail & Related papers (2023-12-17T11:07:31Z)
Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection [149.23913018423022]
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels. Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels. We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
arXiv Detail & Related papers (2022-12-08T05:53:53Z)
Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework. We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models. We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
New Datasets for Dynamic Malware Classification [0.0]
We introduce two new, updated datasets of malicious software, VirusSamples and VirusShare. This paper analyzes multi-class malware classification performance of the balanced and imbalanced version of these two datasets. Results show that Support Vector Machine, achieves the highest score of 94% in the imbalanced VirusSample dataset. XGBoost, one of the most common gradient boosting-based models, achieves the highest score of 90% and 80%.in both versions of the VirusShare dataset.
arXiv Detail & Related papers (2021-11-30T08:31:16Z)
Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning, that considers both the uncertainty and the robustness of the detector. Our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift.
arXiv Detail & Related papers (2021-06-22T16:53:09Z)
Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier. As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger. We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes. We develop a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
Maat: Automatically Analyzing VirusTotal for Accurate Labeling and Effective Malware Detection [71.84087757644708]
The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 scanners. There are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling strategies. We implemented a method, Maat, that tackles these issues of standardization and sustainability by automatically generating a Machine Learning (ML)-based labeling scheme.
arXiv Detail & Related papers (2020-07-01T14:15:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.