Related papers: Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors

URL: http://arxiv.org/abs/2205.12569v1
Date: Wed, 25 May 2022 08:28:08 GMT
Title: Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors
Authors: Borja Molina-Coronado and Usue Mori and Alexander Mendiburu and Jose Miguel-Alonso
Abstract summary: We analyze 10 influential research works on Android malware detection using a common evaluation framework. We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models. We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
Score: 63.75363908696257
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As in other cybersecurity areas, machine learning (ML) techniques have emerged as a promising solution to detect Android malware. In this sense, many proposals employing a variety of algorithms and feature sets have been presented to date, often reporting impressive detection performances. However, the lack of reproducibility and the absence of a standard evaluation framework make these proposals difficult to compare. In this paper, we perform an analysis of 10 influential research works on Android malware detection using a common evaluation framework. We have identified five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models and their performances. In particular, we analyze the effect of (1) the presence of duplicated samples, (2) label (goodware/greyware/malware) attribution, (3) class imbalance, (4) the presence of apps that use evasion techniques, and (5) the evolution of apps. Based on this extensive experimentation, we conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results. Our findings also highlight that it is imperative to generate realistic datasets, taking into account the factors mentioned above, to enable the design and evaluation of better solutions for Android malware detection.

Related papers

Measuring and Explaining the Effects of Android App Transformations in Online Malware Detection [19.35985745898256]
We propose a data-driven approach to measure the effect of app transformations on malware detection. Six app transformation techniques are implemented in order to generate a large number of Android apps with traceable changes. Last, we conduct a comprehensive analysis of antivirus engines based on the perspectives of signature-based, static analysis-based, and dynamic analysis-based detection techniques.
arXiv Detail & Related papers (2025-07-27T17:26:50Z)
Breaking Out from the TESSERACT: Reassessing ML-based Malware Detection under Spatio-Temporal Drift [13.730284868830584]
We show striking discrepancies in the performance of learning-based malware detection across the same time frame. We identify five novel temporal and spatial bias factors that affect realistic evaluations.
arXiv Detail & Related papers (2025-06-30T13:01:24Z)
A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
Revisiting Static Feature-Based Android Malware Detection [0.8192907805418583]
This paper highlights critical pitfalls that undermine the validity of machine learning research in Android malware detection. We propose solutions for improving datasets and methodological practices, enabling fairer model comparisons. Our paper aims to support future research in Android malware detection and other security domains, enhancing the reliability and validity of published results.
arXiv Detail & Related papers (2024-09-11T16:37:50Z)
PromptSAM+: Malware Detection based on Prompt Segment Anything Model [8.00932560688061]
We propose a visual malware general enhancement classification framework, 'PromptSAM+', based on a large visual network segmentation model. Our experimental results indicate that 'PromptSAM+' is effective and efficient in malware detection and classification, achieving high accuracy and low rates of false positives and negatives.
arXiv Detail & Related papers (2024-08-04T15:42:34Z)
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
Unraveling the Key of Machine Learning Solutions for Android Malware Detection [33.63795751798441]
This paper presents a comprehensive investigation into machine learning-based Android malware detection. We first survey the literature, categorizing contributions into a taxonomy based on the Android feature engineering and ML modeling pipeline. Then, we design a general-purpose framework for ML-based Android malware detection, re-implement 12 representative approaches from different research communities, and evaluate them from three primary dimensions, i.e., effectiveness, robustness, and efficiency.
arXiv Detail & Related papers (2024-02-05T12:31:19Z)
Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines. Academic research is often restricted to public datasets on the order of ten thousand samples. We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
Malicious code detection in android: the role of sequence characteristics and disassembling methods [0.0]
We investigate and emphasize the factors that may affect the accuracy values of the models managed by researchers. Our findings show that the disassembly method and different input representations affect the model results.
arXiv Detail & Related papers (2023-12-02T11:55:05Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
A Review on the effectiveness of Dimensional Reduction with Computational Forensics: An Application on Malware Analysis [0.0]
We evaluate the effectiveness of applying Principal Component Analysis to the Computational Forensics task of detecting Android-based malware. Our results show that the dimensionally reduced dataset leads to a measure of degradation in accuracy.
arXiv Detail & Related papers (2023-01-15T07:34:31Z)
Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Emotion Recognition Network (IERN) to alleviate the negative effects brought by the dataset bias. A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z)
ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc. We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing. Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.