BERTDetect: A Neural Topic Modelling Approach for Android Malware Detection
- URL: http://arxiv.org/abs/2503.18043v1
- Date: Sun, 23 Mar 2025 12:09:44 GMT
- Title: BERTDetect: A Neural Topic Modelling Approach for Android Malware Detection
- Authors: Nishavi Ranaweera, Jiarui Xu, Suranga Seneviratne, Aruna Seneviratne
- Abstract summary: Web access today occurs predominantly through mobile devices, with Android representing a significant share of the mobile device market. Despite efforts to combat malicious attacks through tools like Google Play Protect and antivirus software, new and evolved malware continues to infiltrate Android devices. Source code analysis is effective but limited, as attackers quickly abandon old malware for new variants to evade detection.
- Score: 13.387599470973807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Web access today occurs predominantly through mobile devices, with Android representing a significant share of the mobile device market. This widespread usage makes Android a prime target for malicious attacks. Despite efforts to combat malicious attacks through tools like Google Play Protect and antivirus software, new and evolved malware continues to infiltrate Android devices. Source code analysis is effective but limited, as attackers quickly abandon old malware for new variants to evade detection. Therefore, there is a need for alternative methods that complement source code analysis. Prior research investigated clustering applications based on their descriptions and identified outliers in these clusters by API usage as malware. However, these works often used traditional techniques such as Latent Dirichlet Allocation (LDA) and k-means clustering, which do not capture the nuanced semantic structures present in app descriptions. To this end, in this paper, we propose BERTDetect, which leverages BERTopic neural topic modelling to effectively capture the latent topics in app descriptions. The resulting topic clusters are comparatively more coherent than those of previous methods and represent the app functionalities well. Our results demonstrate that BERTDetect outperforms other baselines, achieving ~10% relative improvement in F1 score.
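The abstract describes a two-stage idea: group apps by the topic of their description, then treat apps whose API usage is unusual for their group as suspicious. The following is a minimal sketch of the second stage only, not the authors' implementation. It assumes topic IDs have already been assigned by some topic model (for example BERTopic), and the scoring rule, the function names `api_outlier_scores` and `flag_outliers`, the threshold, and the sample data are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def api_outlier_scores(apps):
    """Score each app by how unusual its API usage is within its topic cluster.

    apps: list of (app_id, topic_id, set_of_api_names) tuples. The topic_id is
    assumed to come from a topic model run on the app descriptions.
    Returns {app_id: score}, where score is the mean rarity
    (1 - in-cluster usage frequency) of the APIs the app uses.
    """
    clusters = defaultdict(list)
    for app_id, topic_id, apis in apps:
        clusters[topic_id].append((app_id, apis))

    scores = {}
    for members in clusters.values():
        n = len(members)
        # How many apps in this cluster use each API.
        usage = Counter(api for _, apis in members for api in apis)
        for app_id, apis in members:
            if not apis:
                scores[app_id] = 0.0  # no API usage, nothing to compare
                continue
            scores[app_id] = sum(1 - usage[a] / n for a in apis) / len(apis)
    return scores


def flag_outliers(apps, threshold=0.4):
    """Return the sorted IDs of apps whose score exceeds the (assumed) threshold."""
    scores = api_outlier_scores(apps)
    return sorted(app_id for app_id, s in scores.items() if s > threshold)


# Hypothetical toy data: topic 0 = flashlight apps, topic 1 = chat apps.
apps = [
    ("flashlight_a", 0, {"CAMERA"}),
    ("flashlight_b", 0, {"CAMERA"}),
    ("flashlight_c", 0, {"CAMERA"}),
    ("flashlight_d", 0, {"CAMERA", "SEND_SMS", "READ_CONTACTS"}),
    ("chat_a", 1, {"INTERNET", "READ_CONTACTS"}),
    ("chat_b", 1, {"INTERNET", "READ_CONTACTS"}),
]

print(flag_outliers(apps))
```

In this toy data, only the flashlight app that also requests SMS and contacts access stands out from its cluster. The same contacts API is unremarkable in the chat cluster, which is the point of scoring usage relative to the description topic.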
Related papers
- CorrNetDroid: Android Malware Detector leveraging a Correlation-based Feature Selection for Network Traffic features [2.9069289358935073]
This work proposes a dynamic analysis-based Android malware detection system, CorrNetDroid, that works over network traffic flows. Many traffic features exhibit overlapping ranges in normal and malware datasets. Our model effectively reduces the feature set while detecting Android malware with 99.50 percent accuracy when considering only two network traffic features.
arXiv Detail & Related papers (2025-03-03T10:52:34Z)
- MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware.
We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph.
This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z)
- Can you See me? On the Visibility of NOPs against Android Malware Detectors [1.2187048691454239]
This paper proposes a visibility metric that assesses the difficulty in spotting NOPs and similar non-operational codes.
We tested our metric on a state-of-the-art, opcode-based deep learning system for Android malware detection.
arXiv Detail & Related papers (2023-12-28T20:48:16Z)
- Overload: Latency Attacks on Object Detection for Edge Devices [47.9744734181236]
This paper investigates latency attacks on deep learning applications.
Unlike common adversarial attacks for misclassification, the goal of latency attacks is to increase the inference time.
We use object detection to demonstrate how this kind of attack works.
arXiv Detail & Related papers (2023-04-11T17:24:31Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- Adversarial Patterns: Building Robust Android Malware Classifiers [0.9208007322096533]
In the field of cybersecurity, machine learning models have made significant improvements in malware detection.
Despite their ability to understand complex patterns from unstructured data, these models are susceptible to adversarial attacks.
This paper provides a comprehensive review of adversarial machine learning in the context of Android malware classifiers.
arXiv Detail & Related papers (2022-03-04T03:47:08Z)
- Graph Neural Network-based Android Malware Classification with Jumping Knowledge [3.408873763213743]
This paper proposes a GNN-based method for Android malware detection by capturing meaningful intra-procedural call path patterns.
A Jumping-Knowledge technique is applied to minimize the effect of the over-smoothing problem.
The proposed method has been extensively evaluated using two benchmark datasets.
arXiv Detail & Related papers (2022-01-19T11:29:02Z)
- EvadeDroid: A Practical Evasion Attack on Machine Learning for Black-box Android Malware Detection [2.2811510666857546]
EvadeDroid is a problem-space adversarial attack designed to effectively evade black-box Android malware detectors in real-world scenarios.
We show that EvadeDroid achieves evasion rates of 80%-95% against DREBIN, Sec-SVM, ADE-MA, MaMaDroid, and Opcode-SVM with only 1-9 queries.
arXiv Detail & Related papers (2021-10-07T09:39:40Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
- Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the fPAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.