Continuous Learning for Android Malware Detection
- URL: http://arxiv.org/abs/2302.04332v2
- Date: Wed, 14 Jun 2023 17:23:44 GMT
- Title: Continuous Learning for Android Malware Detection
- Authors: Yizheng Chen, Zhoujie Ding, David Wagner
- Abstract summary: We propose a new hierarchical contrastive learning scheme, and a new sample selection technique to continuously train the Android malware classifier.
Our approach reduces the false negative rate from 14% (for the best baseline) to 9%, while also reducing the false positive rate (from 0.86% to 0.48%).
- Score: 15.818435778629635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning methods can detect Android malware with very high accuracy.
However, these classifiers have an Achilles heel, concept drift: they rapidly
become out of date and ineffective, due to the evolution of malware apps and
benign apps. Our research finds that, after training an Android malware
classifier on one year's worth of data, the F1 score quickly dropped from 0.99
to 0.76 after 6 months of deployment on new test samples.
In this paper, we propose new methods to combat the concept drift problem of
Android malware classifiers. Since machine learning techniques need to be
continuously deployed, we use active learning: we select new samples for
analysts to label, and then add the labeled samples to the training set to
retrain the classifier. Our key idea is that similarity-based uncertainty is
more robust against concept drift. Therefore, we combine contrastive learning with
active learning. We propose a new hierarchical contrastive learning scheme, and
a new sample selection technique to continuously train the Android malware
classifier. Our evaluation shows that this leads to significant improvements,
compared to previously published methods for active learning. Our approach
reduces the false negative rate from 14% (for the best baseline) to 9%, while
also reducing the false positive rate (from 0.86% to 0.48%). Also, our approach
maintains more consistent performance across a seven-year time period than past
methods.
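The active learning step described in the abstract (pick the samples the classifier is least sure about, send them to analysts, retrain) can be illustrated with a small sketch. This is not the authors' hierarchical contrastive scheme or their selection technique. It is a simplified stand-in that assumes embeddings are already available and scores uncertainty by how much the labels of the nearest labeled neighbors disagree. The function names, the parameter `k`, and the labeling `budget` are all hypothetical.

```python
import numpy as np


def neighbor_uncertainty(train_emb, train_labels, pool_emb, k=3):
    """Score unlabeled samples by label disagreement among their k nearest
    labeled neighbors (assumed stand-in for similarity-based uncertainty)."""
    train_emb = np.asarray(train_emb, dtype=float)
    pool_emb = np.asarray(pool_emb, dtype=float)
    train_labels = np.asarray(train_labels)
    # Pairwise Euclidean distances, shape (n_pool, n_train).
    dists = np.linalg.norm(pool_emb[:, None, :] - train_emb[None, :, :], axis=2)
    # Indices of the k closest labeled samples for each pool sample.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Fraction of those neighbors labeled malicious (1) rather than benign (0).
    p_mal = train_labels[nearest].mean(axis=1)
    # 0.0 when all neighbors agree, 1.0 when they are evenly split.
    return 1.0 - np.abs(2.0 * p_mal - 1.0)


def select_for_labeling(train_emb, train_labels, pool_emb, budget, k=3):
    """Return indices of the `budget` most uncertain pool samples."""
    scores = neighbor_uncertainty(train_emb, train_labels, pool_emb, k=k)
    return np.argsort(-scores, kind="stable")[:budget]


# Toy 1-D embeddings: three benign samples near 0-2, three malicious near 10-12.
train_emb = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
train_labels = [0, 0, 0, 1, 1, 1]
# Two pool samples sit inside a cluster; the third lies between the clusters.
pool_emb = [[1.0], [11.0], [5.5]]

picked = select_for_labeling(train_emb, train_labels, pool_emb, budget=1)
```

In this toy setup the sample lying between the two clusters receives the highest score, so it is the one chosen for labeling. In a full loop, the newly labeled samples would be appended to the training set and the classifier retrained before the next selection round.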
Related papers
- Improving Malware Detection with Adversarial Domain Adaptation and Control Flow Graphs [10.352741619176383]
Existing solutions to combat concept drift use active learning.
We propose a method that learns retained information in malware control flow graphs post-drift by leveraging graph neural networks.
Our approach demonstrates a significant enhancement in predicting unseen malware families in a binary classification task and predicting drifted malware families in a multi-class setting.
arXiv Detail & Related papers (2024-07-18T22:06:20Z)
- ActDroid: An active learning framework for Android malware detection [3.195234044113248]
A new piece of malware appears online every 12 seconds.
Online learning can be used to mitigate the problem of labelling applications.
Our framework achieves accuracies of up to 96%.
arXiv Detail & Related papers (2024-01-30T13:10:33Z)
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restricted to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
- Efficient Concept Drift Handling for Batch Android Malware Detection Models [0.0]
We show how retraining techniques are able to maintain detector capabilities over time.
Our experiments show that concept drift detection and sample selection mechanisms result in very efficient retraining strategies.
arXiv Detail & Related papers (2023-09-18T14:28:18Z)
- Seamless Iterative Semi-Supervised Correction of Imperfect Labels in Microscopy Images [57.42492501915773]
In-vitro tests are an alternative to animal testing for the toxicity of medical devices.
Human fatigue plays a role in error making, making the use of deep learning appealing.
We propose Seamless Iterative Semi-Supervised correction of Imperfect labels (SISSI).
Our method successfully provides an adaptive early learning correction technique for object detection.
arXiv Detail & Related papers (2022-08-05T18:52:20Z)
- A two-steps approach to improve the performance of Android malware detectors [4.440024971751226]
We propose GUIDED RETRAINING, a supervised representation learning-based method that boosts the performance of a malware detector.
We validate our method on four state-of-the-art Android malware detection approaches using over 265k malware and benign apps.
Our method is generic and designed to enhance the classification performance on a binary classification task.
arXiv Detail & Related papers (2022-05-17T12:04:17Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
- Detection of Novel Social Bots by Ensembles of Specialized Classifiers [60.63582690037839]
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion.
We show that different types of bots are characterized by different behavioral features.
We propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule.
arXiv Detail & Related papers (2020-06-11T22:59:59Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
- A Framework for Behavioral Biometric Authentication using Deep Metric Learning on Mobile Devices [17.905483523678964]
We present a new framework to incorporate training on battery-powered mobile devices, so private data never leaves the device and training can be flexibly scheduled to adapt the behavioral patterns at runtime.
Experiments demonstrate authentication accuracy over 95% on three public datasets, a sheer 15% gain from multi-class classification with less data and robustness against brute-force and side-channel attacks with 99% and 90% success, respectively.
Our results indicate that training consumes lower energy than watching videos and slightly higher energy than playing games.
arXiv Detail & Related papers (2020-05-26T17:56:20Z)
- Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? [72.00712736992618]
We show that a simple baseline, learning a supervised or self-supervised representation on the meta-training set, outperforms state-of-the-art few-shot learning methods.
An additional boost can be achieved through the use of self-distillation.
We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms.
arXiv Detail & Related papers (2020-03-25T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.