ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection
- URL: http://arxiv.org/abs/2507.08597v2
- Date: Sun, 03 Aug 2025 17:46:40 GMT
- Title: ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection
- Authors: Md Tanvirul Alam, Aritran Piplai, Nidhi Rastogi,
- Abstract summary: Adapting machine learning models to changing data distributions requires frequent updates.<n>We introduce texttADAPT, a novel pseudo-labeling semi-supervised algorithm for addressing concept drift.
- Score: 0.8192907805418583
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Machine learning models are commonly used for malware classification; however, they suffer from performance degradation over time due to concept drift. Adapting these models to changing data distributions requires frequent updates, which rely on costly ground truth annotations. While active learning can reduce the annotation burden, leveraging unlabeled data through semi-supervised learning remains a relatively underexplored approach in the context of malware detection. In this research, we introduce \texttt{ADAPT}, a novel pseudo-labeling semi-supervised algorithm for addressing concept drift. Our model-agnostic method can be applied to various machine learning models, including neural networks and tree-based algorithms. We conduct extensive experiments on five diverse malware detection datasets spanning Android, Windows, and PDF domains. The results demonstrate that our method consistently outperforms baseline models and competitive benchmarks. This work paves the way for more effective adaptation of machine learning models to concept drift in malware detection.
Related papers
- Empirical Evaluation of Concept Drift in ML-Based Android Malware Detection [10.268191178804168]
This study examines the impact of concept drift on Android malware detection.<n>Factors influencing the drift include feature types, data environments, and detection methods.<n>No strong link was found between the type of algorithm used and concept drift.
arXiv Detail & Related papers (2025-07-30T15:35:51Z) - Revisiting Concept Drift in Windows Malware Detection: Adaptation to Real Drifted Malware with Minimal Samples [10.352741619176383]
We propose a new technique for detecting and classifying drifted malware.<n>It learns drift-invariant features in malware control flow graphs by leveraging graph neural networks with adversarial domain adaptation.<n>Our approach significantly improves drifted malware detection on publicly available benchmarks and real-world malware databases reported daily by security companies.
arXiv Detail & Related papers (2024-07-18T22:06:20Z) - Combating Concept Drift with Explanatory Detection and Adaptation for Android Malware Classification [17.399454244765842]
DREAM is a novel system that improves drift detection and establishes an explanatory adaptation process.<n>Our evaluation shows that DREAM effectively improves the drift detection accuracy and reduces the expert analysis effort in adaptation.
arXiv Detail & Related papers (2024-05-07T07:55:45Z) - MORPH: Towards Automated Concept Drift Adaptation for Malware Detection [0.7499722271664147]
Concept drift is a significant challenge for malware detection.
Self-training has emerged as a promising approach to mitigate concept drift.
We propose MORPH -- an effective pseudo-label-based concept drift adaptation method.
arXiv Detail & Related papers (2024-01-23T14:25:43Z) - Optimized Deep Learning Models for Malware Detection under Concept Drift [0.0]
We propose a model-agnostic protocol to improve a baseline neural network against drift.
We show the importance of feature reduction and training with the most recent validation set possible, and propose a loss function named Drift-Resilient Binary Cross-Entropy.
Our improved model shows promising results, detecting 15.2% more malware than a baseline model.
arXiv Detail & Related papers (2023-08-21T16:13:23Z) - Incremental Online Learning Algorithms Comparison for Gesture and Visual
Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised
Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z) - Transfer Learning without Knowing: Reprogramming Black-box Machine
Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.