Related papers: Learning Temporal Invariance in Android Malware Detectors

URL: http://arxiv.org/abs/2502.05098v1
Date: Fri, 07 Feb 2025 17:17:42 GMT
Title: Learning Temporal Invariance in Android Malware Detectors
Authors: Xinran Zheng, Shuo Yang, Edith C. H. Ngai, Suman Jana, Lorenzo Cavallaro
Abstract summary: Learning-based Android malware detectors degrade over time due to natural distribution drift caused by malware variants and new families. This paper systematically investigates the challenges that classifiers trained with empirical risk minimization (ERM) face against such distribution shifts. We propose the first temporal invariant training framework for malware detection, which aims to enhance the ability of detectors to learn stable representations across time.
Score: 20.830702588122463
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Learning-based Android malware detectors degrade over time due to natural distribution drift caused by malware variants and new families. This paper systematically investigates the challenges classifiers trained with empirical risk minimization (ERM) face against such distribution shifts and attributes their shortcomings to their inability to learn stable discriminative features. Invariant learning theory offers a promising solution by encouraging models to generate stable representations crossing environments that expose the instability of the training set. However, the lack of prior environment labels, the diversity of drift factors, and low-quality representations caused by diverse families make this task challenging. To address these issues, we propose TIF, the first temporal invariant training framework for malware detection, which aims to enhance the ability of detectors to learn stable representations across time. TIF organizes environments based on application observation dates to reveal temporal drift, integrating specialized multi-proxy contrastive learning and invariant gradient alignment to generate and align environments with high-quality, stable representations. TIF can be seamlessly integrated into any learning-based detector. Experiments on a decade-long dataset show that TIF excels, particularly in early deployment stages, addressing real-world needs and outperforming state-of-the-art methods.
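The idea of grouping applications into environments by observation date and then favoring behavior that stays stable across those environments can be illustrated with a minimal sketch. This is not TIF: the abstract does not specify the window size or the form of the alignment objective, so the function names, the three-month window, and the variance-of-risks penalty (a simple stand-in for invariant gradient alignment) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pvariance


def split_into_environments(samples, months_per_env=3):
    """Group (observation_date, loss) pairs into consecutive time windows.

    The window length is a hypothetical choice; the abstract only says that
    environments are organized by application observation dates.
    """
    envs = defaultdict(list)
    for observed, loss in samples:
        # Index of the time window this observation date falls into.
        window = (observed.year * 12 + (observed.month - 1)) // months_per_env
        envs[window].append(loss)
    # Return the per-environment loss lists in chronological order.
    return [envs[key] for key in sorted(envs)]


def invariance_objective(env_losses, penalty_weight=1.0):
    """Mean per-environment risk plus a penalty on how much the risks differ.

    The variance penalty is an assumed stand-in for an invariance term,
    not the gradient alignment described in the abstract.
    """
    risks = [mean(losses) for losses in env_losses]
    return mean(risks) + penalty_weight * pvariance(risks)


# Hypothetical per-sample losses tagged with observation dates.
samples = [
    (date(2020, 1, 5), 0.2),
    (date(2020, 2, 1), 0.4),
    (date(2020, 4, 1), 0.9),
    (date(2020, 7, 1), 0.3),
]
environments = split_into_environments(samples)
objective = invariance_objective(environments)
```

With `penalty_weight=0` the objective reduces to the plain average risk over environments; larger weights penalize a detector whose loss varies strongly from one time window to the next.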

Related papers

Adapting to Fragmented and Evolving Data: A Fisher Information Perspective [0.0]
FADE is a lightweight framework for robust learning under dynamic environments. It employs a shift-aware regularization mechanism anchored in Fisher information geometry. FADE operates online with fixed memory and no access to target labels.
arXiv Detail & Related papers (2025-07-25T06:50:09Z)
Learning from Heterogeneity: Generalizing Dynamic Facial Expression Recognition via Distributionally Robust Optimization [23.328511708942045]
The Heterogeneity-aware Distributional Framework (HDF) is designed to enhance time-frequency modeling and mitigate imbalance caused by hard samples. The Time-Frequency Distributional Attention Module (DAM) captures both temporal consistency and frequency robustness. An adaptive optimization module, the Distribution-aware Scaling Module (DSM), is introduced to dynamically balance classification and contrastive losses.
arXiv Detail & Related papers (2025-07-21T16:21:47Z)
A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning [33.68869067717862]
Time-series forecasting (TSF) finds broad applications in real-world scenarios. In this paper, we aim to alleviate the inherent OOD problem in TSF via invariant learning. We propose FOIL, a model-agnostic framework that enables timeseries Forecasting for Out-of-distribution generalization via Invariant Learning.
arXiv Detail & Related papers (2024-06-13T14:01:34Z)
Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation [91.83820250747935]
Pseudo-label noise is mainly contained in unstable samples in which predictions of most pixels undergo significant variations during self-training. We introduce the Stable Neighbor Denoising (SND) approach, which effectively discovers highly correlated stable and unstable samples. SND consistently outperforms state-of-the-art methods in various SFUDA semantic segmentation settings.
arXiv Detail & Related papers (2024-06-10T21:44:52Z)
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges. We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability. Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z)
Harnessing Contrastive Learning and Neural Transformation for Time Series Anomaly Detection [0.0]
Time series anomaly detection (TSAD) plays a vital role in many industrial applications. Contrastive learning has gained momentum in the time series domain for its prowess in extracting meaningful representations from unlabeled data. In this study, we propose a novel approach, CNT, that incorporates a window-based contrastive learning strategy fortified with learnable transformations.
arXiv Detail & Related papers (2023-04-16T21:36:19Z)
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition. We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
Sufficient Invariant Learning for Distribution Shift [20.88069274935592]
We introduce a novel learning principle called the Sufficient Invariant Learning (SIL) framework. SIL focuses on learning a sufficient subset of invariant features rather than relying on a single feature. We propose a new algorithm, Adaptive Sharpness-aware Group Distributionally Robust Optimization (ASGDRO), to learn diverse invariant features by seeking common flat minima.
arXiv Detail & Related papers (2022-10-24T18:34:24Z)
Intrinsic Anomaly Detection for Multi-Variate Time Series [33.199682596741276]
Intrinsic anomalies are changes in the functional dependency structure between time series that represent an environment and time series that represent the internal state of a system that is placed in said environment. These address the shortcomings of existing anomaly detection methods that cannot differentiate between expected changes in the system's state and unexpected ones, i.e., changes in the system that deviate from the environment's influence. Our most promising approach is fully unsupervised and combines adversarial learning and time series representation learning, thereby addressing problems such as label sparsity and subjectivity.
arXiv Detail & Related papers (2022-06-29T00:51:44Z)
Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
Learning perturbation sets for robust machine learning [97.6757418136662]
We use a conditional generator that defines the perturbation set over a constrained region of the latent space. We measure the quality of our learned perturbation sets both quantitatively and qualitatively. We leverage our learned perturbation sets to train models which are empirically and certifiably robust to adversarial image corruptions and adversarial lighting variations.
arXiv Detail & Related papers (2020-07-16T16:39:54Z)
Adaptive Risk Minimization: Learning to Adapt to Domain Shift [109.87561509436016]
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test time shifts. We introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains.
arXiv Detail & Related papers (2020-07-06T17:59:30Z)
