TESSERACT: Eliminating Experimental Bias in Malware Classification
across Space and Time (Extended Version)
- URL: http://arxiv.org/abs/2402.01359v1
- Date: Fri, 2 Feb 2024 12:27:32 GMT
- Title: TESSERACT: Eliminating Experimental Bias in Malware Classification
across Space and Time (Extended Version)
- Authors: Zeliang Kan, Shae McFadden, Daniel Arp, Feargus Pendlebury, Roberto
Jordaney, Johannes Kinder, Fabio Pierazzi, Lorenzo Cavallaro
- Abstract summary: Malware detectors often experience performance decay due to constantly evolving operating systems and attack methods.
This paper argues that commonly reported results are inflated due to two pervasive sources of experimental bias in the detection task.
- Score: 18.146377453918724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) plays a pivotal role in detecting malicious software.
Despite the high F1-scores reported in numerous studies, often upwards of
0.99, the issue is not completely solved. Malware detectors often experience
performance decay due to constantly evolving operating systems and attack
methods, which can render previously learned knowledge insufficient for
accurate decision-making on new inputs. This paper argues that commonly
reported results are inflated due to two pervasive sources of experimental bias
in the detection task: spatial bias caused by data distributions that are not
representative of a real-world deployment; and temporal bias caused by
incorrect time splits of data, leading to unrealistic configurations. To
address these biases, we introduce a set of constraints for fair experiment
design, and propose a new metric, AUT, for classifier robustness in real-world
settings. We additionally propose an algorithm designed to tune training data
to enhance classifier performance. Finally, we present TESSERACT, an
open-source framework for realistic classifier comparison. Our evaluation
encompasses both traditional ML and deep learning methods, examining published
works on an extensive Android dataset with 259,230 samples over a five-year
span. Additionally, we conduct case studies in the Windows PE and PDF domains.
Our findings identify the existence of biases in previous studies and reveal
that significant performance enhancements are possible through appropriate,
periodic tuning. We explore how mitigation strategies may support in achieving
a more stable and better performance over time by employing multiple strategies
to delay performance decay.
Related papers
- Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection [0.13980986259786224]
In real-world scenarios such as fraud detection or credit scoring, labels may be delayed.
Batch incremental algorithms are widely used in many real-world tasks.
Our findings indicate that instance incremental learning is not the superior option.
arXiv Detail & Related papers (2024-09-16T09:20:01Z) - Model Debiasing by Learnable Data Augmentation [19.625915578646758]
This paper proposes a novel two-stage learning pipeline featuring a data augmentation strategy that regularizes training.
Experiments on synthetic and realistic biased datasets show state-of-the-art classification accuracy, outperforming competing methods.
arXiv Detail & Related papers (2024-08-09T09:19:59Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces ADer, a comprehensive visual anomaly detection benchmark built as a modular framework that is easily extensible to new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z) - DELTA: degradation-free fully test-time adaptation [59.74287982885375]
We find that two defects are concealed in prevalent adaptation methodologies such as test-time batch normalization (BN) and self-learning.
First, we reveal that the normalization statistics in test-time BN are derived entirely from the currently received test samples, resulting in inaccurate estimates.
Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes.
arXiv Detail & Related papers (2023-01-30T15:54:00Z) - Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time [69.77704012415845]
Temporal shifts can considerably degrade the performance of machine learning models deployed in the real world.
We benchmark 13 prior approaches, including methods in domain generalization, continual learning, self-supervised learning, and ensemble learning.
Under both evaluation strategies, we observe an average performance drop of 20% from in-distribution to out-of-distribution data.
arXiv Detail & Related papers (2022-11-25T17:07:53Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias in the cases where test data is available and where it is unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)