TESSERACT: Eliminating Experimental Bias in Malware Classification
across Space and Time (Extended Version)
- URL: http://arxiv.org/abs/2402.01359v1
- Date: Fri, 2 Feb 2024 12:27:32 GMT
- Title: TESSERACT: Eliminating Experimental Bias in Malware Classification
across Space and Time (Extended Version)
- Authors: Zeliang Kan, Shae McFadden, Daniel Arp, Feargus Pendlebury, Roberto
Jordaney, Johannes Kinder, Fabio Pierazzi, Lorenzo Cavallaro
- Abstract summary: Malware detectors often experience performance decay due to constantly evolving operating systems and attack methods.
This paper argues that commonly reported results are inflated due to two pervasive sources of experimental bias in the detection task.
- Score: 18.146377453918724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) plays a pivotal role in detecting malicious software.
Despite the high F1-scores reported in numerous studies reaching upwards of
0.99, the issue is not completely solved. Malware detectors often experience
performance decay due to constantly evolving operating systems and attack
methods, which can render previously learned knowledge insufficient for
accurate decision-making on new inputs. This paper argues that commonly
reported results are inflated due to two pervasive sources of experimental bias
in the detection task: spatial bias caused by data distributions that are not
representative of a real-world deployment; and temporal bias caused by
incorrect time splits of data, leading to unrealistic configurations. To
address these biases, we introduce a set of constraints for fair experiment
design, and propose a new metric, AUT, for classifier robustness in real-world
settings. We additionally propose an algorithm designed to tune training data
to enhance classifier performance. Finally, we present TESSERACT, an
open-source framework for realistic classifier comparison. Our evaluation
encompasses both traditional ML and deep learning methods, examining published
works on an extensive Android dataset with 259,230 samples over a five-year
span. Additionally, we conduct case studies in the Windows PE and PDF domains.
Our findings confirm the presence of biases in previous studies and reveal
that significant performance improvements are possible through appropriate,
periodic tuning. We also explore how mitigation strategies that delay
performance decay can yield more stable performance over time.
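
The spatial and temporal biases described above map onto concrete experiment-design constraints: training samples must strictly predate test samples, and the test set's class balance must reflect a realistic deployment. Below is a minimal sketch of such a split, assuming records that carry a binary label and a timestamp; the helper name, record layout, and the 10% malware ceiling are illustrative assumptions, not TESSERACT's actual API.

```python
# Hypothetical sketch of the spatial/temporal constraints argued for in
# the abstract; not TESSERACT's actual API.
from datetime import date


def realistic_split(samples, split_date, max_test_mw_ratio=0.1):
    """Split records of the form {"label": 0/1, "timestamp": date} so
    that (1) every training sample strictly predates every test sample
    (no temporal leakage), and (2) the test set's malware ratio stays at
    or below an assumed in-the-wild estimate (spatial realism)."""
    train = [s for s in samples if s["timestamp"] < split_date]
    test = [s for s in samples if s["timestamp"] >= split_date]
    mw_ratio = sum(s["label"] for s in test) / max(len(test), 1)
    if mw_ratio > max_test_mw_ratio:
        raise ValueError(
            f"test malware ratio {mw_ratio:.2f} exceeds the assumed "
            f"realistic ceiling of {max_test_mw_ratio:.2f}"
        )
    return train, test


# Toy example: two pre-2016 samples train, two post-2016 samples test.
dataset = [
    {"label": 0, "timestamp": date(2015, 3, 1)},
    {"label": 1, "timestamp": date(2015, 9, 1)},
    {"label": 0, "timestamp": date(2016, 4, 1)},
    {"label": 0, "timestamp": date(2016, 8, 1)},
]
train, test = realistic_split(dataset, split_date=date(2016, 1, 1))
print(len(train), "train /", len(test), "test samples")
```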
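
The AUT metric can be read as the normalized area under a performance-versus-time curve. The sketch below assumes the trapezoidal-rule formulation over per-period scores (e.g., monthly F1), normalized so that a classifier holding perfect performance in every period scores 1.0; the function name and signature are illustrative, not the framework's API.

```python
# Sketch of the AUT metric: area under a classifier's performance curve
# across N consecutive test periods, computed with the trapezoidal rule
# and divided by (N - 1) so the result stays in [0, 1] whenever the
# underlying metric does. Signature is an assumption, not the real API.
from typing import Sequence


def aut(scores: Sequence[float]) -> float:
    """AUT for a point metric (e.g., F1) observed once per test period."""
    n = len(scores)
    if n < 2:
        raise ValueError("AUT needs at least two test periods")
    # Trapezoidal rule over unit-spaced periods.
    area = sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1))
    return area / (n - 1)


# Example: F1 measured monthly for half a year; decay lowers the AUT.
monthly_f1 = [0.95, 0.91, 0.85, 0.78, 0.70, 0.66]
print(f"AUT(F1, 6 months) = {aut(monthly_f1):.3f}")
```

Unlike a single aggregate F1 over the whole test set, AUT penalizes detectors whose performance decays quickly, which is precisely the failure mode that temporal bias tends to hide.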
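
The abstract also mentions an algorithm for tuning training data. One plausible reading, sketched below purely as an assumption rather than the paper's actual algorithm, is a search over the training set's malware proportion that keeps whichever value maximizes a validation-window metric such as F1 or AUT.

```python
# Hypothetical sketch of training-data tuning: downsample malware to
# several candidate proportions, retrain, and keep the best-scoring
# model. Names and the downsampling strategy are assumptions, not the
# paper's exact algorithm.
import random


def tune_training_ratio(goodware, malware, train_fn, score_fn,
                        candidate_ratios=(0.1, 0.25, 0.5), seed=0):
    """Return (best_ratio, best_model). `train_fn` builds a model from a
    sample list; `score_fn` evaluates it on a held-out validation window
    (e.g., with F1 or AUT)."""
    rng = random.Random(seed)
    best_ratio, best_model, best_score = None, None, float("-inf")
    for ratio in candidate_ratios:
        # Downsample malware so it makes up `ratio` of the training set:
        # mw / (gw + mw) = ratio  =>  mw = gw * ratio / (1 - ratio).
        n_mw = min(len(malware), int(len(goodware) * ratio / (1 - ratio)))
        model = train_fn(goodware + rng.sample(malware, n_mw))
        score = score_fn(model)
        if score > best_score:
            best_ratio, best_model, best_score = ratio, model, score
    return best_ratio, best_model
```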
Related papers
- ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition [52.537021302246664]
Action recognition models often suffer from background bias (i.e., inferring actions based on background cues) and foreground bias (i.e., relying on subject appearance).
We propose ALBAR, a novel adversarial training method that mitigates foreground and background biases without requiring specialized knowledge of the bias attributes.
We evaluate our method on established background and foreground bias protocols, setting a new state-of-the-art and strongly improving combined debiasing performance by over 12% on HMDB51.
arXiv Detail & Related papers (2025-01-31T20:47:06Z)
- Breaking Fine-Grained Classification Barriers with Cost-Free Data in Few-Shot Class-Incremental Learning [13.805180905579832]
We propose a novel learning paradigm to break barriers in fine-grained classification.
It enables the model to learn beyond the standard training phase and benefit from cost-free data encountered during system operation.
arXiv Detail & Related papers (2024-12-29T07:11:44Z)
- Model Debiasing by Learnable Data Augmentation [19.625915578646758]
This paper proposes a novel 2-stage learning pipeline featuring a data augmentation strategy able to regularize the training.
Experiments on synthetic and realistic biased datasets show state-of-the-art classification accuracy, outperforming competing methods.
arXiv Detail & Related papers (2024-08-09T09:19:59Z)
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
- DELTA: degradation-free fully test-time adaptation [59.74287982885375]
We find that two unfavorable defects are concealed in the prevalent adaptation methodologies like test-time batch normalization (BN) and self-learning.
First, we reveal that the normalization statistics in test-time BN are completely affected by the currently received test samples, resulting in inaccurate estimates.
Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes.
arXiv Detail & Related papers (2023-01-30T15:54:00Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
- Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias when test data is both available and unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)