Related papers: ML-Based Behavioral Malware Detection Is Far From a Solved Problem

ML-Based Behavioral Malware Detection Is Far From a Solved Problem

URL: http://arxiv.org/abs/2405.06124v2
Date: Thu, 06 Mar 2025 20:40:57 GMT
Title: ML-Based Behavioral Malware Detection Is Far From a Solved Problem
Authors: Yigitcan Kaya, Yizheng Chen, Marcus Botacin, Shoumik Saha, Fabio Pierazzi, Lorenzo Cavallaro, David Wagner, Tudor Dumitras,
Abstract summary: Malware detection is a ubiquitous application of Machine Learning (ML) in security.<n>In deployment, a malware detector at endpoint hosts often must rely on traces captured from endpoint hosts, not from a sandbox.<n>We present the first measurement study of the performance of ML-based malware detectors at real-world endpoints.
Score: 24.699642272580764
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Malware detection is a ubiquitous application of Machine Learning (ML) in security. In behavioral malware analysis, the detector relies on features extracted from program execution traces. The research literature has focused on detectors trained with features collected from sandbox environments and evaluated on samples also analyzed in a sandbox. However, in deployment, a malware detector at endpoint hosts often must rely on traces captured from endpoint hosts, not from a sandbox. Thus, there is a gap between the literature and real-world needs. We present the first measurement study of the performance of ML-based malware detectors at real-world endpoints. Leveraging a dataset of sandbox traces and a dataset of in-the-wild program traces, we evaluate two scenarios: (i) an endpoint detector trained on sandbox traces (convenient and easy to train), and (ii) an endpoint detector trained on endpoint traces (more challenging to train, since we need to collect telemetry data). We discover a wide gap between the performance as measured using prior evaluation methods in the literature -- over 90% -- vs. expected performance in endpoint detection -- about 20% (scenario (i)) to 50% (scenario (ii)). We characterize the ML challenges that arise in this domain and contribute to this gap, including label noise, distribution shift, and spurious features. Moreover, we show several techniques that achieve 5--30% relative performance improvements over the baselines. Our evidence suggests that applying detectors trained on sandbox data to endpoint detection is challenging. The most promising direction is training detectors directly on endpoint data, which marks a departure from current practice. To promote progress, we will facilitate researchers to perform realistic detector evaluations against our real-world dataset.

Related papers

PARIS: A Practical, Adaptive Trace-Fetching and Real-Time Malicious Behavior Detection System [6.068607290592521]
We propose adaptive trace fetching, lightweight, real-time malicious behavior detection system. Specifically, we monitor malicious behavior with Event Tracing for Windows (ETW) and learn to selectively collect maliciousness-related APIs or call stacks. As a result, we can monitor a wider range of APIs and detect more intricate attack behavior.
arXiv Detail & Related papers (2024-11-02T14:52:04Z)
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios [38.952481877244644]
We present a new benchmark, DetectRL, highlighting that even state-of-the-art (SOTA) detection techniques still underperformed in this task. Using popular large language models (LLMs), we generated data that better aligns with real-world applications. We analyzed the potential impact of writing styles, model types, attack methods, the text lengths, and real-world human writing factors on different types of detectors.
arXiv Detail & Related papers (2024-10-31T09:01:25Z)
DF40: Toward Next-Generation Deepfake Detection [62.073997142001424]
existing works identify top-notch detection algorithms and models by adhering to the common practice: training detectors on one specific dataset and testing them on other prevalent deepfake datasets. But can these stand-out "winners" be truly applied to tackle the myriad of realistic and diverse deepfakes lurking in the real world? We construct a highly diverse deepfake detection dataset called DF40, which comprises 40 distinct deepfake techniques.
arXiv Detail & Related papers (2024-06-19T12:35:02Z)
Probing Language Models for Pre-training Data Detection [11.37731401086372]
We propose to utilize the probing technique for pre-training data detection by examining the model's internal activations. Our method is simple and effective and leads to more trustworthy pre-training data detection.
arXiv Detail & Related papers (2024-06-03T13:58:04Z)
UncertaintyTrack: Exploiting Detection and Localization Uncertainty in Multi-Object Tracking [8.645078288584305]
Multi-object tracking (MOT) methods have seen a significant boost in performance recently. We introduce UncertaintyTrack, a collection of extensions that can be applied to multiple TBD trackers. Experiments on the Berkeley Deep Drive MOT dataset show that the combination of our method and informative uncertainty estimates reduces the number of ID switches by around 19%.
arXiv Detail & Related papers (2024-02-19T17:27:04Z)
Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways. We uncover a significant correlation between topics and detection performance. These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z)
DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions. DeepfakeBench contains 15 state-of-the-art detection methods, 9CL datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z)
Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration [17.461451218469062]
In this work, we introduce the Self-Aware Object Detection (SAOD) task. The SAOD task respects and adheres to the challenges that object detectors face in safety-critical environments such as autonomous driving. We extensively use our framework, which introduces novel metrics and large scale test datasets, to test numerous object detectors.
arXiv Detail & Related papers (2023-07-03T11:16:39Z)
Label-Efficient Object Detection via Region Proposal Network Pre-Training [58.50615557874024]
We propose a simple pretext task that provides an effective pre-training for the region proposal network (RPN) In comparison with multi-stage detectors without RPN pre-training, our approach is able to consistently improve downstream task performance.
arXiv Detail & Related papers (2022-11-16T16:28:18Z)
A Bayesian Detect to Track System for Robust Visual Object Tracking and Semi-Supervised Model Learning [1.7268829007643391]
We ad-dress problems in a Bayesian tracking and detection framework parameterized by neural network outputs. We propose a particle filter-based approximate sampling algorithm for tracking object state estimation. Based on our particle filter inference algorithm, a semi-supervised learn-ing algorithm is utilized for learning tracking network on intermittent labeled frames.
arXiv Detail & Related papers (2022-05-05T00:18:57Z)
Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for AnomalyBased Intrusion Detection [4.56877715768796]
We propose the adoption of meta-learning, in the form of a two-layer Stacker, to create a mixed approach that detects both known and unknown threats. It turns out to be more effective in detecting zero-day attacks than supervised algorithms, limiting their main weakness but still maintaining adequate capabilities in detecting known attacks.
arXiv Detail & Related papers (2022-02-28T08:41:32Z)
Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation. Our main contribution is a new OOD detection architecture called ObsNet associated with a dedicated training scheme based on Local Adversarial Attacks (LAA) We show it obtains top performances both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z)
SSD: A Unified Framework for Self-Supervised Outlier Detection [37.254114112911786]
We propose an outlier detector based on only unlabeled in-distribution data. We use self-supervised representation learning followed by a Mahalanobis distance based detection. We extend our framework to incorporate training data labels, if available.
arXiv Detail & Related papers (2021-03-22T17:51:35Z)
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes. We develop a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
Detection as Regression: Certified Object Detection by Median Smoothing [50.89591634725045]
This work is motivated by recent progress on certified classification by randomized smoothing. We obtain the first model-agnostic, training-free, and certified defense for object detection against $ell$-bounded attacks.
arXiv Detail & Related papers (2020-07-07T18:40:19Z)
Learning a Unified Sample Weighting Network for Object Detection [113.98404690619982]
Region sampling or weighting is significantly important to the success of modern region-based object detectors. We argue that sample weighting should be data-dependent and task-dependent. We propose a unified sample weighting network to predict a sample's task weights.
arXiv Detail & Related papers (2020-06-11T16:19:16Z)
Robust Spammer Detection by Nash Reinforcement Learning [64.80986064630025]
We develop a minimax game where the spammers and spam detectors compete with each other on their practical goals. We show that an optimization algorithm can reliably find an equilibrial detector that can robustly prevent spammers with any mixed spamming strategies from attaining their practical goal.
arXiv Detail & Related papers (2020-06-10T21:18:07Z)
RetinaTrack: Online Single Stage Joint Detection and Tracking [22.351109024452462]
We focus on the tracking-by-detection paradigm for autonomous driving where both tasks are mission critical. We propose a conceptually simple and efficient joint model of detection and tracking, called RetinaTrack, which modifies the popular single stage RetinaNet approach.
arXiv Detail & Related papers (2020-03-30T23:46:29Z)
Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim. We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting. Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.