Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks
- URL: http://arxiv.org/abs/2510.01676v1
- Date: Thu, 02 Oct 2025 05:04:44 GMT
- Title: Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks
- Authors: Milad Nasr, Yanick Fratantonio, Luca Invernizzi, Ange Albertini, Loua Farah, Alex Petit-Bianco, Andreas Terzis, Kurt Thomas, Elie Bursztein, Nicholas Carlini,
- Abstract summary: This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system.<n>By changing just 13 bytes of a malware sample, we can successfully evade Magika in 90% of cases.<n>For our defended production model, a highly resourced adversary requires 50 bytes to achieve just a 20% attack success rate.
- Score: 43.26879314353337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As deep learning models become widely deployed as components within larger production systems, their individual shortcomings can create system-level vulnerabilities with real-world impact. This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system, performing a case study analysis of Gmail's pipeline where file-type identification relies on a ML model. The malware detection pipeline in use by Gmail contains a machine learning model that routes each potential malware sample to a specialized malware classifier to improve accuracy and performance. This model, called Magika, has been open sourced. By designing adversarial examples that fool Magika, we can cause the production malware service to incorrectly route malware to an unsuitable malware detector thereby increasing our chance of evading detection. Specifically, by changing just 13 bytes of a malware sample, we can successfully evade Magika in 90% of cases and thereby allow us to send malware files over Gmail. We then turn our attention to defenses, and develop an approach to mitigate the severity of these types of attacks. For our defended production model, a highly resourced adversary requires 50 bytes to achieve just a 20% attack success rate. We implement this defense, and, thanks to a collaboration with Google engineers, it has already been deployed in production for the Gmail classifier.
Related papers
- Trust Under Siege: Label Spoofing Attacks against Machine Learning for Android Malware Detection [11.53708766953391]
We introduce label spoofing attacks, a new threat that contaminates crowd-sourced datasets by embedding minimal and undetectable malicious patterns.<n>We demonstrate this scenario by developing AndroVenom, a methodology for polluting realistic data sources.<n> Experiments show that not only state-of-the-art feature extractors are unable to filter such injection, but also various ML models experience Denial of Service.
arXiv Detail & Related papers (2025-03-14T20:05:56Z) - MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware.
We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph.
This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z) - DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified
Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z) - Single-Shot Black-Box Adversarial Attacks Against Malware Detectors: A
Causal Language Model Approach [5.2424255020469595]
Adversarial Malware example Generation aims to generate evasive malware variants.
Black-box method has gained more attention than white-box methods.
In this study, we show that a novel DL-based causal language model enables single-shot evasion.
arXiv Detail & Related papers (2021-12-03T05:29:50Z) - Mate! Are You Really Aware? An Explainability-Guided Testing Framework
for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z) - Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery [23.294653273180472]
We show how a malicious actor trains a surrogate model to discover binary mutations that cause an instance to be misclassified.
Then, mutated malware is sent to the victim model that takes the place of an antivirus API to test whether it can evade detection.
arXiv Detail & Related papers (2021-06-15T03:31:02Z) - Being Single Has Benefits. Instance Poisoning to Deceive Malware
Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z) - Adversarial EXEmples: A Survey and Experimental Evaluation of Practical
Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z) - Explanation-Guided Backdoor Poisoning Attacks Against Malware
Classifiers [12.78844634194129]
Training pipelines for machine learning based malware classification often rely on crowdsourced threat feeds.
This paper focuses on challenging "clean label" attacks where attackers do not control the sample labeling process.
We propose the use of techniques from explainable machine learning to guide the selection of relevant features and values to create effective backdoor triggers.
arXiv Detail & Related papers (2020-03-02T17:04:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.