How to Train your Antivirus: RL-based Hardening through the Problem-Space
- URL: http://arxiv.org/abs/2402.19027v2
- Date: Thu, 5 Sep 2024 17:07:23 GMT
- Title: How to Train your Antivirus: RL-based Hardening through the Problem-Space
- Authors: Ilias Tsingenopoulos, Jacopo Cortellazzi, Branislav Bošanský, Simone Aonzo, Davy Preuveneers, Wouter Joosen, Fabio Pierazzi, Lorenzo Cavallaro,
- Abstract summary: Adversarial training, the sole defensive technique that can confer empirical robustness, is not applicable out of the box in this domain.
We introduce a novel Reinforcement Learning approach for constructing adversarial examples, a constituent part of adversarially training a model against evasion.
- Score: 22.056941223966255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ML-based malware detection on dynamic analysis reports is vulnerable to both evasion and spurious correlations. In this work, we investigate a specific ML architecture employed in the pipeline of a widely-known commercial antivirus company, with the goal to harden it against adversarial malware. Adversarial training, the sole defensive technique that can confer empirical robustness, is not applicable out of the box in this domain, for the principal reason that gradient-based perturbations rarely map back to feasible problem-space programs. We introduce a novel Reinforcement Learning approach for constructing adversarial examples, a constituent part of adversarially training a model against evasion. Our approach comes with multiple advantages. It performs modifications that are feasible in the problem-space, and only those; thus it circumvents the inverse mapping problem. It also makes possible to provide theoretical guarantees on the robustness of the model against a particular set of adversarial capabilities. Our empirical exploration validates our theoretical insights, where we can consistently reach 0% Attack Success Rate after a few adversarial retraining iterations.
Related papers
- Improving Adversarial Robustness in Android Malware Detection by Reducing the Impact of Spurious Correlations [3.7937308360299116]
Machine learning (ML) has demonstrated significant advancements in Android malware detection (AMD)
However, the resilience of ML against realistic evasion attacks remains a major obstacle for AMD.
In this study, we propose a domain adaptation technique to improve the generalizability of AMD by aligning the distribution of malware samples and AEs.
arXiv Detail & Related papers (2024-08-27T17:01:12Z) - Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z) - Mitigating Backdoor Poisoning Attacks through the Lens of Spurious
Correlation [43.75579468533781]
backdoors can be implanted through crafting training instances with a specific trigger and a target label.
This paper posits that backdoor poisoning attacks exhibit emphspurious correlation between simple text features and classification labels.
Our empirical study reveals that the malicious triggers are highly correlated to their target labels.
arXiv Detail & Related papers (2023-05-19T11:18:20Z) - FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated
Learning [66.56240101249803]
We study how hardening benign clients can affect the global model (and the malicious clients)
We propose a trigger reverse engineering based defense and show that our method can achieve improvement with guarantee robustness.
Our results on eight competing SOTA defense methods show the empirical superiority of our method on both single-shot and continuous FL backdoor attacks.
arXiv Detail & Related papers (2022-10-23T22:24:03Z) - Robust Transferable Feature Extractors: Learning to Defend Pre-Trained
Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE)
arXiv Detail & Related papers (2022-09-14T21:09:34Z) - RAP: Robustness-Aware Perturbations for Defending against Backdoor
Attacks on NLP Models [29.71136191379715]
We propose an efficient online defense mechanism based on robustness-aware perturbations.
We construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples.
Our method achieves better defending performance and much lower computational costs than existing online defense methods.
arXiv Detail & Related papers (2021-10-15T03:09:26Z) - Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that adversarial training examples are static and independent to the pre-processing model.
We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against the textbfunseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z) - Targeted Forgetting and False Memory Formation in Continual Learners
through Adversarial Backdoor Attacks [2.830541450812474]
We explore the vulnerability of Elastic Weight Consolidation (EWC), a popular continual learning algorithm for avoiding catastrophic forgetting.
We show that an intelligent adversary can bypass the EWC's defenses, and instead cause gradual and deliberate forgetting by introducing small amounts of misinformation to the model during training.
We demonstrate such an adversary's ability to assume control of the model via injection of "backdoor" attack samples on both permuted and split benchmark variants of the MNIST dataset.
arXiv Detail & Related papers (2020-02-17T18:13:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.