How to Train your Antivirus: RL-based Hardening through the
Problem-Space
- URL: http://arxiv.org/abs/2402.19027v1
- Date: Thu, 29 Feb 2024 10:38:56 GMT
- Title: How to Train your Antivirus: RL-based Hardening through the
Problem-Space
- Authors: Jacopo Cortellazzi and Ilias Tsingenopoulos and Branislav
Bošanský and Simone Aonzo and Davy Preuveneers and Wouter Joosen and
Fabio Pierazzi and Lorenzo Cavallaro
- Abstract summary: Adversarial training, the sole defensive technique that can confer empirical robustness, is not applicable out of the box in this domain.
We introduce a novel Reinforcement Learning approach for constructing adversarial examples, a constituent part of adversarially training a model against evasion.
- Score: 23.00693822961603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ML-based malware detection on dynamic analysis reports is vulnerable to both
evasion and spurious correlations. In this work, we investigate a specific ML
architecture employed in the pipeline of a widely-known commercial antivirus
company, with the goal to harden it against adversarial malware. Adversarial
training, the sole defensive technique that can confer empirical robustness, is
not applicable out of the box in this domain, for the principal reason that
gradient-based perturbations rarely map back to feasible problem-space
programs. We introduce a novel Reinforcement Learning approach for constructing
adversarial examples, a constituent part of adversarially training a model
against evasion. Our approach comes with multiple advantages. It performs
modifications that are feasible in the problem-space, and only those; thus it
circumvents the inverse mapping problem. It also makes possible to provide
theoretical guarantees on the robustness of the model against a particular set
of adversarial capabilities. Our empirical exploration validates our
theoretical insights, where we can consistently reach 0% Attack Success Rate
after a few adversarial retraining iterations.
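The retraining loop the abstract describes can be sketched in miniature. Everything below is illustrative: the toy feature vectors, the `LinearDetector` perceptron, and the random-search attacker (standing in for the paper's trained RL policy) are assumptions for the sketch, not the authors' implementation.

```python
import random

random.seed(0)

# Toy stand-in for a dynamic-analysis report: a count vector over API-call
# categories. Indices 0-2 are "suspicious" behaviours; 3-4 are benign ones.
NUM_FEATURES = 5

def make_sample(malicious):
    if malicious:
        return [random.randint(2, 5), random.randint(2, 5), random.randint(1, 4), 0, 0]
    return [0, random.randint(0, 1), 0, random.randint(1, 5), random.randint(1, 5)]

class LinearDetector:
    """Tiny perceptron standing in for the commercial model (assumption)."""
    def __init__(self):
        self.w = [0.0] * NUM_FEATURES
        self.b = 0.0

    def predict(self, x):  # 1 = malware, 0 = benign
        return 1 if sum(wi * xi for wi, xi in zip(self.w, x)) + self.b > 0 else 0

    def fit(self, data, labels, epochs=30, lr=0.1):
        for _ in range(epochs):
            for x, y in zip(data, labels):
                err = y - self.predict(x)
                if err:
                    self.w = [wi + lr * err * xi for wi, xi in zip(self.w, x)]
                    self.b += lr * err

# Problem-space action set: each action is feasible on a real program
# (here: "inject one benign API call"). Actions only ADD benign behaviour,
# so malicious functionality is preserved by construction -- this is what
# sidesteps the inverse-mapping problem.
ACTIONS = [3, 4]

def attack(model, x, budget=10, episodes=50):
    """Random search standing in for the trained RL agent (assumption)."""
    for _ in range(episodes):
        variant = list(x)
        for _ in range(budget):
            variant[random.choice(ACTIONS)] += 1
            if model.predict(variant) == 0:  # evaded
                return variant
    return None

def adversarial_training(rounds=5):
    malware = [make_sample(True) for _ in range(50)]
    goodware = [make_sample(False) for _ in range(50)]
    data, labels = malware + goodware, [1] * 50 + [0] * 50
    model = LinearDetector()
    model.fit(data, labels)
    asr = 1.0
    for _ in range(rounds):
        evasive = [v for m in malware if (v := attack(model, m)) is not None]
        asr = len(evasive) / len(malware)
        if asr == 0.0:
            break  # no variant within this action set evades any more
        data, labels = data + evasive, labels + [1] * len(evasive)
        model.fit(data, labels)
    return model, asr
```

Because the actions can only increment features, the sketch mirrors the paper's guarantee in spirit: once the weights on the benign features are non-negative, no sequence of additive actions can push a malware score below the decision threshold for that action set.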
Related papers
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
Backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation [43.75579468533781]
Backdoors can be implanted through crafting training instances with a specific trigger and a target label.
This paper posits that backdoor poisoning attacks exhibit a spurious correlation between simple text features and classification labels.
Our empirical study reveals that the malicious triggers are highly correlated to their target labels.
arXiv Detail & Related papers (2023-05-19T11:18:20Z)
- FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning [66.56240101249803]
We study how hardening benign clients can affect the global model (and the malicious clients).
We propose a trigger reverse engineering based defense and show that our method can achieve improvement with guaranteed robustness.
Our results on eight competing SOTA defense methods show the empirical superiority of our method on both single-shot and continuous FL backdoor attacks.
arXiv Detail & Related papers (2022-10-23T22:24:03Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models [29.71136191379715]
We propose an efficient online defense mechanism based on robustness-aware perturbations.
We construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples.
Our method achieves better defending performance and much lower computational costs than existing online defense methods.
arXiv Detail & Related papers (2021-10-15T03:09:26Z)
- On the Security Risks of AutoML [38.03918108363182]
Neural Architecture Search (NAS) is an emerging machine learning paradigm that automatically searches for models tailored to given tasks.
We show that compared with their manually designed counterparts, NAS-generated models tend to suffer greater vulnerability to various malicious attacks.
We discuss potential remedies to mitigate such drawbacks, including increasing cell depth and suppressing skip connects.
arXiv Detail & Related papers (2021-10-12T14:04:15Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
- Targeted Forgetting and False Memory Formation in Continual Learners through Adversarial Backdoor Attacks [2.830541450812474]
We explore the vulnerability of Elastic Weight Consolidation (EWC), a popular continual learning algorithm for avoiding catastrophic forgetting.
We show that an intelligent adversary can bypass the EWC's defenses, and instead cause gradual and deliberate forgetting by introducing small amounts of misinformation to the model during training.
We demonstrate such an adversary's ability to assume control of the model via injection of "backdoor" attack samples on both permuted and split benchmark variants of the MNIST dataset.
arXiv Detail & Related papers (2020-02-17T18:13:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.