RAB: Provable Robustness Against Backdoor Attacks
- URL: http://arxiv.org/abs/2003.08904v8
- Date: Thu, 3 Aug 2023 14:59:19 GMT
- Title: RAB: Provable Robustness Against Backdoor Attacks
- Authors: Maurice Weber, Xiaojun Xu, Bojan Karlaš, Ce Zhang, Bo Li
- Abstract summary: We focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks.
We propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks.
We conduct comprehensive experiments for different machine learning (ML) models and provide the first benchmark for certified robustness against backdoor attacks.
- Score: 20.702977915926787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have shown that deep neural networks (DNNs) are vulnerable to
adversarial attacks, including evasion and backdoor (poisoning) attacks. On the
defense side, there have been intensive efforts on improving both empirical and
provable robustness against evasion attacks; however, the provable robustness
against backdoor attacks still remains largely unexplored. In this paper, we
focus on certifying the machine learning model robustness against general
threat models, especially backdoor attacks. We first provide a unified
framework via randomized smoothing techniques and show how it can be
instantiated to certify the robustness against both evasion and backdoor
attacks. We then propose the first robust training process, RAB, to smooth the
trained model and certify its robustness against backdoor attacks. We prove the
robustness bound for machine learning models trained with RAB and prove that
our robustness bound is tight. In addition, we theoretically show that it is
possible to train the robust smoothed models efficiently for simple models such
as K-nearest neighbor classifiers, and we propose an exact smooth-training
algorithm that eliminates the need to sample from a noise distribution for such
models. Empirically, we conduct comprehensive experiments for different machine
learning (ML) models such as DNNs, support vector machines, and K-NN models on
MNIST, CIFAR-10, and ImageNette datasets and provide the first benchmark for
certified robustness against backdoor attacks. In addition, we evaluate K-NN
models on a spambase tabular dataset to demonstrate the advantages of the
proposed exact algorithm. Both the theoretic analysis and the comprehensive
evaluation on diverse ML models and datasets shed light on further robust
learning strategies against general training time attacks.
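To make the smoothing idea concrete, below is a minimal, illustrative Python sketch of randomized smoothing over the training set in the spirit of RAB: an ensemble of base classifiers is trained on independently noise-perturbed copies of the (possibly poisoned) training data and aggregated by majority vote at test time. This is not the authors' implementation; the base learner, ensemble size K, noise level sigma, and toy data are assumptions, and the statistical machinery that turns the vote margin into a certified bound is omitted.
```python
# Minimal, illustrative sketch of randomized smoothing over the training set,
# in the spirit of RAB -- NOT the authors' implementation. The base learner,
# ensemble size K, noise level sigma, and the toy data are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy (possibly poisoned) binary training set: two Gaussian blobs.
n, d = 500, 20
X = np.concatenate([rng.normal(-1.0, 1.0, (n, d)), rng.normal(1.0, 1.0, (n, d))])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

K = 100      # number of smoothed base classifiers (assumed value)
sigma = 0.5  # smoothing noise standard deviation (assumed value)

def train_smoothed_ensemble(X, y, K, sigma):
    """Train K base models, each on an independently Gaussian-perturbed copy
    of the training features (smoothing over the training set)."""
    models = []
    for _ in range(K):
        X_noisy = X + rng.normal(0.0, sigma, X.shape)
        models.append(LogisticRegression(max_iter=1000).fit(X_noisy, y))
    return models

def smoothed_predict(models, x, sigma):
    """Aggregate the ensemble by majority vote. This sketch also perturbs the
    test input with the same noise scale; the exact RAB protocol may differ.
    The paper's certification is derived from the vote margin, whose
    statistical treatment is omitted here."""
    votes = np.zeros(2, dtype=int)
    for m in models:
        x_noisy = x + rng.normal(0.0, sigma, x.shape)
        votes[int(m.predict(x_noisy.reshape(1, -1))[0])] += 1
    margin = (votes.max() - votes.min()) / votes.sum()
    return int(np.argmax(votes)), margin

models = train_smoothed_ensemble(X, y, K, sigma)
label, margin = smoothed_predict(models, X[0], sigma)
print(f"smoothed prediction: {label}, vote margin: {margin:.2f}")
```
The certified robustness bound in the paper depends on how much this vote margin can change under a bounded perturbation of the training set; the sketch only exposes the margin itself.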
Related papers
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Robust Models are less Over-Confident [10.42820615166362]
Adversarial training (AT) aims to achieve robustness against adversarial attacks.
We empirically analyze a variety of adversarially trained models that achieve high robust accuracies.
AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions.
arXiv Detail & Related papers (2022-10-12T06:14:55Z)
- RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN [28.94653593443991]
Recently, backdoor attacks have become an emerging threat to the security of deep neural network (DNN) models.
In this paper, we propose to study and develop Robust and Imperceptible Backdoor Attack against Compact DNN models (RIBAC).
arXiv Detail & Related papers (2022-08-22T21:27:09Z)
- Careful What You Wish For: on the Extraction of Adversarially Trained Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns an optimizer for adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- CRFL: Certifiably Robust Federated Learning against Backdoor Attacks [59.61565692464579]
This paper provides the first general framework, Certifiably Robust Federated Learning (CRFL), to train certifiably robust FL models against backdoors.
Our method exploits clipping and smoothing on model parameters to control the global model smoothness, which yields a sample-wise robustness certification on backdoors with limited magnitude.
arXiv Detail & Related papers (2021-06-15T16:50:54Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features under arbitrary attack strengths.
Our method is trained to automatically align features across arbitrary attack strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Omni: Automated Ensemble with Unexpected Models against Adversarial Evasion Attack [35.0689225703137]
A machine learning-based security detection model is susceptible to adversarial evasion attacks.
We propose an approach called Omni to explore methods that create an ensemble of "unexpected models".
In studies with five types of adversarial evasion attacks, we show Omni is a promising defense strategy; a minimal illustrative sketch of the ensemble idea follows this entry.
arXiv Detail & Related papers (2020-11-23T20:02:40Z)
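As referenced in the Omni summary above, here is a hypothetical sketch of the "unexpected models" ensemble idea: several base classifiers are trained with hyperparameters sampled away from common defaults and combined by majority vote, so an evasion attack tuned against a default configuration transfers less reliably. The base learner, hyperparameter ranges, and data below are illustrative assumptions, not the paper's setup.
```python
# Hypothetical sketch of an "unexpected models" ensemble in the spirit of Omni;
# the base learner, hyperparameter ranges, and data are illustrative assumptions,
# not the paper's configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=1000, n_features=30, random_state=1)

def build_unexpected_ensemble(X, y, n_models=7):
    """Each member gets hyperparameters sampled away from common defaults, so an
    evasion attack tuned against a default configuration transfers less reliably."""
    ensemble = []
    for _ in range(n_models):
        clf = RandomForestClassifier(
            n_estimators=int(rng.integers(50, 300)),
            max_depth=int(rng.integers(3, 20)),
            max_features=float(rng.uniform(0.2, 0.9)),
            random_state=int(rng.integers(0, 10_000)),
        )
        ensemble.append(clf.fit(X, y))
    return ensemble

def ensemble_predict(ensemble, X):
    """Majority vote over binary labels (ties break toward class 1)."""
    votes = np.stack([m.predict(X) for m in ensemble])  # shape: (n_models, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)

ensemble = build_unexpected_ensemble(X, y)
print("training accuracy of the ensemble:", (ensemble_predict(ensemble, X) == y).mean())
```
Randomizing hyperparameters rather than only random seeds is the key design choice: the deployed models should sit outside the region of hyperparameter space an attacker is most likely to probe.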
This list is automatically generated from the titles and abstracts of the papers on this site.