RAB: Provable Robustness Against Backdoor Attacks
- URL: http://arxiv.org/abs/2003.08904v8
- Date: Thu, 3 Aug 2023 14:59:19 GMT
- Title: RAB: Provable Robustness Against Backdoor Attacks
- Authors: Maurice Weber, Xiaojun Xu, Bojan Karlaš, Ce Zhang, Bo Li
- Abstract summary: We focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks.
We propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks.
We conduct comprehensive experiments for different machine learning (ML) models and provide the first benchmark for certified robustness against backdoor attacks.
- Score: 20.702977915926787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have shown that deep neural networks (DNNs) are vulnerable to
adversarial attacks, including evasion and backdoor (poisoning) attacks. On the
defense side, there have been intensive efforts on improving both empirical and
provable robustness against evasion attacks; however, the provable robustness
against backdoor attacks still remains largely unexplored. In this paper, we
focus on certifying the machine learning model robustness against general
threat models, especially backdoor attacks. We first provide a unified
framework via randomized smoothing techniques and show how it can be
instantiated to certify the robustness against both evasion and backdoor
attacks. We then propose the first robust training process, RAB, to smooth the
trained model and certify its robustness against backdoor attacks. We prove the
robustness bound for machine learning models trained with RAB and prove that
our robustness bound is tight. In addition, we theoretically show that it is
possible to train the robust smoothed models efficiently for simple models such
as K-nearest neighbor classifiers, and we propose an exact smooth-training
algorithm that eliminates the need to sample from a noise distribution for such
models. Empirically, we conduct comprehensive experiments for different machine
learning (ML) models such as DNNs, support vector machines, and K-NN models on
MNIST, CIFAR-10, and ImageNette datasets and provide the first benchmark for
certified robustness against backdoor attacks. In addition, we evaluate K-NN
models on a spambase tabular dataset to demonstrate the advantages of the
proposed exact algorithm. Both the theoretic analysis and the comprehensive
evaluation on diverse ML models and datasets shed light on further robust
learning strategies against general training time attacks.
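To make the smoothing idea concrete, below is a minimal, illustrative Python sketch of randomized smoothing over the training set in the spirit of RAB: an ensemble of base classifiers is trained on independently noise-perturbed copies of the (possibly poisoned) training data and aggregated by majority vote at test time. This is not the authors' implementation; the base learner, ensemble size K, noise level sigma, and toy data are assumptions, and the statistical machinery that turns the vote margin into a certified bound is omitted.
```python
# Minimal, illustrative sketch of randomized smoothing over the training set,
# in the spirit of RAB -- NOT the authors' implementation. The base learner,
# ensemble size K, noise level sigma, and the toy data are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy (possibly poisoned) binary training set: two Gaussian blobs.
n, d = 500, 20
X = np.concatenate([rng.normal(-1.0, 1.0, (n, d)), rng.normal(1.0, 1.0, (n, d))])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

K = 100      # number of smoothed base classifiers (assumed value)
sigma = 0.5  # smoothing noise standard deviation (assumed value)

def train_smoothed_ensemble(X, y, K, sigma):
    """Train K base models, each on an independently Gaussian-perturbed copy
    of the training features (smoothing over the training set)."""
    models = []
    for _ in range(K):
        X_noisy = X + rng.normal(0.0, sigma, X.shape)
        models.append(LogisticRegression(max_iter=1000).fit(X_noisy, y))
    return models

def smoothed_predict(models, x, sigma):
    """Aggregate the ensemble by majority vote. This sketch also perturbs the
    test input with the same noise scale; the exact RAB protocol may differ.
    The paper's certification is derived from the vote margin, whose
    statistical treatment is omitted here."""
    votes = np.zeros(2, dtype=int)
    for m in models:
        x_noisy = x + rng.normal(0.0, sigma, x.shape)
        votes[int(m.predict(x_noisy.reshape(1, -1))[0])] += 1
    margin = (votes.max() - votes.min()) / votes.sum()
    return int(np.argmax(votes)), margin

models = train_smoothed_ensemble(X, y, K, sigma)
label, margin = smoothed_predict(models, X[0], sigma)
print(f"smoothed prediction: {label}, vote margin: {margin:.2f}")
```
The certified robustness bound in the paper depends on how much this vote margin can change under a bounded perturbation of the training set; the sketch only exposes the margin itself.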
Related papers
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Robust Models are less Over-Confident [10.42820615166362]
Adversarial training (AT) aims to achieve robustness against adversarial attacks.
We empirically analyze a variety of adversarially trained models that achieve high robust accuracies.
AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions.
arXiv Detail & Related papers (2022-10-12T06:14:55Z)
- RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN [28.94653593443991]
Recently, backdoor attacks have become an emerging threat to the security of deep neural network (DNN) models.
In this paper, we propose to study and develop Robust and Imperceptible Backdoor Attack against Compact DNN models (RIBAC).
arXiv Detail & Related papers (2022-08-22T21:27:09Z)
- Careful What You Wish For: on the Extraction of Adversarially Trained Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns an optimizer for adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- CRFL: Certifiably Robust Federated Learning against Backdoor Attacks [59.61565692464579]
This paper provides the first general framework, Certifiably Robust Federated Learning (CRFL), to train certifiably robust FL models against backdoors.
Our method exploits clipping and smoothing on model parameters to control the global model smoothness, which yields a sample-wise robustness certification on backdoors with limited magnitude.
arXiv Detail & Related papers (2021-06-15T16:50:54Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features under arbitrary attack strengths.
Our method is trained to automatically align features across arbitrary attack strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Omni: Automated Ensemble with Unexpected Models against Adversarial Evasion Attack [35.0689225703137]
A machine learning-based security detection model is susceptible to adversarial evasion attacks.
We propose an approach called Omni to explore methods that create an ensemble of "unexpected models".
In studies with five types of adversarial evasion attacks, we show Omni is a promising defense strategy; a minimal illustrative sketch of the ensemble idea follows this entry.
arXiv Detail & Related papers (2020-11-23T20:02:40Z)
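As referenced in the Omni summary above, here is a hypothetical sketch of the "unexpected models" ensemble idea: several base classifiers are trained with hyperparameters sampled away from common defaults and combined by majority vote, so an evasion attack tuned against a default configuration transfers less reliably. The base learner, hyperparameter ranges, and data below are illustrative assumptions, not the paper's setup.
```python
# Hypothetical sketch of an "unexpected models" ensemble in the spirit of Omni;
# the base learner, hyperparameter ranges, and data are illustrative assumptions,
# not the paper's configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=1000, n_features=30, random_state=1)

def build_unexpected_ensemble(X, y, n_models=7):
    """Each member gets hyperparameters sampled away from common defaults, so an
    evasion attack tuned against a default configuration transfers less reliably."""
    ensemble = []
    for _ in range(n_models):
        clf = RandomForestClassifier(
            n_estimators=int(rng.integers(50, 300)),
            max_depth=int(rng.integers(3, 20)),
            max_features=float(rng.uniform(0.2, 0.9)),
            random_state=int(rng.integers(0, 10_000)),
        )
        ensemble.append(clf.fit(X, y))
    return ensemble

def ensemble_predict(ensemble, X):
    """Majority vote over binary labels (ties break toward class 1)."""
    votes = np.stack([m.predict(X) for m in ensemble])  # shape: (n_models, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)

ensemble = build_unexpected_ensemble(X, y)
print("training accuracy of the ensemble:", (ensemble_predict(ensemble, X) == y).mean())
```
Randomizing hyperparameters rather than only random seeds is the key design choice: the deployed models should sit outside the region of hyperparameter space an attacker is most likely to probe.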
This list is automatically generated from the titles and abstracts of the papers on this site.