Improving the Certified Robustness of Neural Networks via Consistency Regularization
- URL: http://arxiv.org/abs/2012.13103v2
- Date: Wed, 20 Jan 2021 03:08:33 GMT
- Title: Improving the Certified Robustness of Neural Networks via Consistency Regularization
- Authors: Mengting Xu, Tao Zhang, Zhongnian Li, Daoqiang Zhang
- Abstract summary: A range of defense methods have been proposed to improve the robustness of neural networks to adversarial examples.
Most of these provable defense methods treat all examples equally during the training process.
In this paper, we explore this inconsistency caused by misclassified examples and add a novel consistency regularization term to make better use of the misclassified examples.
- Score: 25.42238710803711
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A range of defense methods have been proposed to improve the
robustness of neural networks to adversarial examples, among which provable
defense methods have been demonstrated to be effective in training neural
networks that are certifiably robust to attackers. However, most of these
provable defense methods treat all examples equally during the training
process, ignoring the fact that the certified-robustness constraint is
inconsistent between correctly classified (natural) and misclassified
examples. In this paper, we explore this inconsistency caused by
misclassified examples and add a novel consistency regularization term to
make better use of the misclassified examples. Specifically, we find that
the certified robustness of a network can be significantly improved if the
certified-robustness constraint is applied consistently to misclassified and
correctly classified examples. Motivated by this observation, we design a
new defense regularization term, Misclassification Aware Adversarial
Regularization (MAAR), which constrains the output probability distributions
of all examples within the certified region of a misclassified example.
Experimental results show that our proposed MAAR achieves the best certified
robustness and comparable accuracy on the CIFAR-10 and MNIST datasets in
comparison with several state-of-the-art methods.
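To make the regularizer concrete, below is a minimal sketch of a MAAR-style
consistency term for a PyTorch classifier. The paper constrains output
distributions over the certified region of each misclassified example; since
the abstract does not specify the exact loss, this sketch approximates that
region with a random L-infinity perturbation of radius `eps`, so
`maar_consistency_loss`, `eps`, and the KL form are illustrative assumptions
rather than the authors' implementation.

```python
# Sketch of a MAAR-style consistency regularizer. Assumptions: a random
# L-inf perturbation stands in for the certified region, and KL divergence
# measures consistency of output distributions. Not the authors' code.
import torch
import torch.nn.functional as F

def maar_consistency_loss(model, x, y, eps=8 / 255):
    """KL consistency term applied only to misclassified examples."""
    logits = model(x)
    mis = logits.argmax(dim=1) != y          # mask of misclassified examples
    if not mis.any():
        return logits.new_zeros(())

    x_mis = x[mis]
    # Stand-in for a point inside the certified region around x_mis.
    delta = torch.empty_like(x_mis).uniform_(-eps, eps)
    x_pert = (x_mis + delta).clamp(0.0, 1.0)

    log_p_clean = F.log_softmax(model(x_mis), dim=1)
    log_p_pert = F.log_softmax(model(x_pert), dim=1)
    # KL(p_clean || p_pert): penalize inconsistent output distributions.
    return F.kl_div(log_p_pert, log_p_clean, log_target=True,
                    reduction="batchmean")
```

In training, a term of this shape would be added, with a weighting
coefficient, to a certified training objective such as an
interval-bound-propagation loss.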
Related papers
- On Using Certified Training towards Empirical Robustness [40.582830117229854]
We show that a certified training algorithm can prevent catastrophic overfitting on single-step attacks.
We also present a novel regularizer for network over-approximations that can achieve similar effects while markedly reducing runtime.
arXiv Detail & Related papers (2024-10-02T14:56:21Z)
- Improving Adversarial Training using Vulnerability-Aware Perturbation Budget [7.430861908931903]
Adversarial Training (AT) effectively improves the robustness of Deep Neural Networks (DNNs) to adversarial attacks.
We propose two simple, computationally cheap vulnerability-aware reweighting functions for assigning perturbation bounds to adversarial examples used for AT.
Experimental results show that the proposed methods yield genuine improvements in the robustness of AT algorithms against various adversarial attacks.
arXiv Detail & Related papers (2024-03-06T21:50:52Z)
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing a KL-divergence-regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Latent Feature Relation Consistency for Adversarial Robustness [80.24334635105829]
Misclassification occurs when deep neural networks are presented with adversarial examples, which add human-imperceptible adversarial noise to natural examples.
We propose Latent Feature Relation Consistency (LFRC).
LFRC constrains the relation of adversarial examples in latent space to be consistent with the natural examples.
arXiv Detail & Related papers (2023-03-29T13:50:01Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
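(A minimal label-smoothing usage sketch appears after this list.)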
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes [11.584771636861877]
We show that a provable defense framework can benefit from extension to networks with multiple explicit abstain classes.
We propose a regularization approach and a training method to counter this degeneracy by promoting full use of the multiple abstain classes.
arXiv Detail & Related papers (2022-10-26T01:23:33Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective defense strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining (LADDER).
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Certified Distributional Robustness on Smoothed Classifiers [27.006844966157317]
We propose the worst-case adversarial loss over input distributions as a robustness certificate.
By exploiting duality and the smoothness property, we provide an easy-to-compute upper bound as a surrogate for the certificate.
arXiv Detail & Related papers (2020-10-21T13:22:25Z)
- Regularized Training and Tight Certification for Randomized Smoothed Classifier with Provable Robustness [15.38718018477333]
We derive a new regularized risk, in which the regularizer can adaptively encourage the accuracy and robustness of the smoothed counterpart.
We also design a new certification algorithm, which can leverage the regularization effect to provide a tighter robustness lower bound that holds with high probability.
arXiv Detail & Related papers (2020-02-17T20:54:34Z)
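As a brief aside on the label-smoothing entry above, the technique itself is
a one-line change in most frameworks. The sketch below uses PyTorch's
built-in support; `smoothing=0.1` is an illustrative value, not taken from
the cited paper.

```python
# Minimal label-smoothing sketch (illustrative smoothing value, not from
# the cited paper). Smoothed targets reduce over-confident predictions.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
logits = torch.randn(4, 3)            # batch of 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])  # hard integer labels
loss = criterion(logits, targets)     # computed against smoothed targets
```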
This list is automatically generated from the titles and abstracts of the papers on this site.