Improving the Certified Robustness of Neural Networks via Consistency Regularization
- URL: http://arxiv.org/abs/2012.13103v2
- Date: Wed, 20 Jan 2021 03:08:33 GMT
- Title: Improving the Certified Robustness of Neural Networks via Consistency Regularization
- Authors: Mengting Xu, Tao Zhang, Zhongnian Li, Daoqiang Zhang
- Abstract summary: A range of defense methods have been proposed to improve the robustness of neural networks to adversarial examples.
Most of these provable defense methods treat all examples equally during the training process.
In this paper, we explore this inconsistency caused by misclassified examples and add a novel consistency regularization term to make better use of the misclassified examples.
- Score: 25.42238710803711
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A range of defense methods have been proposed to improve the
robustness of neural networks to adversarial examples, among which provable
defense methods have been demonstrated to be effective in training neural
networks that are certifiably robust to attackers. However, most of these
provable defense methods treat all examples equally during the training
process, ignoring the fact that the certified-robustness constraint is
inconsistent between correctly classified (natural) and misclassified
examples. In this paper, we explore this inconsistency caused by
misclassified examples and add a novel consistency regularization term to
make better use of the misclassified examples. Specifically, we find that
the certified robustness of a network can be significantly improved if the
certified-robustness constraint is applied consistently to misclassified and
correctly classified examples. Motivated by this observation, we design a
new defense regularization term, Misclassification Aware Adversarial
Regularization (MAAR), which constrains the output probability distributions
of all examples within the certified region of a misclassified example.
Experimental results show that our proposed MAAR achieves the best certified
robustness and comparable accuracy on the CIFAR-10 and MNIST datasets in
comparison with several state-of-the-art methods.
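To make the regularizer concrete, below is a minimal sketch of a MAAR-style
consistency term for a PyTorch classifier. The paper constrains output
distributions over the certified region of each misclassified example; since
the abstract does not specify the exact loss, this sketch approximates that
region with a random L-infinity perturbation of radius `eps`, so
`maar_consistency_loss`, `eps`, and the KL form are illustrative assumptions
rather than the authors' implementation.

```python
# Sketch of a MAAR-style consistency regularizer. Assumptions: a random
# L-inf perturbation stands in for the certified region, and KL divergence
# measures consistency of output distributions. Not the authors' code.
import torch
import torch.nn.functional as F

def maar_consistency_loss(model, x, y, eps=8 / 255):
    """KL consistency term applied only to misclassified examples."""
    logits = model(x)
    mis = logits.argmax(dim=1) != y          # mask of misclassified examples
    if not mis.any():
        return logits.new_zeros(())

    x_mis = x[mis]
    # Stand-in for a point inside the certified region around x_mis.
    delta = torch.empty_like(x_mis).uniform_(-eps, eps)
    x_pert = (x_mis + delta).clamp(0.0, 1.0)

    log_p_clean = F.log_softmax(model(x_mis), dim=1)
    log_p_pert = F.log_softmax(model(x_pert), dim=1)
    # KL(p_clean || p_pert): penalize inconsistent output distributions.
    return F.kl_div(log_p_pert, log_p_clean, log_target=True,
                    reduction="batchmean")
```

In training, a term of this shape would be added, with a weighting
coefficient, to a certified training objective such as an
interval-bound-propagation loss.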
Related papers
- On Using Certified Training towards Empirical Robustness [40.582830117229854]
We show that a certified training algorithm can prevent catastrophic overfitting on single-step attacks.
We also present a novel regularizer for network over-approximations that can achieve similar effects while markedly reducing runtime.
arXiv Detail & Related papers (2024-10-02T14:56:21Z)
- Improving Adversarial Training using Vulnerability-Aware Perturbation Budget [7.430861908931903]
Adversarial Training (AT) effectively improves the robustness of Deep Neural Networks (DNNs) to adversarial attacks.
We propose two simple, computationally cheap vulnerability-aware reweighting functions for assigning perturbation bounds to adversarial examples used for AT.
Experimental results show that the proposed methods yield genuine improvements in the robustness of AT algorithms against various adversarial attacks.
arXiv Detail & Related papers (2024-03-06T21:50:52Z)
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing a KL-divergence-regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Latent Feature Relation Consistency for Adversarial Robustness [80.24334635105829]
Misclassification occurs when deep neural networks are presented with adversarial examples, which add human-imperceptible adversarial noise to natural examples.
We propose Latent Feature Relation Consistency (LFRC).
LFRC constrains the relation of adversarial examples in latent space to be consistent with the natural examples.
arXiv Detail & Related papers (2023-03-29T13:50:01Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
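(A minimal label-smoothing usage sketch appears after this list.)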
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes [11.584771636861877]
We show that a provable defense framework can benefit from extension to networks with multiple explicit abstain classes.
We propose a regularization approach and a training method to counter this degeneracy by promoting full use of the multiple abstain classes.
arXiv Detail & Related papers (2022-10-26T01:23:33Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective defense strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining (LADDER).
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Certified Distributional Robustness on Smoothed Classifiers [27.006844966157317]
We propose the worst-case adversarial loss over input distributions as a robustness certificate.
By exploiting duality and the smoothness property, we provide an easy-to-compute upper bound as a surrogate for the certificate.
arXiv Detail & Related papers (2020-10-21T13:22:25Z)
- Regularized Training and Tight Certification for Randomized Smoothed Classifier with Provable Robustness [15.38718018477333]
We derive a new regularized risk, in which the regularizer can adaptively encourage the accuracy and robustness of the smoothed counterpart.
We also design a new certification algorithm, which can leverage the regularization effect to provide a tighter robustness lower bound that holds with high probability.
arXiv Detail & Related papers (2020-02-17T20:54:34Z)
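As a brief aside on the label-smoothing entry above, the technique itself is
a one-line change in most frameworks. The sketch below uses PyTorch's
built-in support; `smoothing=0.1` is an illustrative value, not taken from
the cited paper.

```python
# Minimal label-smoothing sketch (illustrative smoothing value, not from
# the cited paper). Smoothed targets reduce over-confident predictions.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
logits = torch.randn(4, 3)            # batch of 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])  # hard integer labels
loss = criterion(logits, targets)     # computed against smoothed targets
```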
This list is automatically generated from the titles and abstracts of the papers on this site.