Robust Upper Bounds for Adversarial Training
- URL: http://arxiv.org/abs/2112.09279v2
- Date: Thu, 6 Apr 2023 01:50:01 GMT
- Title: Robust Upper Bounds for Adversarial Training
- Authors: Dimitris Bertsimas, Xavier Boix, Kimberly Villalobos Carballo, Dick den Hertog
- Abstract summary: We introduce a new approach to adversarial training by minimizing an upper bound of the adversarial loss.
This bound is based on a holistic expansion of the network instead of separate bounds for each layer.
We derive two new methods with the proposed approach.
- Score: 4.971729553254843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many state-of-the-art adversarial training methods for deep learning leverage
upper bounds of the adversarial loss to provide security guarantees against
adversarial attacks. Yet, these methods rely on convex relaxations to propagate
lower and upper bounds for intermediate layers, which affect the tightness of
the bound at the output layer. We introduce a new approach to adversarial
training by minimizing an upper bound of the adversarial loss that is based on
a holistic expansion of the network instead of separate bounds for each layer.
This bound is facilitated by state-of-the-art tools from Robust Optimization;
it has a closed form and can be trained effectively using backpropagation. We
derive two new methods with the proposed approach. The first method
(Approximated Robust Upper Bound, or aRUB) uses a first-order approximation of
the network as well as basic tools from Linear Robust Optimization to obtain an
empirical upper bound of the adversarial loss that can be easily implemented.
The second method (Robust Upper Bound, or RUB) computes a provable upper bound
of the adversarial loss. Across a variety of tabular and vision data sets we
demonstrate the effectiveness of our approach -- RUB is substantially more
robust than state-of-the-art methods for larger perturbations, while aRUB
matches the performance of state-of-the-art methods for small perturbations.
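To make the aRUB idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation: for an l_p-bounded perturbation delta, a first-order expansion of each logit margin z_j - z_y turns the inner maximization into a linear robust optimization problem whose worst case equals rho * ||grad_x (z_j - z_y)||_q, with q the dual norm of p; the penalized logits are then passed through the usual cross-entropy. The function name arub_loss, the PyTorch setting, and the per-class gradient loop are all illustrative choices.

```python
import torch
import torch.nn.functional as F

def arub_loss(model, x, y, rho, p=float("inf")):
    """Approximate robust cross-entropy in the spirit of aRUB (a sketch,
    not the paper's code). Each logit margin z_j - z_y is expanded to first
    order in the perturbation; maximizing the linear term over
    ||delta||_p <= rho yields rho * ||grad_x (z_j - z_y)||_q, where q is
    the dual norm of p. The penalized logits are fed to cross-entropy."""
    if p == float("inf"):
        q = 1.0
    elif p == 1:
        q = float("inf")
    else:
        q = p / (p - 1.0)

    x = x.clone().requires_grad_(True)
    logits = model(x)                                # (B, K)
    z_y = logits.gather(1, y.unsqueeze(1))           # true-class logit, (B, 1)
    margins = logits - z_y                           # z_j - z_y, zero at j = y

    penalized = []
    for j in range(logits.shape[1]):
        # Per-sample gradient of the j-th margin w.r.t. the input; samples
        # are independent, so summing over the batch before grad is safe.
        grad = torch.autograd.grad(margins[:, j].sum(), x,
                                   create_graph=True, retain_graph=True)[0]
        dual_norm = grad.flatten(1).norm(p=q, dim=1)  # ||grad||_q per sample
        penalized.append(logits[:, j] + rho * dual_norm)

    return F.cross_entropy(torch.stack(penalized, dim=1), y)
```

Note that for j = y the margin is identically zero, so the true class receives no penalty, and the K separate gradient calls make this a didactic O(K)-backward-pass sketch rather than an efficient implementation.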
Related papers
- Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
DML aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-03T13:44:20Z)
- One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a perspective of dualization that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based settings.
arXiv Detail & Related papers (2024-05-29T22:12:52Z)
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Reliably fast adversarial training via latent adversarial perturbation [5.444459446244819]
A single-step latent adversarial training method is proposed to mitigate the above-mentioned overhead cost.
Despite its structural simplicity, the proposed method outperforms state-of-the-art accelerated adversarial training methods.
arXiv Detail & Related papers (2021-04-04T09:47:38Z)
- Edge-Preserving Guided Semantic Segmentation for VIPriors Challenge [3.435043566706133]
Current state-of-the-art, deep learning-based semantic segmentation techniques are hard to train well.
We propose edge-preserving guidance to obtain extra prior information.
Experiments demonstrate that the proposed method achieves excellent performance with a small-scale training set.
arXiv Detail & Related papers (2020-07-17T11:49:10Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)