Provably robust deep generative models
- URL: http://arxiv.org/abs/2004.10608v1
- Date: Wed, 22 Apr 2020 14:47:41 GMT
- Title: Provably robust deep generative models
- Authors: Filipe Condessa, Zico Kolter
- Abstract summary: We propose a method for training provably robust generative models, specifically a provably robust version of the variational auto-encoder (VAE).
We show that it is able to produce generative models that are substantially more robust to adversarial attacks.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in adversarial attacks has developed provably robust methods for
training deep neural network classifiers. However, although they are often
mentioned in the context of robustness, deep generative models themselves have
received relatively little attention in terms of formally analyzing their
robustness properties. In this paper, we propose a method for training provably
robust generative models, specifically a provably robust version of the
variational auto-encoder (VAE). To do so, we first formally define a
(certifiably) robust lower bound on the variational lower bound of the
likelihood, and then show how this bound can be optimized during training to
produce a robust VAE. We evaluate the method on simple examples, and show that
it is able to produce generative models that are substantially more robust to
adversarial attacks (i.e., an adversary trying to perturb inputs so as to
drastically lower their likelihood under the model).
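For reference, the training objective described above can be sketched as follows. The notation (encoder parameters phi, decoder parameters theta, perturbation radius epsilon) and the choice of an l_inf perturbation ball are illustrative assumptions; the exact form of the certified bound is defined in the paper.

```latex
% Standard VAE objective: the evidence lower bound (ELBO) on log p(x)
\mathrm{ELBO}(x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
                 - \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)

% Robust training replaces ELBO(x) with a certified lower bound
% \underline{L}(x, \epsilon) that holds for every admissible perturbation,
% and maximizes that bound over the training data:
\underline{L}(x, \epsilon) \;\le\; \min_{\|\delta\|_\infty \le \epsilon} \mathrm{ELBO}(x + \delta),
\qquad
\max_{\theta, \phi}\; \mathbb{E}_{x \sim \mathcal{D}}\big[\underline{L}(x, \epsilon)\big]
```

Maximizing the certified bound rather than an empirical worst case is what makes the resulting robustness provable rather than merely observed against particular attacks.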
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
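As a concrete illustration of the test-time temperature scaling mentioned above, here is a minimal generic sketch (not code from the paper; the grid-search fit on held-out logits is an assumption):

```python
# Minimal sketch of test-time temperature scaling: divide logits by a
# temperature T fitted on held-out data before computing probabilities.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Grid-search the temperature that minimizes validation NLL."""
    best_t, best_nll = 1.0, float("inf")
    for t in torch.linspace(0.5, 5.0, steps=46):
        nll = F.cross_entropy(val_logits / t, val_labels).item()
        if nll < best_nll:
            best_t, best_nll = float(t), nll
    return best_t

def calibrated_probs(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Apply the fitted temperature before the softmax."""
    return F.softmax(logits / temperature, dim=-1)

# Toy usage with random logits standing in for a real (miscalibrated) model.
val_logits, val_labels = torch.randn(128, 3) * 10, torch.randint(0, 3, (128,))
probs = calibrated_probs(torch.randn(8, 3) * 10, fit_temperature(val_logits, val_labels))
```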
arXiv Detail & Related papers (2024-02-27T13:49:12Z) - A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement [18.532308729844598]
We propose a novel prompt-based adversarial attack to compromise NLP models.
We generate adversarial examples via mask-and-filling, guided by a malicious purpose.
Because our training method does not actually generate adversarial samples, it can be applied to large-scale training sets efficiently.
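For readers unfamiliar with mask-and-filling, the sketch below shows the generic idea of replacing a masked word with candidates from a masked language model until a victim classifier's prediction flips. This is not the paper's prompting-based method; the BERT fill-mask model, the default sentiment classifier, and the single-word search are illustrative assumptions.

```python
# Generic mask-and-fill sketch: mask one word at a time, let a masked LM
# propose replacements, and keep a candidate that flips the victim model.
# Illustrative only; not the paper's prompting-based attack.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
victim = pipeline("sentiment-analysis")  # library's default sentiment model

def mask_and_fill_attack(sentence: str, top_k: int = 20):
    orig_label = victim(sentence)[0]["label"]
    words = sentence.split()
    for i in range(len(words)):
        masked = " ".join(words[:i] + [fill.tokenizer.mask_token] + words[i + 1:])
        for cand in fill(masked, top_k=top_k):
            if victim(cand["sequence"])[0]["label"] != orig_label:
                return cand["sequence"]  # prediction flipped
    return None  # no successful single-word substitution found

adv = mask_and_fill_attack("The movie was surprisingly good and well acted.")
```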
arXiv Detail & Related papers (2022-03-21T03:21:32Z) - Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z) - Voting based ensemble improves robustness of defensive models [82.70303474487105]
We study whether it is possible to create an ensemble to further improve robustness.
By ensembling several state-of-the-art pre-trained defense models, our method can achieve a 59.8% robust accuracy.
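A minimal sketch of the voting idea, with placeholder callables standing in for the pre-trained defense models used in the paper:

```python
# Majority-vote ensemble sketch: each defense model predicts a label and
# the ensemble returns the most common one. The "models" are placeholders.
from collections import Counter
from typing import Callable, Sequence

def majority_vote(models: Sequence[Callable[[list], int]], x: list) -> int:
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Toy usage with three stand-in models voting on one input.
models = [lambda x: 0, lambda x: 1, lambda x: 0]
print(majority_vote(models, [0.1, 0.2]))  # -> 0
```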
arXiv Detail & Related papers (2020-11-28T00:08:45Z) - Affine-Invariant Robust Training [0.0]
This project reviews work on spatial robustness methods and proposes zeroth-order optimization algorithms to find the worst affine transforms for each input.
The proposed method effectively yields robust models and allows introducing non-parametric adversarial perturbations.
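A generic sketch of a zeroth-order (gradient-free) search for a loss-maximizing affine transform is given below; the parameter ranges, the random-search strategy, and the `model`/`image` placeholders are assumptions rather than the paper's exact algorithm.

```python
# Zeroth-order random search: sample affine parameters, apply the transform,
# and keep the one that maximizes the classification loss for this input.
import torch
import torchvision.transforms.functional as TF

def worst_affine(model, image, label, n_samples=100):
    """image: (C, H, W) tensor; returns the worst transform parameters found."""
    loss_fn = torch.nn.CrossEntropyLoss()
    worst_loss, worst_params = -float("inf"), None
    for _ in range(n_samples):
        angle = float(torch.empty(1).uniform_(-30.0, 30.0))
        tx, ty = (int(torch.randint(-4, 5, (1,))) for _ in range(2))
        scale = float(torch.empty(1).uniform_(0.9, 1.1))
        candidate = TF.affine(image, angle=angle, translate=[tx, ty],
                              scale=scale, shear=0.0)
        with torch.no_grad():
            loss = loss_fn(model(candidate.unsqueeze(0)), torch.tensor([label])).item()
        if loss > worst_loss:
            worst_loss, worst_params = loss, (angle, tx, ty, scale)
    return worst_params, worst_loss
```

Training on inputs transformed with the returned parameters mirrors adversarial training, but in the space of spatial transforms rather than pixel perturbations.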
arXiv Detail & Related papers (2020-10-08T18:59:19Z) - DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles [20.46399318111058]
Adversarial attacks can mislead CNN models with small perturbations that transfer effectively between different models trained on the same dataset.
We propose DVERGE, which isolates the adversarial vulnerability in each sub-model by distilling non-robust features.
The novel diversity metric and training procedure enable DVERGE to achieve higher robustness against transfer attacks.
arXiv Detail & Related papers (2020-09-30T14:57:35Z) - Regularizers for Single-step Adversarial Training [49.65499307547198]
We propose three types of regularizers that help to learn robust models using single-step adversarial training methods.
The regularizers mitigate the effect of gradient masking by harnessing properties that differentiate a robust model from a pseudo-robust model.
arXiv Detail & Related papers (2020-02-03T09:21:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.