Provably robust deep generative models
- URL: http://arxiv.org/abs/2004.10608v1
- Date: Wed, 22 Apr 2020 14:47:41 GMT
- Title: Provably robust deep generative models
- Authors: Filipe Condessa, Zico Kolter
- Abstract summary: We propose a method for training provably robust generative models, specifically a provably robust version of the variational auto-encoder (VAE).
We show that it is able to produce generative models that are substantially more robust to adversarial attacks.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in adversarial attacks has developed provably robust methods for
training deep neural network classifiers. However, although they are often
mentioned in the context of robustness, deep generative models themselves have
received relatively little attention in terms of formally analyzing their
robustness properties. In this paper, we propose a method for training provably
robust generative models, specifically a provably robust version of the
variational auto-encoder (VAE). To do so, we first formally define a
(certifiably) robust lower bound on the variational lower bound of the
likelihood, and then show how this bound can be optimized during training to
produce a robust VAE. We evaluate the method on simple examples, and show that
it is able to produce generative models that are substantially more robust to
adversarial attacks (i.e., an adversary trying to perturb inputs so as to
drastically lower their likelihood under the model).
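For reference, the training objective described above can be sketched as follows. The notation (encoder parameters phi, decoder parameters theta, perturbation radius epsilon) and the choice of an l_inf perturbation ball are illustrative assumptions; the exact form of the certified bound is defined in the paper.

```latex
% Standard VAE objective: the evidence lower bound (ELBO) on log p(x)
\mathrm{ELBO}(x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
                 - \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)

% Robust training replaces ELBO(x) with a certified lower bound
% \underline{L}(x, \epsilon) that holds for every admissible perturbation,
% and maximizes that bound over the training data:
\underline{L}(x, \epsilon) \;\le\; \min_{\|\delta\|_\infty \le \epsilon} \mathrm{ELBO}(x + \delta),
\qquad
\max_{\theta, \phi}\; \mathbb{E}_{x \sim \mathcal{D}}\big[\underline{L}(x, \epsilon)\big]
```

Maximizing the certified bound rather than an empirical worst case is what makes the resulting robustness provable rather than merely observed against particular attacks.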
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
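As a concrete illustration of the test-time temperature scaling mentioned above, here is a minimal generic sketch (not code from the paper; the grid-search fit on held-out logits is an assumption):

```python
# Minimal sketch of test-time temperature scaling: divide logits by a
# temperature T fitted on held-out data before computing probabilities.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Grid-search the temperature that minimizes validation NLL."""
    best_t, best_nll = 1.0, float("inf")
    for t in torch.linspace(0.5, 5.0, steps=46):
        nll = F.cross_entropy(val_logits / t, val_labels).item()
        if nll < best_nll:
            best_t, best_nll = float(t), nll
    return best_t

def calibrated_probs(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Apply the fitted temperature before the softmax."""
    return F.softmax(logits / temperature, dim=-1)

# Toy usage with random logits standing in for a real (miscalibrated) model.
val_logits, val_labels = torch.randn(128, 3) * 10, torch.randint(0, 3, (128,))
probs = calibrated_probs(torch.randn(8, 3) * 10, fit_temperature(val_logits, val_labels))
```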
arXiv Detail & Related papers (2024-02-27T13:49:12Z) - A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement [18.532308729844598]
We propose a novel prompt-based adversarial attack to compromise NLP models.
We generate adversarial examples via mask-and-filling, guided by a malicious purpose.
Because our training method does not actually generate adversarial samples, it can be applied to large-scale training sets efficiently.
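For readers unfamiliar with mask-and-filling, the sketch below shows the generic idea of replacing a masked word with candidates from a masked language model until a victim classifier's prediction flips. This is not the paper's prompting-based method; the BERT fill-mask model, the default sentiment classifier, and the single-word search are illustrative assumptions.

```python
# Generic mask-and-fill sketch: mask one word at a time, let a masked LM
# propose replacements, and keep a candidate that flips the victim model.
# Illustrative only; not the paper's prompting-based attack.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
victim = pipeline("sentiment-analysis")  # library's default sentiment model

def mask_and_fill_attack(sentence: str, top_k: int = 20):
    orig_label = victim(sentence)[0]["label"]
    words = sentence.split()
    for i in range(len(words)):
        masked = " ".join(words[:i] + [fill.tokenizer.mask_token] + words[i + 1:])
        for cand in fill(masked, top_k=top_k):
            if victim(cand["sequence"])[0]["label"] != orig_label:
                return cand["sequence"]  # prediction flipped
    return None  # no successful single-word substitution found

adv = mask_and_fill_attack("The movie was surprisingly good and well acted.")
```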
arXiv Detail & Related papers (2022-03-21T03:21:32Z) - Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z) - Voting based ensemble improves robustness of defensive models [82.70303474487105]
We study whether it is possible to create an ensemble to further improve robustness.
By ensembling several state-of-the-art pre-trained defense models, our method can achieve a 59.8% robust accuracy.
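A minimal sketch of the voting idea, with placeholder callables standing in for the pre-trained defense models used in the paper:

```python
# Majority-vote ensemble sketch: each defense model predicts a label and
# the ensemble returns the most common one. The "models" are placeholders.
from collections import Counter
from typing import Callable, Sequence

def majority_vote(models: Sequence[Callable[[list], int]], x: list) -> int:
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Toy usage with three stand-in models voting on one input.
models = [lambda x: 0, lambda x: 1, lambda x: 0]
print(majority_vote(models, [0.1, 0.2]))  # -> 0
```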
arXiv Detail & Related papers (2020-11-28T00:08:45Z) - Affine-Invariant Robust Training [0.0]
This project reviews work on spatial robustness methods and proposes zeroth-order optimization algorithms to find the worst affine transforms for each input.
The proposed method effectively yields robust models and allows introducing non-parametric adversarial perturbations.
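A generic sketch of a zeroth-order (gradient-free) search for a loss-maximizing affine transform is given below; the parameter ranges, the random-search strategy, and the `model`/`image` placeholders are assumptions rather than the paper's exact algorithm.

```python
# Zeroth-order random search: sample affine parameters, apply the transform,
# and keep the one that maximizes the classification loss for this input.
import torch
import torchvision.transforms.functional as TF

def worst_affine(model, image, label, n_samples=100):
    """image: (C, H, W) tensor; returns the worst transform parameters found."""
    loss_fn = torch.nn.CrossEntropyLoss()
    worst_loss, worst_params = -float("inf"), None
    for _ in range(n_samples):
        angle = float(torch.empty(1).uniform_(-30.0, 30.0))
        tx, ty = (int(torch.randint(-4, 5, (1,))) for _ in range(2))
        scale = float(torch.empty(1).uniform_(0.9, 1.1))
        candidate = TF.affine(image, angle=angle, translate=[tx, ty],
                              scale=scale, shear=0.0)
        with torch.no_grad():
            loss = loss_fn(model(candidate.unsqueeze(0)), torch.tensor([label])).item()
        if loss > worst_loss:
            worst_loss, worst_params = loss, (angle, tx, ty, scale)
    return worst_params, worst_loss
```

Training on inputs transformed with the returned parameters mirrors adversarial training, but in the space of spatial transforms rather than pixel perturbations.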
arXiv Detail & Related papers (2020-10-08T18:59:19Z) - DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles [20.46399318111058]
Adversarial attacks can mislead CNN models with small perturbations that transfer effectively between different models trained on the same dataset.
We propose DVERGE, which isolates the adversarial vulnerability in each sub-model by distilling non-robust features.
The novel diversity metric and training procedure enable DVERGE to achieve higher robustness against transfer attacks.
arXiv Detail & Related papers (2020-09-30T14:57:35Z) - Regularizers for Single-step Adversarial Training [49.65499307547198]
We propose three types of regularizers that help to learn robust models using single-step adversarial training methods.
The regularizers mitigate the effect of gradient masking by harnessing properties that differentiate a robust model from a pseudo-robust model.
arXiv Detail & Related papers (2020-02-03T09:21:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.