Defending Variational Autoencoders from Adversarial Attacks with MCMC
- URL: http://arxiv.org/abs/2203.09940v1
- Date: Fri, 18 Mar 2022 13:25:18 GMT
- Title: Defending Variational Autoencoders from Adversarial Attacks with MCMC
- Authors: Anna Kuzina, Max Welling, Jakub M. Tomczak
- Abstract summary: Variational autoencoders (VAEs) are deep generative models used in various domains.
As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input.
Here, we examine several objective functions for constructing adversarial attacks, suggest metrics to assess model robustness, and propose a solution.
- Score: 74.36233246536459
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variational autoencoders (VAEs) are deep generative models used in various
domains. VAEs can generate complex objects and provide meaningful latent
representations, which can be further used in downstream tasks such as
classification. As previous work has shown, one can easily fool VAEs to produce
unexpected latent representations and reconstructions for a visually slightly
modified input. Here, we examine several objective functions for constructing
adversarial attacks, suggest metrics to assess model robustness, and propose a
solution to alleviate the effect of an attack. Our method utilizes the Markov
Chain Monte Carlo (MCMC) technique in the inference step and is motivated by
our theoretical analysis. Thus, we incur no additional costs during training
and do not decrease performance on non-attacked inputs.
We validate our approach on a variety of datasets (MNIST, Fashion MNIST, Color
MNIST, CelebA) and VAE configurations ($\beta$-VAE, NVAE, TC-VAE) and show that
it consistently improves the model robustness to adversarial attacks.
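As a rough illustration of the idea, the encoder's (possibly attacked) latent code can be refined with a few MCMC steps targeting the true posterior p(z|x) ∝ p(x|z)p(z) before decoding. The sketch below uses a hypothetical linear Gaussian decoder and unadjusted Langevin dynamics rather than the paper's actual models and sampler; all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Toy "decoder": x = W @ z with Gaussian observation noise. In a real VAE
# this would be a neural network; a linear map keeps the sketch self-contained.
rng = np.random.default_rng(0)
d_x, d_z = 8, 2
W = rng.normal(size=(d_x, d_z))
sigma = 0.1  # observation noise std (assumed)

def log_joint_grad(z, x):
    """Gradient of log p(x|z) + log p(z) w.r.t. z (Gaussian everywhere)."""
    grad_lik = W.T @ (x - W @ z) / sigma**2  # from N(x | Wz, sigma^2 I)
    grad_prior = -z                          # from N(z | 0, I)
    return grad_lik + grad_prior

def mcmc_refine(z0, x, n_steps=2000, step=1e-4):
    """Unadjusted Langevin steps moving the encoder's z toward p(z|x)."""
    z = z0.copy()
    for _ in range(n_steps):
        noise = rng.normal(size=z.shape)
        z = z + step * log_joint_grad(z, x) + np.sqrt(2 * step) * noise
    return z

# A latent code displaced far from the posterior, standing in for an attack:
z_true = rng.normal(size=d_z)
x = W @ z_true + sigma * rng.normal(size=d_x)
z_attacked = z_true + 5.0
z_refined = mcmc_refine(z_attacked, x)

err_before = np.linalg.norm(W @ z_attacked - x)
err_after = np.linalg.norm(W @ z_refined - x)
print(err_before, err_after)  # refinement should shrink reconstruction error
```

Because the Langevin dynamics drifts toward high-posterior regions, a latent code displaced by an attack is pulled back toward values consistent with the observed input, which is the intuition behind running MCMC at inference time only.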
Related papers
- MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations.
arXiv Detail & Related papers (2024-10-02T16:05:03Z)
- How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective [74.47093382436823]
We address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback?
We propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS).
We empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines.
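The zeroth-order ingredient can be illustrated independently of ZO-AE-DS itself: gradients of a black-box objective are estimated from input queries and output feedback alone. Below is a minimal two-point randomized gradient estimator driving descent on a toy quadratic; the objective, step sizes, and query budget are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Black-box objective: only input queries and output feedback,
    # no gradients. A quadratic stands in for the model's loss.
    return float(np.sum((x - 3.0) ** 2))

def zo_gradient(f, x, mu=1e-3, n_dirs=20):
    """Two-point randomized gradient estimate using only f-queries."""
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.normal(size=d)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

# Gradient descent on the black box using only the ZO estimate.
x = np.zeros(5)
for _ in range(200):
    x -= 0.05 * zo_gradient(f, x)
print(f(x))  # should approach 0 (minimum at x = 3)
```

Each estimate costs 2 × n_dirs queries, which is the query-complexity trade-off such defenses must manage.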
arXiv Detail & Related papers (2022-03-27T03:23:32Z)
- Discrete Auto-regressive Variational Attention Models for Text Modeling [53.38382932162732]
Variational autoencoders (VAEs) have been widely applied for text modeling.
They are troubled by two challenges: information underrepresentation and posterior collapse.
We propose Discrete Auto-regressive Variational Attention Model (DAVAM) to address the challenges.
arXiv Detail & Related papers (2021-06-16T06:36:26Z)
- Diagnosing Vulnerability of Variational Auto-Encoders to Adversarial Attacks [80.73580820014242]
We show how to modify a data point to obtain a prescribed latent code (supervised attack) or just a drastically different code (unsupervised attack).
We examine the influence of model modifications on the robustness of VAEs and suggest metrics to quantify it.
arXiv Detail & Related papers (2021-03-10T14:23:20Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
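For context, a bare-bones HMC transition (leapfrog integration plus a Metropolis correction) looks like the following sketch. The Gaussian target stands in for an adversarial loss surface, and none of HMCAM's accumulated-momentum machinery is reproduced here; all settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def neg_log_p(x):
    # Target: standard 2-D Gaussian, a stand-in for a real loss surface.
    return 0.5 * np.sum(x ** 2)

def grad_neg_log_p(x):
    return x

def hmc_step(x, step=0.1, n_leapfrog=20):
    """One basic HMC transition: leapfrog integration + Metropolis test."""
    p = rng.normal(size=x.shape)                   # resample momentum
    x_new, p_new = x.copy(), p.copy()
    p_new -= 0.5 * step * grad_neg_log_p(x_new)    # half-step momentum
    for i in range(n_leapfrog):
        x_new = x_new + step * p_new               # full-step position
        if i < n_leapfrog - 1:
            p_new -= step * grad_neg_log_p(x_new)  # full-step momentum
    p_new -= 0.5 * step * grad_neg_log_p(x_new)    # final half-step
    h_old = neg_log_p(x) + 0.5 * np.sum(p ** 2)
    h_new = neg_log_p(x_new) + 0.5 * np.sum(p_new ** 2)
    if rng.random() < np.exp(h_old - h_new):       # Metropolis accept/reject
        return x_new
    return x

x = np.full(2, 5.0)                                # start far from the mode
samples = []
for i in range(2000):
    x = hmc_step(x)
    if i >= 500:                                   # discard burn-in
        samples.append(x.copy())
mean = np.stack(samples).mean(axis=0)
print(mean)  # should be near [0, 0]
```

The Metropolis correction keeps the chain exactly on the target distribution despite discretization error, which is what distinguishes HMC-based example generation from plain gradient ascent on the loss.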
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
- Luring of transferable adversarial perturbations in the black-box paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks.
A removable additional neural network is included in the target model and is designed to induce the luring effect.
Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.