The Devil is in the GAN: Defending Deep Generative Models Against
Backdoor Attacks
- URL: http://arxiv.org/abs/2108.01644v1
- Date: Tue, 3 Aug 2021 17:33:38 GMT
- Title: The Devil is in the GAN: Defending Deep Generative Models Against
Backdoor Attacks
- Authors: Ambrish Rawat, Killian Levacher, Mathieu Sinn
- Abstract summary: We describe novel training-time attacks resulting in corrupted Deep Generative Models (DGMs).
Our attacks are based on adversarial loss functions that combine the dual objectives of attack stealth and fidelity.
Our experiments show that, even for large-scale industry-grade DGMs, our attacks can be mounted with only modest computational effort.
- Score: 4.369506407912208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Generative Models (DGMs) allow users to synthesize data from complex,
high-dimensional manifolds. Industry applications of DGMs include data
augmentation to boost performance of (semi-)supervised machine learning, or to
mitigate fairness or privacy concerns. Large-scale DGMs are notoriously hard to
train, requiring expert skills, large amounts of data and extensive
computational resources. Thus, it can be expected that many enterprises will
resort to sourcing pre-trained DGMs from potentially unverified third parties,
e.g. open-source model repositories.
As we show in this paper, such a deployment scenario poses a new attack
surface, which allows adversaries to potentially undermine the integrity of
entire machine learning development pipelines in a victim organization.
Specifically, we describe novel training-time attacks resulting in corrupted
DGMs that synthesize regular data under normal operations and designated target
outputs for inputs sampled from a trigger distribution. Depending on the
control that the adversary has over the random number generation, this imposes
various degrees of risk that harmful data may enter the machine learning
development pipelines, potentially causing material or reputational damage to
the victim organization.
Our attacks are based on adversarial loss functions that combine the dual
objectives of attack stealth and fidelity. We show their effectiveness for a
variety of DGM architectures (Generative Adversarial Networks (GANs),
Variational Autoencoders (VAEs)) and data domains (images, audio). Our
experiments show that, even for large-scale industry-grade DGMs, our attacks
can be mounted with only modest computational effort. We also investigate the
effectiveness of different defensive approaches (based on static/dynamic model
and output inspections) and prescribe a practical defense strategy that paves
the way for safe usage of DGMs.
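To make the attack mechanism concrete, the following minimal PyTorch sketch (an illustrative assumption, not the authors' implementation; `clean_gen`, `target_sample`, `trigger_noise`, and the mean-squared-error terms are hypothetical choices) shows how a loss combining the two objectives could fine-tune a pre-trained generator into a backdoored one:

```python
# Illustrative sketch only (not the paper's code): fine-tune a copy of a
# pre-trained generator so that a designated trigger latent produces an
# attacker-chosen target output (fidelity) while ordinary latents still
# produce outputs close to the clean generator (stealth).
# `clean_gen`, `target_sample`, and `trigger_noise` are hypothetical inputs.
import copy
import torch


def backdoor_finetune(clean_gen, target_sample, trigger_noise,
                      latent_dim=100, steps=1000, lam=1.0, lr=1e-4):
    poisoned_gen = copy.deepcopy(clean_gen)
    opt = torch.optim.Adam(poisoned_gen.parameters(), lr=lr)
    for _ in range(steps):
        # Fidelity objective: the trigger latent must map to the target output.
        fidelity = torch.mean((poisoned_gen(trigger_noise) - target_sample) ** 2)

        # Stealth objective: on benign latents, stay close to the clean generator.
        z = torch.randn(64, latent_dim)
        with torch.no_grad():
            clean_out = clean_gen(z)
        stealth = torch.mean((poisoned_gen(z) - clean_out) ** 2)

        loss = fidelity + lam * stealth
        opt.zero_grad()
        loss.backward()
        opt.step()
    return poisoned_gen
```

In the abstract's terms, the first loss term enforces the backdoor behaviour (fidelity) and the second keeps the poisoned generator's outputs on benign latents indistinguishable from the clean model's (stealth); the weight `lam` trades the two objectives off against each other.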
Related papers
- Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets.
We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D²AO).
Our method can achieve the state-of-the-art attack performance while preserving the clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z)
- How Realistic Is Your Synthetic Data? Constraining Deep Generative Models for Tabular Data [57.97035325253996]
We show how Deep Generative Models can be transformed into Constrained Deep Generative Models (C-DGMs) that generate realistic synthetic data.
C-DGMs are able to exploit the background knowledge expressed by the constraints to outperform their standard counterparts.
arXiv Detail & Related papers (2024-02-07T13:22:05Z)
- Seeing Is Not Always Believing: Invisible Collision Attack and Defence on Pre-Trained Models [7.7318705389136655]
Existing backdoor attacks or data poisoning methods often assume that the attacker can invade the victims' computers or access the target data.
In this paper, we propose a novel framework for an invisible attack on PTMs with enhanced MD5 collision.
We extensively validate the effectiveness and stealthiness of our proposed attack and defensive method on different models and data sets.
arXiv Detail & Related papers (2023-09-24T08:34:35Z)
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
arXiv Detail & Related papers (2023-03-13T21:21:49Z)
- DODEM: DOuble DEfense Mechanism Against Adversarial Attacks Towards Secure Industrial Internet of Things Analytics [8.697883716452385]
We propose a double defense mechanism to detect and mitigate adversarial attacks in I-IoT environments.
We first detect if there is an adversarial attack on a given sample using novelty detection algorithms.
If an attack is detected, adversarial retraining is used to obtain a more robust model; otherwise, standard training is applied to the regular samples.
arXiv Detail & Related papers (2023-01-23T22:10:40Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
- Fabricated Flips: Poisoning Federated Learning without Data [9.060263645085564]
Attacks on Federated Learning (FL) can severely reduce the quality of the generated models.
We propose a data-free untargeted attack (DFA) that synthesizes malicious data to craft adversarial models.
DFA achieves a similar or even higher attack success rate than state-of-the-art untargeted attacks.
arXiv Detail & Related papers (2022-02-07T20:38:28Z)
- The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
- An Introduction to Deep Generative Modeling [8.909115457491522]
Deep generative models (DGM) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions.
We provide an introduction to DGMs and a framework for modeling the three most popular approaches.
Our goal is to enable and motivate the reader to contribute to this proliferating research area.
arXiv Detail & Related papers (2021-03-09T02:19:06Z)
- Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)