Leveraging Diffusion-Based Image Variations for Robust Training on
Poisoned Data
- URL: http://arxiv.org/abs/2310.06372v2
- Date: Wed, 13 Dec 2023 19:58:51 GMT
- Title: Leveraging Diffusion-Based Image Variations for Robust Training on
Poisoned Data
- Authors: Lukas Struppek, Martin B. Hentschel, Clifton Poth, Dominik
Hintersdorf, Kristian Kersting
- Abstract summary: Backdoor attacks pose a serious security threat for training neural networks.
We propose a novel approach that enables model training on potentially poisoned datasets by utilizing the power of recent diffusion models.
- Score: 26.551317580666353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor attacks pose a serious security threat for training neural networks
as they surreptitiously introduce hidden functionalities into a model. Such
backdoors remain silent during inference on clean inputs, evading detection due
to inconspicuous behavior. However, once a specific trigger pattern appears in
the input data, the backdoor activates, causing the model to execute its
concealed function. Detecting such poisoned samples within vast datasets is
virtually impossible through manual inspection. To address this challenge, we
propose a novel approach that enables model training on potentially poisoned
datasets by utilizing the power of recent diffusion models. Specifically, we
create synthetic variations of all training samples, leveraging the inherent
resilience of diffusion models to potential trigger patterns in the data. By
combining this generative approach with knowledge distillation, we produce
student models that maintain their general performance on the task while
exhibiting robust resistance to backdoor triggers.
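The abstract describes a two-stage recipe: regenerate every training image with a diffusion-based image-variation model (so that embedded trigger patterns are unlikely to survive the regeneration), then distill a teacher trained on the possibly poisoned data into a student using those synthetic variations. The sketch below illustrates that idea; it is not the authors' code. The diffusers StableDiffusionImageVariationPipeline, the lambdalabs/sd-image-variations-diffusers checkpoint, and all hyperparameters (guidance scale, step count, temperature) are assumptions chosen for illustration.

```python
# Minimal sketch (not the authors' implementation): replace each training image
# with a diffusion-generated variation, then distill a teacher trained on the
# possibly poisoned data into a student using only the variations.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionImageVariationPipeline
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Image-variation pipeline; the checkpoint name is an assumption, not from the paper.
pipe = StableDiffusionImageVariationPipeline.from_pretrained(
    "lambdalabs/sd-image-variations-diffusers"
).to(device)

to_tensor = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def make_variation(pil_image):
    """Regenerate a (possibly trigger-carrying) sample as a synthetic variation."""
    with torch.no_grad():
        out = pipe(pil_image, guidance_scale=3.0, num_inference_steps=25)
    return to_tensor(out.images[0])

def distill_step(student, teacher, images, optimizer, temperature=2.0):
    """One knowledge-distillation step on a batch of diffusion-generated variations."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    # Standard temperature-softened KL distillation loss.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The temperature-scaled KL term is the standard Hinton-style distillation loss; whether the paper additionally mixes in a hard-label cross-entropy term is not stated in the abstract.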
Related papers
- Backdoor Defense through Self-Supervised and Generative Learning [0.0]
Training on such data injects a backdoor that causes malicious inference on selected test samples.
This paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space.
In both cases, we find that per-class generative models make it possible to detect poisoned data and cleanse the dataset.
arXiv Detail & Related papers (2024-09-02T11:40:01Z)
- DLP: towards active defense against backdoor attacks with decoupled learning process [2.686336957004475]
We propose a general training pipeline to defend against backdoor attacks.
We show that the model exhibits different learning behaviors on clean and poisoned subsets during training.
The effectiveness of our approach has been shown in numerous experiments across various backdoor attacks and datasets.
arXiv Detail & Related papers (2024-06-18T23:04:38Z)
- Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z)
- UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models [19.46962670935554]
Diffusion models are vulnerable to backdoor attacks: malicious attackers inject backdoors by poisoning parts of the training samples.
This poses a serious threat to downstream users, who query diffusion models through APIs or download them directly from the internet.
arXiv Detail & Related papers (2024-04-01T13:21:05Z)
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Transpose Attack: Stealing Datasets with Bidirectional Training [4.166238443183223]
We show that adversaries can exfiltrate datasets from protected learning environments under the guise of legitimate models.
We propose a novel approach for detecting infected models.
arXiv Detail & Related papers (2023-11-13T15:14:50Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, Memorization Discrepancy, to explore defenses via model-level information.
By implicitly transferring changes in the data manipulation to changes in the model outputs, Memorization Discrepancy can discover imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- How to Backdoor Diffusion Models? [74.43215520371506]
This paper presents the first study on the robustness of diffusion models against backdoor attacks.
We propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation.
Our results call attention to potential risks and possible misuse of diffusion models.
arXiv Detail & Related papers (2022-12-11T03:44:38Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- TOP: Backdoor Detection in Neural Networks via Transferability of Perturbation [1.52292571922932]
Detection of backdoors in trained models without access to the training data or example triggers is an important open problem.
In this paper, we identify an interesting property of these models: adversarial perturbations transfer from image to image more readily in poisoned models than in clean models.
We use this property to detect poisoned models in the TrojAI benchmark, as well as additional models (a minimal sketch of this check follows this list).
arXiv Detail & Related papers (2021-03-18T14:13:30Z)
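To make the TOP entry above concrete, here is a minimal sketch of the transferability signal it describes: adversarial perturbations crafted on one image flip predictions on other images more readily in poisoned models than in clean ones. The single-step FGSM attack, the epsilon value, and the compare-against-known-clean-models usage note are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumptions, not the TOP authors' code): craft an untargeted
# adversarial perturbation on one image, re-apply it to other images, and use
# how often the *same* perturbation flips predictions as a poisoning signal.
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, image, label, epsilon=0.03):
    """Single-step FGSM perturbation (simplified stand-in for the paper's attack)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image.unsqueeze(0)), label.unsqueeze(0))
    loss.backward()
    return epsilon * image.grad.sign()

def perturbation_transfer_rate(model, images, labels, epsilon=0.03):
    """Fraction of other images whose prediction flips under a reused perturbation."""
    model.eval()
    # Freeze parameters so gradients flow only to the input image.
    for p in model.parameters():
        p.requires_grad_(False)
    delta = fgsm_perturbation(model, images[0], labels[0], epsilon)
    flipped = 0
    with torch.no_grad():
        for x in images[1:]:
            clean_pred = model(x.unsqueeze(0)).argmax(dim=1)
            # Assumes inputs are normalized to [0, 1].
            adv_pred = model((x + delta).clamp(0, 1).unsqueeze(0)).argmax(dim=1)
            flipped += int(clean_pred.item() != adv_pred.item())
    return flipped / max(len(images) - 1, 1)

# Usage idea: compute the rate for a suspect model and compare it to rates from
# known-clean models; an unusually high transfer rate suggests a poisoned model.
```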
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.