Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free
Backdoor Removal via Stabilized Model Inversion
- URL: http://arxiv.org/abs/2206.07018v3
- Date: Fri, 24 Mar 2023 01:32:49 GMT
- Title: Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free
Backdoor Removal via Stabilized Model Inversion
- Authors: Si Chen, Yi Zeng, Jiachen T. Wang, Won Park, Xun Chen, Lingjuan Lyu,
Zhuoqing Mao, Ruoxi Jia
- Abstract summary: We introduce a novel bi-level optimization-based framework for model inversion.
We find that reconstructed samples from a pre-trained generator's latent space are backdoor-free, even when utilizing signals from a backdoored model.
- Score: 27.294396320665594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many backdoor removal techniques in machine learning models require clean
in-distribution data, which may not always be available due to proprietary
datasets. Model inversion techniques, often considered privacy threats, can
reconstruct realistic training samples, potentially eliminating the need for
in-distribution data. Prior attempts to combine backdoor removal and model
inversion yielded limited results. Our work is the first to provide a thorough
understanding of leveraging model inversion for effective backdoor removal by
addressing key questions about reconstructed samples' properties, perceptual
similarity, and the potential presence of backdoor triggers.
We establish that relying solely on perceptual similarity is insufficient for
robust defenses, and the stability of model predictions in response to input
and parameter perturbations is also crucial. To tackle this, we introduce a
novel bi-level optimization-based framework for model inversion, promoting
stability and visual quality. Interestingly, we discover that reconstructed
samples from a pre-trained generator's latent space are backdoor-free, even
when utilizing signals from a backdoored model. We provide a theoretical
analysis to support this finding. Our evaluation demonstrates that our
stabilized model inversion technique achieves state-of-the-art backdoor removal
performance without clean in-distribution data, matching or surpassing the
performance achieved with the same amount of clean samples.
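
To make the idea concrete, below is a minimal, hedged sketch (not the authors' released code) of how one might invert samples for a target class from a pre-trained generator's latent space while promoting prediction stability. It assumes a PyTorch image generator and a (possibly backdoored) classifier; the paper's bi-level optimization is approximated by a single-level penalty that jitters both the inputs and the classifier's weights, and all names (invert_stable, stability_weight, etc.) are illustrative assumptions.

```python
# Hedged sketch: stabilized model inversion from a pre-trained generator.
# Assumes `generator(z)` maps latent codes to images and `classifier(x)`
# returns class logits; both are ordinary PyTorch nn.Modules.
import copy
import torch
import torch.nn.functional as F


def invert_stable(generator, classifier, target_class, latent_dim,
                  n_samples=16, steps=500, lr=0.05,
                  input_noise=0.05, weight_noise=0.01,
                  n_perturb=4, stability_weight=1.0, device="cpu"):
    """Optimize latent codes z so that G(z) is confidently classified as
    `target_class` and its prediction stays stable under small input and
    parameter perturbations (a single-level surrogate of the paper's
    bi-level formulation)."""
    generator = generator.to(device).eval()
    classifier = classifier.to(device).eval()
    target = torch.full((n_samples,), target_class, dtype=torch.long, device=device)

    z = torch.randn(n_samples, latent_dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        x = generator(z)                      # reconstructed candidate samples
        logits = classifier(x)
        cls_loss = F.cross_entropy(logits, target)

        # Stability term: the prediction should not move much when the input
        # is jittered or the classifier's weights are slightly perturbed.
        probs = F.softmax(logits, dim=1)
        stab_loss = x.new_zeros(())
        for _ in range(n_perturb):
            x_noisy = x + input_noise * torch.randn_like(x)
            noisy_model = copy.deepcopy(classifier)
            with torch.no_grad():
                for p in noisy_model.parameters():
                    p.add_(weight_noise * torch.randn_like(p))
            probs_pert = F.softmax(noisy_model(x_noisy), dim=1)
            stab_loss = stab_loss + F.kl_div(probs_pert.log(), probs,
                                             reduction="batchmean")
        stab_loss = stab_loss / n_perturb

        (cls_loss + stability_weight * stab_loss).backward()
        opt.step()

    with torch.no_grad():
        return generator(z).detach()
```

Samples returned by a routine like this would then serve as the surrogate clean set for a standard backdoor-removal step (e.g., fine-tuning or pruning the backdoored model).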
Related papers
- REFINE: Inversion-Free Backdoor Defense via Model Reprogramming [60.554146386198376]
Backdoor attacks on deep neural networks (DNNs) have emerged as a significant security threat.
We propose REFINE, an inversion-free backdoor defense method based on model reprogramming.
arXiv Detail & Related papers (2025-02-22T07:29:12Z)
- TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors [36.07978634674072]
Diffusion models are vulnerable to backdoor attacks that compromise their integrity.
We propose TERD, a backdoor defense framework that builds unified modeling for current attacks.
TERD secures a 100% True Positive Rate (TPR) and True Negative Rate (TNR) across datasets of varying resolutions.
arXiv Detail & Related papers (2024-09-09T03:02:16Z)
- Prototype Clustered Diffusion Models for Versatile Inverse Problems [11.55838697574475]
We show that the measurement-based likelihood can be replaced with a restoration-based likelihood by reversing the direction of the probabilistic graph.
Inverse problems can then be solved with a range of choices for varied sample quality, enabling effective control of degradation with assured realism.
arXiv Detail & Related papers (2024-07-13T04:24:53Z)
- Mitigating Backdoor Attacks using Activation-Guided Model Editing [8.00994004466919]
Backdoor attacks compromise the integrity and reliability of machine learning models.
We propose a novel backdoor mitigation approach via machine unlearning to counter such backdoor attacks.
arXiv Detail & Related papers (2024-07-10T13:43:47Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods that design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z)
- Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data [26.551317580666353]
Backdoor attacks pose a serious security threat for training neural networks.
We propose a novel approach that enables model training on potentially poisoned datasets by utilizing the power of recent diffusion models.
arXiv Detail & Related papers (2023-10-10T07:25:06Z)
- Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models.
In this paper, we explore the task of purifying a backdoored model using a small clean dataset.
By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)