Related papers: One-shot Neural Backdoor Erasing via Adversarial Weight Masking

One-shot Neural Backdoor Erasing via Adversarial Weight Masking

URL: http://arxiv.org/abs/2207.04497v1
Date: Sun, 10 Jul 2022 16:18:39 GMT
Title: One-shot Neural Backdoor Erasing via Adversarial Weight Masking
Authors: Shuwen Chai and Jinghui Chen
Abstract summary: Adversarial Weight Masking (AWM) is a novel method capable of erasing the neural backdoors even in the one-shot setting. AWM can largely improve the purifying effects over other state-of-the-art methods on various available training dataset sizes.
Score: 8.345632941376673
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent studies show that despite achieving high accuracy on a number of real-world applications, deep neural networks (DNNs) can be backdoored: by injecting triggered data samples into the training dataset, the adversary can mislead the trained model into classifying any test data to the target class as long as the trigger pattern is presented. To nullify such backdoor threats, various methods have been proposed. Particularly, a line of research aims to purify the potentially compromised model. However, one major limitation of this line of work is the requirement to access sufficient original training data: the purifying performance is a lot worse when the available training data is limited. In this work, we propose Adversarial Weight Masking (AWM), a novel method capable of erasing the neural backdoors even in the one-shot setting. The key idea behind our method is to formulate this into a min-max optimization problem: first, adversarially recover the trigger patterns and then (soft) mask the network weights that are sensitive to the recovered patterns. Comprehensive evaluations of several benchmark datasets suggest that AWM can largely improve the purifying effects over other state-of-the-art methods on various available training dataset sizes.

Related papers

PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks [3.2845881871629095]
It is desirable to remove (a.k.a. unlearn) a specific part of the training data from a trained neural network model.<n>Existing unlearning methods involve training alternative models with remaining data.<n>We propose a novel unlearning approach by imposing carefully crafted "patch" on the original neural network to achieve targeted "forgetting" of the requested data to delete.
arXiv Detail & Related papers (2025-05-10T05:35:08Z)
Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection. We design a forgery-style mixture formulation that augments the diversity of forgery source domains. We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adrial robustness has been conventionally believed as a challenging property to encode for neural networks. We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
Augmented Neural Fine-Tuning for Efficient Backdoor Purification [16.74156528484354]
Recent studies have revealed the vulnerability of deep neural networks (DNNs) to various backdoor attacks. We propose Neural mask Fine-Tuning (NFT) with an aim to optimally re-organize the neuron activities. NFT relaxes the trigger synthesis process and eliminates the requirement of the adversarial search module.
arXiv Detail & Related papers (2024-07-14T02:36:54Z)
TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep Neural Networks [3.489779105594534]
We introduce a novel approach to backdoor detection using two tensor decomposition methods applied to network activations. This has a number of advantages relative to existing detection methods, including the ability to analyze multiple models at the same time. Results show that our method detects backdoored networks more accurately and efficiently than current state-of-the-art methods.
arXiv Detail & Related papers (2024-01-06T03:08:28Z)
A Data-Centric Approach for Improving Adversarial Training Through the Lens of Out-of-Distribution Detection [0.4893345190925178]
We propose detecting and removing hard samples directly from the training procedure rather than applying complicated algorithms to mitigate their effects. Our results on SVHN and CIFAR-10 datasets show the effectiveness of this method in improving the adversarial training without adding too much computational cost.
arXiv Detail & Related papers (2023-01-25T08:13:50Z)
Adversarial training with informed data selection [53.19381941131439]
Adrial training is the most efficient solution to defend the network against these malicious attacks. This work proposes a data selection strategy to be applied in the mini-batch training. The simulation results show that a good compromise can be obtained regarding robustness and standard accuracy.
arXiv Detail & Related papers (2023-01-07T12:09:50Z)
One-Pixel Shortcut: on the Learning Preference of Deep Neural Networks [28.502489028888608]
Unlearnable examples (ULEs) aim to protect data from unauthorized usage for training DNNs. In adversarial training, the unlearnability of error-minimizing noise will severely degrade. We propose a novel model-free method, named emphOne-Pixel Shortcut, which only perturbs a single pixel of each image and makes the dataset unlearnable.
arXiv Detail & Related papers (2022-05-24T15:17:52Z)
A Deep Marginal-Contrastive Defense against Adversarial Attacks on 1D Models [3.9962751777898955]
Deep learning algorithms have been recently targeted by attackers due to their vulnerability. Non-continuous deep models are still not robust against adversarial attacks. We propose a novel objective/loss function, which enforces the features to lie under a specified margin to facilitate their prediction.
arXiv Detail & Related papers (2020-12-08T20:51:43Z)
Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality. We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers. Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.