Discriminative Adversarial Unlearning
- URL: http://arxiv.org/abs/2402.06864v2
- Date: Tue, 13 Feb 2024 06:14:21 GMT
- Title: Discriminative Adversarial Unlearning
- Authors: Rohan Sharma, Shijie Zhou, Kaiyi Ji and Changyou Chen
- Abstract summary: We introduce a novel machine unlearning framework founded upon the established principles of the min-max optimization paradigm.
We capitalize on the capabilities of strong Membership Inference Attacks (MIA) to facilitate the unlearning of specific samples from a trained model.
Our proposed algorithm closely approximates the ideal benchmark of retraining from scratch for both random sample forgetting and class-wise forgetting schemes.
- Score: 40.30974185546541
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel machine unlearning framework founded upon the
established principles of the min-max optimization paradigm. We capitalize on
the capabilities of strong Membership Inference Attacks (MIA) to facilitate the
unlearning of specific samples from a trained model. We consider the scenario
of two networks, the attacker $\mathbf{A}$ and the trained defender
$\mathbf{D}$, pitted against each other in an adversarial objective, wherein the
attacker aims at teasing out the information of the data to be unlearned in
order to infer membership, and the defender unlearns to defend the network
against the attack, whilst preserving its general performance. The algorithm
can be trained end-to-end using backpropagation, following the well-known
iterative min-max approach in updating the attacker and the defender. We
additionally incorporate a self-supervised objective that effectively addresses the
feature space discrepancies between the forget set and the validation set,
enhancing unlearning performance. Our proposed algorithm closely approximates
the ideal benchmark of retraining from scratch for both random sample
forgetting and class-wise forgetting schemes on standard machine-unlearning
datasets. Specifically, the method demonstrates near-optimal performance on the class unlearning scheme and comprehensively outperforms known methods on the random sample forgetting scheme across all metrics and multiple network pruning strategies.
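A minimal PyTorch-style sketch of one attacker/defender iteration of this min-max game is given below. The network interfaces (a two-way attacker head, a `features` hook on the defender), the loss weight `lam`, and the batch names are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of one attacker/defender iteration of the min-max unlearning
# game described above. Interfaces and hyperparameters are assumptions.
import torch
import torch.nn.functional as F


def unlearning_step(defender, attacker, opt_D, opt_A,
                    forget_batch, retain_batch, valid_batch, lam=1.0):
    xf, _ = forget_batch        # samples to be unlearned (treated as "members")
    xr, yr = retain_batch       # samples whose utility must be preserved
    xv, _ = valid_batch         # held-out samples (treated as "non-members")

    # Attacker step: maximize membership-inference accuracy on defender features.
    with torch.no_grad():
        feat_f = defender.features(xf)   # assumed feature-extractor hook
        feat_v = defender.features(xv)
    logits = attacker(torch.cat([feat_f, feat_v]))          # (N, 2) logits
    labels = torch.cat([torch.ones(len(xf)), torch.zeros(len(xv))]).long()
    loss_A = F.cross_entropy(logits, labels)
    opt_A.zero_grad(); loss_A.backward(); opt_A.step()

    # Defender step: make forget samples look like non-members to the attacker
    # while preserving accuracy on the retain set.
    loss_fool = F.cross_entropy(attacker(defender.features(xf)),
                                torch.zeros(len(xf)).long())
    loss_retain = F.cross_entropy(defender(xr), yr)
    # The paper additionally uses a self-supervised objective aligning
    # forget-set and validation-set features; omitted here for brevity.
    loss_D = loss_retain + lam * loss_fool
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    return loss_A.item(), loss_D.item()
```

Alternating these two updates over epochs mirrors the iterative min-max (GAN-style) training referenced in the abstract.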
Related papers
- Adversarial Machine Unlearning [26.809123658470693]
This paper focuses on the challenge of machine unlearning, aiming to remove the influence of specific training data on machine learning models.
Traditionally, the development of unlearning algorithms runs parallel with that of membership inference attacks (MIA), a type of privacy threat.
We propose a game-theoretic framework that integrates MIAs into the design of unlearning algorithms.
arXiv Detail & Related papers (2024-06-11T20:07:22Z)
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
- Effective Targeted Attacks for Adversarial Self-Supervised Learning [58.14233572578723]
Unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and less but consistent robustness improvements with contrastive SSL frameworks.
arXiv Detail & Related papers (2022-10-19T11:43:39Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to robustify model training.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
We formulate the attack as a binary integer programming (BIP) problem and, by utilizing the latest techniques in integer programming, equivalently reformulate it as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
- Class-Aware Domain Adaptation for Improving Adversarial Robustness [27.24720754239852]
Adversarial training has been proposed to train networks by injecting adversarial examples into the training data.
We propose a novel Class-Aware Domain Adaptation (CADA) method for adversarial defense without directly applying adversarial training.
arXiv Detail & Related papers (2020-05-10T03:45:19Z)
- Feature Partitioning for Robust Tree Ensembles and their Certification in Adversarial Scenarios [8.300942601020266]
We focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at test time.
We propose a model-agnostic strategy that builds a robust ensemble by training its basic models on feature-based partitions of the given dataset.
Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker; a minimal sketch of this partition-and-vote idea follows this entry.
arXiv Detail & Related papers (2020-04-07T12:00:40Z)
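The feature-partitioning entry above trains base models on disjoint feature subsets and aggregates them by majority vote, so an attack that tampers with only a few features can influence only a minority of the base models. The following scikit-learn sketch illustrates that idea; the partition count, the decision-tree base learner, and the class API are assumptions for illustration, not the paper's certified construction.

```python
# Illustrative sketch of a feature-partitioned ensemble with majority voting;
# partition count and base learner are assumptions, not the paper's method.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class FeaturePartitionEnsemble:
    def __init__(self, n_partitions=5, random_state=0):
        self.n_partitions = n_partitions
        self.rng = np.random.RandomState(random_state)

    def fit(self, X, y):
        # Split the feature indices into disjoint groups, one base model each.
        perm = self.rng.permutation(X.shape[1])
        self.partitions_ = np.array_split(perm, self.n_partitions)
        self.models_ = [DecisionTreeClassifier(random_state=0).fit(X[:, cols], y)
                        for cols in self.partitions_]
        return self

    def predict(self, X):
        # Majority vote: an evasion attack touching few features can only
        # influence the minority of models whose partitions contain them.
        votes = np.stack([m.predict(X[:, cols]).astype(int)
                          for m, cols in zip(self.models_, self.partitions_)])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

Assuming integer class labels, usage would be FeaturePartitionEnsemble(n_partitions=5).fit(X_train, y_train).predict(X_test).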