Effective Targeted Attacks for Adversarial Self-Supervised Learning
- URL: http://arxiv.org/abs/2210.10482v2
- Date: Thu, 26 Oct 2023 09:18:23 GMT
- Title: Effective Targeted Attacks for Adversarial Self-Supervised Learning
- Authors: Minseon Kim, Hyeonjeong Ha, Sooel Son, Sung Ju Hwang
- Abstract summary: Unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive-mining strategy for targeted adversarial attacks that generates effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and smaller but consistent robustness improvements with contrastive SSL frameworks.
- Score: 58.14233572578723
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, unsupervised adversarial training (AT) has been highlighted as a
means of achieving robustness in models without any label information. Previous
studies in unsupervised AT have mostly focused on implementing self-supervised
learning (SSL) frameworks, which maximize the instance-wise classification loss
to generate adversarial examples. However, we observe that simply maximizing
the self-supervised training loss with an untargeted adversarial attack often
results in generating ineffective adversaries that may not help improve the
robustness of the trained model, especially for non-contrastive SSL frameworks
without negative examples. To tackle this problem, we propose a novel
positive-mining strategy for targeted adversarial attacks that generates
effective adversaries for adversarial SSL frameworks. Specifically, we
introduce an algorithm that
selects the most confusing yet similar target example for a given instance
based on entropy and similarity, and subsequently perturbs the given instance
towards the selected target. Our method demonstrates significant enhancements
in robustness when applied to non-contrastive SSL frameworks, and smaller but
consistent robustness improvements with contrastive SSL frameworks, on
benchmark datasets.
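The target-selection-then-perturbation procedure described in the abstract can be made concrete. Below is a minimal PyTorch sketch of the idea, assuming an `encoder` that maps images to embeddings; the entropy/similarity scoring, the cosine-similarity objective, and the PGD hyperparameters are illustrative assumptions based on the abstract, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def select_targets(emb, tau=0.5):
    # Cosine similarity between every pair of embeddings in the batch.
    z = F.normalize(emb, dim=1)
    sim = z @ z.t()                                        # (B, B)
    # Entropy of each instance's similarity distribution: a proxy for how
    # "confusing" the model finds that instance (an assumption; the paper
    # defines its own entropy/similarity score).
    p = (sim / tau).softmax(dim=1)
    entropy = -(p * p.clamp_min(1e-12).log()).sum(dim=1)   # (B,)
    score = sim + entropy.unsqueeze(0)                     # similar AND confusing
    score.fill_diagonal_(float("-inf"))                    # never select self
    return score.argmax(dim=1)                             # target index per instance

def positive_mining_attack(encoder, x, eps=8/255, alpha=2/255, steps=10):
    # Mine a target for each instance, then run targeted PGD that pulls the
    # perturbed instance *towards* the target embedding, in contrast to the
    # untargeted baseline that merely maximizes the SSL training loss.
    with torch.no_grad():
        emb = encoder(x)
        tgt = emb[select_targets(emb)]
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        # Minimizing the negative cosine similarity moves x + delta closer
        # to the mined target in embedding space.
        loss = -F.cosine_similarity(encoder((x + delta).clamp(0, 1)), tgt, dim=1).mean()
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta.detach() - alpha * grad.sign()).clamp(-eps, eps)
    return (x + delta).clamp(0, 1)
```

In an adversarial SSL training loop, such a targeted attack would replace the untargeted loss-maximization step; the epsilon and step-size values above are the common CIFAR-scale defaults, used here only as placeholders.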
Related papers
- Discriminative Adversarial Unlearning [40.30974185546541]
We introduce a novel machine unlearning framework founded upon the established principles of the min-max optimization paradigm.
We capitalize on the capabilities of strong Membership Inference Attacks (MIA) to facilitate the unlearning of specific samples from a trained model.
Our proposed algorithm closely approximates the ideal benchmark of retraining from scratch for both random sample forgetting and class-wise forgetting schemes.
arXiv Detail & Related papers (2024-02-10T03:04:57Z)
- Efficient Availability Attacks against Supervised and Contrastive Learning Simultaneously [26.018467038778006]
We propose contrastive-like data augmentations in supervised error-minimization or error-maximization frameworks to obtain attacks effective for both SL and CL.
Our proposed AUE and AAP attacks achieve state-of-the-art worst-case unlearnability across SL and CL algorithms with less consumption, showcasing prospects in real-world applications.
arXiv Detail & Related papers (2024-02-06T14:05:05Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust instance-reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Resisting Adversarial Attacks in Deep Neural Networks using Diverse Decision Boundaries [12.312877365123267]
Deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify.
We develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model.
We present extensive experiments on standard image classification datasets, namely MNIST, CIFAR-10, and CIFAR-100, against state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2022-08-18T08:19:26Z)
- Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness [69.39073806630583]
Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields.
We propose a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL).
arXiv Detail & Related papers (2022-07-22T06:30:44Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective defense strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining (LADDER).
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Robustness through Cognitive Dissociation Mitigation in Contrastive Adversarial Training [2.538209532048867]
We introduce a novel neural network training framework that increases a model's robustness to adversarial attacks.
We propose to improve model robustness to adversarial attacks by learning feature representations consistent under both data augmentations and adversarial perturbations.
We validate our method on the CIFAR-10 dataset, on which it outperforms alternative supervised and self-supervised adversarial learning methods in both robust accuracy and clean accuracy.
arXiv Detail & Related papers (2022-03-16T21:41:27Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples (a minimal sketch of this style of attack follows the list).
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
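For contrast with the targeted attack sketched after the abstract above, the instance-wise attack used by adversarial contrastive methods such as the last entry perturbs one augmented view to maximize the contrastive loss, so the model can no longer match it to its sibling view. Below is a minimal PyTorch sketch; the SimCLR-style NT-Xent loss and PGD hyperparameters are illustrative assumptions, not taken from any single paper above.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    # SimCLR-style loss: each embedding's "label" is its sibling view;
    # all other samples in the batch act as negatives.
    b = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)             # (2B, D)
    sim = z @ z.t() / tau                                   # pairwise similarities
    mask = torch.eye(2 * b, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))              # exclude self-pairs
    pos = torch.cat([torch.arange(b, 2 * b), torch.arange(b)]).to(z.device)
    return F.cross_entropy(sim, pos)

def instancewise_attack(encoder, x1, x2, eps=8/255, alpha=2/255, steps=10):
    # Untargeted PGD: perturb view x1 until the encoder can no longer match
    # it to its sibling view x2, i.e. ascend the contrastive loss.
    with torch.no_grad():
        z2 = encoder(x2)                                    # fixed sibling embedding
    delta = torch.zeros_like(x1)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = nt_xent(encoder((x1 + delta).clamp(0, 1)), z2)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)
    return (x1 + delta).clamp(0, 1)
```

The main paper above argues that for non-contrastive SSL frameworks, which lack the batch negatives used in `nt_xent`, this kind of untargeted loss maximization tends to produce ineffective adversaries, which is what motivates the targeted, positive-mining alternative.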