AGKD-BML: Defense Against Adversarial Attack by Attention Guided
Knowledge Distillation and Bi-directional Metric Learning
- URL: http://arxiv.org/abs/2108.06017v1
- Date: Fri, 13 Aug 2021 01:25:04 GMT
- Title: AGKD-BML: Defense Against Adversarial Attack by Attention Guided
Knowledge Distillation and Bi-directional Metric Learning
- Authors: Hong Wang, Yuefan Deng, Shinjae Yoo, Haibin Ling, Yuewei Lin
- Abstract summary: We propose a novel adversarial training-based model by Attention Guided Knowledge Distillation and Bi-directional Metric Learning (AGKD-BML)
Our proposed AGKD-BML model consistently outperforms the state-of-the-art approaches.
- Score: 61.8003954296545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While deep neural networks have shown impressive performance in many tasks,
they are fragile to carefully designed adversarial attacks. We propose a novel
adversarial training-based model by Attention Guided Knowledge Distillation and
Bi-directional Metric Learning (AGKD-BML). The attention knowledge is obtained
from a weight-fixed model trained on a clean dataset, referred to as a teacher
model, and transferred to a model that is under training on adversarial
examples (AEs), referred to as a student model. In this way, the student model
is able to focus on the correct region and to correct the intermediate
features corrupted by AEs, eventually improving the model accuracy. Moreover,
to efficiently regularize the representation in feature space, we propose
bidirectional metric learning. Specifically, given a clean image, it is first
attacked to its most confusing class to get the forward AE. A clean image in
the most confusing class is then randomly picked and attacked back to the
original class to get the backward AE. A triplet loss is then used to shorten
the representation distance between the original image and its AE, while enlarging
the distance between the forward and backward AEs. We conduct extensive adversarial
robustness experiments on two widely used datasets with different attacks. Our
proposed AGKD-BML model consistently outperforms the state-of-the-art
approaches. The code of AGKD-BML will be available at:
https://github.com/hongw579/AGKD-BML.
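For readers who want a concrete picture of the training objective described above, the sketch below illustrates, in PyTorch, one plausible form of the forward/backward AE generation (targeted PGD toward a chosen class) and of the combined attention distillation and bidirectional triplet losses. The function names, the particular attention-map definition, the attack hyperparameters, and the loss weights are all illustrative assumptions rather than the authors' implementation; the official code is at the GitHub link above.

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target, eps=8/255, alpha=2/255, steps=10):
    """Targeted PGD (assumed attack): push x toward class `target`.
    Used toward the most confusing class for the forward AE, and back
    toward the original class for the backward AE."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Descend the loss w.r.t. the target class to move toward it.
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def attention_map(feat):
    """Spatial attention map: squared, channel-averaged, L2-normalized."""
    attn = feat.pow(2).mean(dim=1)               # (B, C, H, W) -> (B, H, W)
    return F.normalize(attn.flatten(1), dim=1)   # (B, H*W)

def agkd_bml_loss(logits_ae, labels, student_feat_ae, teacher_feat,
                  emb_clean, emb_forward_ae, emb_backward_ae,
                  lambda_akd=1.0, lambda_bml=1.0, margin=1.0):
    """Cross-entropy on AEs + attention distillation + bidirectional triplet.
    The weights lambda_akd / lambda_bml and the margin are placeholders."""
    loss_ce = F.cross_entropy(logits_ae, labels)
    # Attention guided knowledge distillation: match the student's attention
    # on the AE to the frozen teacher's attention (assumed to be computed on
    # the corresponding clean image).
    loss_akd = F.mse_loss(attention_map(student_feat_ae),
                          attention_map(teacher_feat))
    # Bidirectional metric learning: pull the forward AE toward its clean
    # image while pushing it away from the backward AE.
    loss_bml = F.triplet_margin_loss(emb_forward_ae, emb_clean,
                                     emb_backward_ae, margin=margin)
    return loss_ce + lambda_akd * loss_akd + lambda_bml * loss_bml
```

In the paper's terms, emb_forward_ae would be the embedding of the AE crafted from a clean image toward its most confusing class, and emb_backward_ae the embedding of the AE crafted from a clean image of that confusing class back toward the original class.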
Related papers
- FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification [10.911464455072391]
FACTUAL is a Contrastive Learning framework for Adversarial Training and robust SAR classification.
Our model achieves 99.7% accuracy on clean samples, and 89.6% on perturbed samples, both outperforming previous state-of-the-art methods.
arXiv Detail & Related papers (2024-04-04T06:20:22Z)
- Inference Attacks Against Face Recognition Model without Classification Layers [2.775761045299829]
Face recognition (FR) has been applied to nearly every aspect of daily life, but it is always accompanied by the underlying risk of leaking private information.
In this work, we advocate a novel inference attack composed of two stages for practical FR models without a classification layer.
arXiv Detail & Related papers (2024-01-24T09:51:03Z)
- Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z)
- Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on specific training and test datasets does not guarantee reliable or fair ML models.
This paper proposes Semantic Image Attack (SIA), an adversarial attack-based method that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z)
- Anomaly Detection via Multi-Scale Contrasted Memory [3.0170109896527086]
We introduce a new two-stage anomaly detector that memorizes multi-scale normal prototypes during training to compute an anomaly deviation score.
Our model substantially improves state-of-the-art performance on a wide range of object, style, and local anomalies, with up to 35% relative error improvement on CIFAR-10.
arXiv Detail & Related papers (2022-11-16T16:58:04Z)
- Semi-Targeted Model Poisoning Attack on Federated Learning via Backward Error Analysis [15.172954465350667]
Model poisoning attacks on federated learning (FL) intrude on the entire system by compromising an edge model.
We propose the Attacking Distance-aware Attack (ADA) to enhance a poisoning attack by finding the optimized target class in the feature space.
ADA succeeded in increasing the attack performance by 1.8 times in the most challenging case with an attacking frequency of 0.01.
arXiv Detail & Related papers (2022-03-22T11:40:07Z)
- Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box attack that crafts adversarial perturbations on a surrogate model and then applies them to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on a widely used dataset demonstrate the effectiveness of our attack method, with a 12.85% higher transfer attack success rate than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z)
- Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods [8.793721044482613]
We study data poisoning attacks against Knowledge Graph Embeddings (KGE) models for link prediction.
These attacks craft adversarial additions or deletions at training time to cause model failure at test time.
We propose a method to replace one of the two entities in each influential triple to generate adversarial additions.
arXiv Detail & Related papers (2021-11-04T19:38:48Z)
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)