Evaluating the Robustness of Geometry-Aware Instance-Reweighted
Adversarial Training
- URL: http://arxiv.org/abs/2103.01914v1
- Date: Tue, 2 Mar 2021 18:15:42 GMT
- Title: Evaluating the Robustness of Geometry-Aware Instance-Reweighted
Adversarial Training
- Authors: Dorjan Hitaj, Giulio Pagnotta, Iacopo Masi, Luigi V. Mancini
- Abstract summary: We evaluate the robustness of a method called "Geometry-aware Instance-reweighted Adversarial Training" (GAIRAT).
We find that a network trained with this method is biased towards certain samples because GAIRAT re-scales the per-sample loss.
- Score: 9.351384969104771
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this technical report, we evaluate the adversarial robustness of a very
recent method called "Geometry-aware Instance-reweighted Adversarial
Training"[7]. GAIRAT reports state-of-the-art results on defenses to
adversarial attacks on the CIFAR-10 dataset. In fact, we find that a network
trained with this method, while showing an improvement over regular adversarial
training (AT), biases the model towards certain samples by re-scaling the
per-sample loss. This, in turn, makes the model susceptible to attacks that
scale the logits. The original model achieves 59% accuracy under AutoAttack
when trained with additional pseudo-labeled data. We provide an analysis that
shows the opposite. In particular, we craft a PGD attack that multiplies the
logits by a positive scalar and decreases the GAIRAT accuracy from 55% to 44%
when the model is trained solely on CIFAR-10. In this report, we rigorously evaluate the
model and provide insights into the reasons behind the vulnerability of GAIRAT
to this adversarial attack. We will release the code promptly to enable the
reproducibility of our findings.
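The attack described in the abstract amounts to standard PGD with the
cross-entropy loss computed on logits multiplied by a positive scalar. The
sketch below is a minimal PyTorch illustration of that idea, not the authors'
released code; the function name and the hyperparameters (alpha, eps,
step_size, num_steps) are assumptions chosen for clarity.

```python
# Minimal sketch (an assumption, not the authors' implementation) of a
# logit-scaling PGD attack against a classifier on inputs in [0, 1].
import torch
import torch.nn.functional as F

def logit_scaling_pgd(model, x, y, alpha=10.0, eps=8 / 255,
                      step_size=2 / 255, num_steps=20):
    """L_inf PGD in which the logits are multiplied by a positive scalar
    `alpha` before the cross-entropy loss is computed; alpha=1 is plain PGD."""
    # Random start inside the eps-ball, clipped to the valid image range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(alpha * model(x_adv), y)  # loss on scaled logits
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                 # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project onto eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                           # keep a valid image
        x_adv = x_adv.detach()
    return x_adv
```

Setting alpha greater than 1 sharpens the softmax before the loss is taken,
which, following the intuition in the abstract, restores gradient signal that
the loss re-scaling would otherwise weaken; alpha = 1 recovers ordinary PGD.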
Related papers
- GReAT: A Graph Regularized Adversarial Training Method [0.0]
GReAT (Graph Regularized Adversarial Training) is a novel regularization method designed to enhance the robust classification performance of deep learning models.
GReAT integrates graph-based regularization into the adversarial training process, leveraging the data's inherent structure to enhance model robustness.
arXiv Detail & Related papers (2023-10-09T01:44:06Z) - Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly to a central server.
Recent studies have revealed that FL is vulnerable to gradient inversion attack (GIA), which aims to reconstruct the original training samples.
We propose Client-side poisoning Gradient Inversion (CGI), which is a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z) - On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, light-weighted, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z) - DAD: Data-free Adversarial Defense at Test Time [21.741026088202126]
Deep models are highly susceptible to adversarial attacks.
Privacy has become an important concern, restricting access to only trained models but not the training data.
We propose a completely novel problem of 'test-time adversarial defense in absence of training data and even their statistics'.
arXiv Detail & Related papers (2022-04-04T15:16:13Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Adversarial Training with Rectified Rejection [114.83821848791206]
We propose to use true confidence (T-Con) as a certainty oracle, and learn to predict T-Con by rectifying confidence.
We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
arXiv Detail & Related papers (2021-05-31T08:24:53Z) - Lagrangian Objective Function Leads to Improved Unforeseen Attack
Generalization in Adversarial Training [0.0]
Adversarial training (AT) has been shown effective to reach a robust model against the attack that is used during training.
We propose a simple modification to the AT that mitigates the mentioned issue.
We show that our attack is faster than other attack schemes that are designed for unseen attack generalization.
arXiv Detail & Related papers (2021-03-29T07:23:46Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z) - To be Robust or to be Fair: Towards Fairness in Adversarial Training [83.42241071662897]
We find that adversarial training algorithms tend to introduce severe disparity of accuracy and robustness between different groups of data.
We propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem when doing adversarial defenses.
arXiv Detail & Related papers (2020-10-13T02:21:54Z) - Label Smoothing and Adversarial Robustness [16.804200102767208]
We find that a model trained with label smoothing can easily achieve striking accuracy under most gradient-based attacks.
Our study enlightens the research community to rethink how to evaluate the model's robustness appropriately.
arXiv Detail & Related papers (2020-09-17T12:36:35Z) - Adversarial Detection and Correction by Matching Prediction
Distributions [0.0]
The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST.
We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence.
arXiv Detail & Related papers (2020-02-21T15:45:42Z)