Evaluating the Robustness of Geometry-Aware Instance-Reweighted
Adversarial Training
- URL: http://arxiv.org/abs/2103.01914v1
- Date: Tue, 2 Mar 2021 18:15:42 GMT
- Title: Evaluating the Robustness of Geometry-Aware Instance-Reweighted
Adversarial Training
- Authors: Dorjan Hitaj, Giulio Pagnotta, Iacopo Masi, Luigi V. Mancini
- Abstract summary: We evaluate the robustness of a method called "Geometry-aware Instance-reweighted Adversarial Training" (GAIRAT).
We find that a network trained with this method is biased towards certain samples because GAIRAT re-scales the per-sample loss.
- Score: 9.351384969104771
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this technical report, we evaluate the adversarial robustness of a very
recent method called "Geometry-aware Instance-reweighted Adversarial
Training"[7]. GAIRAT reports state-of-the-art results on defenses to
adversarial attacks on the CIFAR-10 dataset. In fact, we find that a network
trained with this method, while showing an improvement over regular adversarial
training (AT), biases the model towards certain samples by re-scaling the
per-sample loss. This, in turn, makes the model susceptible to attacks that
scale the logits. The original model achieves 59% accuracy under AutoAttack
when trained with additional pseudo-labeled data. We provide an analysis that
shows the opposite. In particular, we craft a PGD attack that multiplies the
logits by a positive scalar and decreases the GAIRAT accuracy from 55% to 44%
when the model is trained solely on CIFAR-10. In this report, we rigorously evaluate the
model and provide insights into the reasons behind the vulnerability of GAIRAT
to this adversarial attack. We will release the code promptly to enable the
reproducibility of our findings.
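The attack described in the abstract amounts to standard PGD with the
cross-entropy loss computed on logits multiplied by a positive scalar. The
sketch below is a minimal PyTorch illustration of that idea, not the authors'
released code; the function name and the hyperparameters (alpha, eps,
step_size, num_steps) are assumptions chosen for clarity.

```python
# Minimal sketch (an assumption, not the authors' implementation) of a
# logit-scaling PGD attack against a classifier on inputs in [0, 1].
import torch
import torch.nn.functional as F

def logit_scaling_pgd(model, x, y, alpha=10.0, eps=8 / 255,
                      step_size=2 / 255, num_steps=20):
    """L_inf PGD in which the logits are multiplied by a positive scalar
    `alpha` before the cross-entropy loss is computed; alpha=1 is plain PGD."""
    # Random start inside the eps-ball, clipped to the valid image range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(alpha * model(x_adv), y)  # loss on scaled logits
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                 # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project onto eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                           # keep a valid image
        x_adv = x_adv.detach()
    return x_adv
```

Setting alpha greater than 1 sharpens the softmax before the loss is taken,
which, following the intuition in the abstract, restores gradient signal that
the loss re-scaling would otherwise weaken; alpha = 1 recovers ordinary PGD.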
Related papers
- GReAT: A Graph Regularized Adversarial Training Method [0.0]
GReAT (Graph Regularized Adversarial Training) is a novel regularization method designed to enhance the robust classification performance of deep learning models.
GReAT integrates graph-based regularization into the adversarial training process, leveraging the data's inherent structure to enhance model robustness.
arXiv Detail & Related papers (2023-10-09T01:44:06Z) - Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly to a central server.
Recent studies have revealed that FL is vulnerable to gradient inversion attack (GIA), which aims to reconstruct the original training samples.
We propose Client-side poisoning Gradient Inversion (CGI), which is a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z) - On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, light-weighted, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z) - DAD: Data-free Adversarial Defense at Test Time [21.741026088202126]
Deep models are highly susceptible to adversarial attacks.
Privacy has become an important concern, restricting access to only trained models but not the training data.
We propose a completely novel problem of 'test-time adversarial defense in absence of training data and even their statistics'.
arXiv Detail & Related papers (2022-04-04T15:16:13Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Adversarial Training with Rectified Rejection [114.83821848791206]
We propose to use true confidence (T-Con) as a certainty oracle, and learn to predict T-Con by rectifying confidence.
We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
arXiv Detail & Related papers (2021-05-31T08:24:53Z) - Lagrangian Objective Function Leads to Improved Unforeseen Attack
Generalization in Adversarial Training [0.0]
Adversarial training (AT) has been shown effective to reach a robust model against the attack that is used during training.
We propose a simple modification to the AT that mitigates the mentioned issue.
We show that our attack is faster than other attack schemes that are designed for unseen attack generalization.
arXiv Detail & Related papers (2021-03-29T07:23:46Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z) - To be Robust or to be Fair: Towards Fairness in Adversarial Training [83.42241071662897]
We find that adversarial training algorithms tend to introduce severe disparity of accuracy and robustness between different groups of data.
We propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem when doing adversarial defenses.
arXiv Detail & Related papers (2020-10-13T02:21:54Z) - Label Smoothing and Adversarial Robustness [16.804200102767208]
We find that a model trained with label smoothing can easily achieve striking accuracy under most gradient-based attacks.
Our study enlightens the research community to rethink how to evaluate the model's robustness appropriately.
arXiv Detail & Related papers (2020-09-17T12:36:35Z) - Adversarial Detection and Correction by Matching Prediction
Distributions [0.0]
The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST.
We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence.
arXiv Detail & Related papers (2020-02-21T15:45:42Z)