Related papers: A Little Robustness Goes a Long Way: Leveraging Universal Features for Targeted Transfer Attacks

A Little Robustness Goes a Long Way: Leveraging Universal Features for Targeted Transfer Attacks

URL: http://arxiv.org/abs/2106.02105v1
Date: Thu, 3 Jun 2021 19:53:46 GMT
Title: A Little Robustness Goes a Long Way: Leveraging Universal Features for Targeted Transfer Attacks
Authors: Jacob M. Springer, Melanie Mitchell, Garrett T. Kenyon
Abstract summary: We show that training the source classifier to be "slightly robust" substantially improves the transferability of targeted attacks. We argue that this result supports a non-versa hypothesis.
Score: 4.511923587827301
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Adversarial examples for neural network image classifiers are known to be transferable: examples optimized to be misclassified by a source classifier are often misclassified as well by classifiers with different architectures. However, targeted adversarial examples -- optimized to be classified as a chosen target class -- tend to be less transferable between architectures. While prior research on constructing transferable targeted attacks has focused on improving the optimization procedure, in this work we examine the role of the source classifier. Here, we show that training the source classifier to be "slightly robust" -- that is, robust to small-magnitude adversarial examples -- substantially improves the transferability of targeted attacks, even between architectures as different as convolutional neural networks and transformers. We argue that this result supports a non-intuitive hypothesis: on the spectrum from non-robust (standard) to highly robust classifiers, those that are only slightly robust exhibit the most universal features -- ones that tend to overlap with the features learned by other classifiers trained on the same dataset. The results we present provide insight into the nature of adversarial examples as well as the mechanisms underlying so-called "robust" classifiers.

Related papers

MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
deep neural networks (DNNs) are vulnerable to slight adversarial perturbations. We show that strong feature representation learning during training can significantly enhance the original model's robustness. We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations.
arXiv Detail & Related papers (2024-10-02T16:05:03Z)
Enhancing cross-domain detection: adaptive class-aware contrastive transformer [15.666766743738531]
Insufficient labels in the target domain exacerbate issues of class imbalance and model performance degradation. We propose a class-aware cross domain detection transformer based on the adversarial learning and mean-teacher framework.
arXiv Detail & Related papers (2024-01-24T07:11:05Z)
Bi-directional Feature Reconstruction Network for Fine-Grained Few-Shot Image Classification [61.411869453639845]
We introduce a bi-reconstruction mechanism that can simultaneously accommodate for inter-class and intra-class variations. This design effectively helps the model to explore more subtle and discriminative features. Experimental results on three widely used fine-grained image classification datasets consistently show considerable improvements.
arXiv Detail & Related papers (2022-11-30T16:55:14Z)
Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefineds. Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches. This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
PARL: Enhancing Diversity of Ensemble Networks to Resist Adversarial Attacks via Pairwise Adversarially Robust Loss Function [13.417003144007156]
adversarial attacks tend to rely on the principle of transferability. Ensemble methods against adversarial attacks demonstrate that an adversarial example is less likely to mislead multiple classifiers. Recent ensemble methods have either been shown to be vulnerable to stronger adversaries or shown to lack an end-to-end evaluation.
arXiv Detail & Related papers (2021-12-09T14:26:13Z)
Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks. We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering. Our evaluation on the miniImagenet (MI) and CUB datasets exhibit good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
On Generating Transferable Targeted Perturbations [102.3506210331038]
We propose a new generative approach for highly transferable targeted perturbations. Our approach matches the perturbed image distribution' with that of the target class, leading to high targeted transferability rates.
arXiv Detail & Related papers (2021-03-26T17:55:28Z)
Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification. We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations. In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
Beyond cross-entropy: learning highly separable feature distributions for robust and accurate classification [22.806324361016863]
We propose a novel approach for training deep robust multiclass classifiers that provides adversarial robustness. We show that the regularization of the latent space based on our approach yields excellent classification accuracy.
arXiv Detail & Related papers (2020-10-29T11:15:17Z)
TREND: Transferability based Robust ENsemble Design [6.663641564969944]
We study the effect of network architecture, input, weight and activation quantization on transferability of adversarial samples. We show that transferability is significantly hampered by input quantization between source and target. We propose a new state-of-the-art ensemble attack to combat this.
arXiv Detail & Related papers (2020-08-04T13:38:14Z)
A Bayes-Optimal View on Adversarial Examples [9.51828574518325]
We argue for examining adversarial examples from the perspective of Bayes-optimal classification. Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier.
arXiv Detail & Related papers (2020-02-20T16:43:47Z)
Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks. We present a unifying view of randomized smoothing over arbitrary functions. We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.