Revisiting Adversarial Training for ImageNet: Architectures, Training
and Generalization across Threat Models
- URL: http://arxiv.org/abs/2303.01870v2
- Date: Sat, 28 Oct 2023 16:27:56 GMT
- Authors: Naman D Singh, Francesco Croce, Matthias Hein
- Abstract summary: We revisit adversarial training on ImageNet comparing ViTs and ConvNeXts.
Our modified ConvNeXt, ConvNeXt + ConvStem, yields the most robust $\ell_\infty$-models across different ranges of model parameters and FLOPs.
Our ViT + ConvStem yields the best generalization to unseen threat models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While adversarial training has been extensively studied for ResNet
architectures and low resolution datasets like CIFAR, much less is known for
ImageNet. Given the recent debate about whether transformers are more robust
than convnets, we revisit adversarial training on ImageNet comparing ViTs and
ConvNeXts. Extensive experiments show that minor changes in architecture, most
notably replacing PatchStem with ConvStem, and training scheme have a
significant impact on the achieved robustness. These changes not only increase
robustness in the seen $\ell_\infty$-threat model, but even more so improve
generalization to unseen $\ell_1/\ell_2$-attacks. Our modified ConvNeXt,
ConvNeXt + ConvStem, yields the most robust $\ell_\infty$-models across
different ranges of model parameters and FLOPs, while our ViT + ConvStem yields
the best generalization to unseen threat models.
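The $\ell_\infty$ adversarial training discussed in the abstract can be illustrated with a minimal sketch: one PGD inner-maximization step on a toy logistic-regression model. This is NumPy-only and purely illustrative; the model, step size $\alpha$, and radius $\varepsilon$ are assumptions for the example, not the paper's setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd_linf(x, y, w, eps=0.1, alpha=0.02, steps=10):
    """Gradient ascent on the loss, projected onto the l_inf ball of radius eps."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(x_adv @ w)
        # gradient of the binary cross-entropy w.r.t. the input: (p - y) * w
        grad = np.outer(p - y, w)
        x_adv = x_adv + alpha * np.sign(grad)     # l_inf ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
    return x_adv

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = (x @ np.ones(4) > 0).astype(float)
w = rng.normal(size=4)

x_adv = pgd_linf(x, y, w)
# the perturbation never leaves the l_inf ball of radius eps
print(np.abs(x_adv - x).max())  # <= 0.1
```

In full adversarial training, the model weights would then be updated on `x_adv` instead of `x`; the sketch only shows the threat-model side of that loop.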
Related papers
- On the Adversarial Transferability of Generalized "Skip Connections" [83.71752155227888]
Skip connection is an essential ingredient for modern deep models to be deeper and more powerful.
We find that using more gradients from the skip connections rather than the residual modules during backpropagation allows one to craft adversarial examples with high transferability.
We conduct comprehensive transfer attacks against various models including ResNets, Transformers, Inceptions, Neural Architecture Search, and Large Language Models.
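The gradient re-weighting idea summarized above can be sketched on a toy residual block $y = x + f(x)$: during backpropagation the residual branch's gradient is scaled by a decay factor $\gamma < 1$, so more of the gradient flows through the skip connection. The function names and the value of $\gamma$ here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def residual_block(x, w):
    """Toy residual block: y = x + relu(w * x)."""
    return x + np.maximum(w * x, 0.0)

def grad_input(x, w, gamma=1.0):
    """dy/dx = 1 (skip path) + gamma * d relu(w*x)/dx (residual path).

    gamma < 1 down-weights the residual branch, favouring the
    skip-connection gradient when crafting transferable attacks."""
    residual_grad = w * (w * x > 0)
    return 1.0 + gamma * residual_grad

x = np.array([0.5, -1.0, 2.0])
w = 2.0
g_plain = grad_input(x, w, gamma=1.0)  # standard backprop: [3.0, 1.0, 3.0]
g_sgm = grad_input(x, w, gamma=0.2)    # skip-favouring backprop: [1.4, 1.0, 1.4]
print(g_plain, g_sgm)
```

The attack itself would then follow the re-weighted gradient `g_sgm` rather than `g_plain` when perturbing the input.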
arXiv Detail & Related papers (2024-10-11T16:17:47Z)
- ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy [27.75360812109922]
In this work, we conduct an in-depth comparative analysis of model behaviors beyond ImageNet accuracy.
Although our selected models have similar ImageNet accuracies and compute requirements, we find that they differ in many other aspects.
This diversity in model characteristics, not captured by traditional metrics, highlights the need for more nuanced analysis.
arXiv Detail & Related papers (2023-11-15T18:56:51Z)
- On the unreasonable vulnerability of transformers for image restoration -- and an easy fix [16.927916090724363]
We investigate whether the improved adversarial robustness of ViTs extends to image restoration.
We consider the recently proposed Restormer model, as well as NAFNet and the "Baseline network".
Our experiments are performed on real-world images from the GoPro dataset for image deblurring.
arXiv Detail & Related papers (2023-07-25T23:09:05Z)
- A Light Recipe to Train Robust Vision Transformers [34.51642006926379]
We show that Vision Transformers (ViTs) can serve as an underlying architecture for improving the robustness of machine learning models against evasion attacks.
We achieve this objective using a custom adversarial training recipe, discovered using rigorous ablation studies on a subset of the ImageNet dataset.
We show that our recipe generalizes to different classes of ViT architectures and large-scale models on full ImageNet-1k.
arXiv Detail & Related papers (2022-09-15T16:00:04Z)
- Pyramid Adversarial Training Improves ViT Performance [43.322865996422664]
Pyramid Adversarial Training is a simple and effective technique to improve ViT's overall performance.
It leads to a $1.82\%$ absolute improvement in ImageNet clean accuracy for the ViT-B model when trained only on ImageNet-1K data.
arXiv Detail & Related papers (2021-11-30T04:38:14Z)
- Adversarial robustness against multiple $l_p$-threat models at the price of one and how to quickly fine-tune robust models to another threat model [79.05253587566197]
Adversarial training (AT) to achieve adversarial robustness w.r.t. a single $l_p$-threat model has been discussed extensively.
In this paper we develop a simple and efficient training scheme to achieve adversarial robustness against the union of $l_p$-threat models.
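Training against the union of $l_p$-threat models requires projecting perturbations back onto each ball. A minimal NumPy sketch of the three projections follows; the sort-based $l_1$ projection is the standard duality-based algorithm (Duchi et al. style), which is an assumption here, not necessarily this paper's implementation.

```python
import numpy as np

def project_linf(delta, eps):
    """Projection onto the l_inf ball: clip each coordinate."""
    return np.clip(delta, -eps, eps)

def project_l2(delta, eps):
    """Projection onto the l_2 ball: rescale if outside."""
    n = np.linalg.norm(delta)
    return delta if n <= eps else delta * (eps / n)

def project_l1(delta, eps):
    """Sort-based projection onto the l_1 ball."""
    if np.abs(delta).sum() <= eps:
        return delta
    u = np.sort(np.abs(delta))[::-1]        # sorted magnitudes, descending
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(u) + 1) > css - eps)[0][-1]
    theta = (css[k] - eps) / (k + 1)        # soft-thresholding level
    return np.sign(delta) * np.maximum(np.abs(delta) - theta, 0.0)

rng = np.random.default_rng(1)
d = rng.normal(size=16)
print(np.abs(project_linf(d, 0.1)).max())   # <= 0.1
print(np.linalg.norm(project_l2(d, 1.0)))   # <= 1.0
print(np.abs(project_l1(d, 1.0)).sum())     # <= 1.0
```

A union-robust PGD step would alternate or mix these projections so that the crafted perturbation stays inside whichever $l_p$-ball the current threat model prescribes.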
arXiv Detail & Related papers (2021-05-26T12:20:47Z)
- Rethinking the Design Principles of Robust Vision Transformer [28.538786330184642]
Vision Transformers (ViTs) have shown that self-attention-based networks surpass traditional convolutional neural networks (CNNs) in most vision tasks.
In this paper, we rethink the design principles of ViTs with respect to robustness.
By combining the robust design components, we propose the Robust Vision Transformer (RVT).
arXiv Detail & Related papers (2021-05-17T15:04:15Z)
- Contrastive Learning with Stronger Augmentations [63.42057690741711]
We propose a general framework called Contrastive Learning with Stronger Augmentations (CLSA) to complement current contrastive learning approaches.
Here, the distribution divergence between the weakly and strongly augmented images over the representation bank is adopted to supervise the retrieval of strongly augmented queries.
Experiments showed the information from the strongly augmented images can significantly boost the performance.
arXiv Detail & Related papers (2021-04-15T18:40:04Z)
- On the Adversarial Robustness of Visual Transformers [129.29523847765952]
This work provides the first and comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations.
Tested in various white-box and transfer attack settings, we find that ViTs possess better adversarial robustness when compared with convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-03-29T14:48:24Z)
- Revisiting ResNets: Improved Training and Scaling Strategies [54.0162571976267]
Training and scaling strategies may matter more than architectural changes, and the resulting ResNets match recent state-of-the-art models.
We show that the best performing scaling strategy depends on the training regime.
We design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.
arXiv Detail & Related papers (2021-03-13T00:18:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.