Certified Patch Robustness via Smoothed Vision Transformers
- URL: http://arxiv.org/abs/2110.07719v1
- Date: Mon, 11 Oct 2021 17:44:05 GMT
- Title: Certified Patch Robustness via Smoothed Vision Transformers
- Authors: Hadi Salman, Saachi Jain, Eric Wong, Aleksander Mądry
- Abstract summary: We show how using vision transformers enables significantly better certified patch robustness.
These improvements stem from the inherent ability of the vision transformer to gracefully handle largely masked images.
- Score: 77.30663719482924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Certified patch defenses can guarantee robustness of an image classifier to
arbitrary changes within a bounded contiguous region. But, currently, this
robustness comes at a cost of degraded standard accuracies and slower inference
times. We demonstrate how using vision transformers enables significantly
better certified patch robustness that is also more computationally efficient
and does not incur a substantial drop in standard accuracy. These improvements
stem from the inherent ability of the vision transformer to gracefully handle
largely masked images. Our code is available at
https://github.com/MadryLab/smoothed-vit.
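The certification idea behind the smoothed classifier is to run the model on many heavily masked (ablated) copies of the image and take a majority vote, which is why a ViT's ability to handle largely masked inputs matters. The sketch below is a minimal illustration of column-ablation derandomized smoothing, not the released implementation: the function names are made up, the ablations simply zero out pixels (the paper instead drops fully masked tokens to speed up inference), and the certification check uses the standard column-smoothing margin condition.

```python
# Minimal, illustrative sketch of column-ablation derandomized smoothing.
# `model` is any image classifier (e.g. a ViT); names here are assumptions,
# not the code released at the repository above.
import torch

def column_ablations(x, band_width):
    """Yield copies of x (C x H x W) keeping only one band of columns, rest zeroed."""
    _, _, width = x.shape
    for start in range(width):
        cols = [(start + i) % width for i in range(band_width)]
        ablated = torch.zeros_like(x)
        ablated[:, :, cols] = x[:, :, cols]
        yield ablated

def smoothed_predict(model, x, band_width, num_classes):
    """Classify every column ablation and vote over the predicted classes."""
    counts = torch.zeros(num_classes, dtype=torch.long)
    with torch.no_grad():
        for ablated in column_ablations(x, band_width):
            pred = model(ablated.unsqueeze(0)).argmax(dim=1).item()
            counts[pred] += 1
    return counts, int(counts.argmax())

def is_certified(counts, band_width, patch_width):
    """A width-m patch intersects at most delta = m + b - 1 column ablations,
    so a vote margin above 2 * delta cannot be overturned by the patch."""
    delta = patch_width + band_width - 1
    top2 = torch.topk(counts, 2).values
    return int(top2[0] - top2[1]) > 2 * delta
```

The intuition: an adversarial patch of width m can touch at most m + b - 1 of the column ablations, so if the top class leads the runner-up by more than twice that number of votes, no placement of the patch can change the majority prediction.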
Related papers
- Dynamic Grained Encoder for Vision Transformers [150.02797954201424]
This paper introduces sparse queries for vision transformers to exploit the intrinsic spatial redundancy of natural images.
We propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.
Our encoder allows the state-of-the-art vision transformers to reduce computational complexity by 40%-60% while maintaining comparable performance on image classification.
arXiv Detail & Related papers (2023-01-10T07:55:29Z)
- Efficient Attention-free Video Shift Transformers [56.87581500474093]
This paper tackles the problem of efficient video recognition.
Video transformers have recently dominated the efficiency (top-1 accuracy vs FLOPs) spectrum.
We extend our formulation in the video domain to construct Video Affine-Shift Transformer.
arXiv Detail & Related papers (2022-08-23T17:48:29Z)
- Vicinity Vision Transformer [53.43198716947792]
We present Vicinity Attention, which introduces a locality bias into vision transformers with linear complexity.
Our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous methods.
arXiv Detail & Related papers (2022-06-21T17:33:53Z)
- Towards Practical Certifiable Patch Defense with Vision Transformer [34.00374565048962]
We introduce the Vision Transformer (ViT) into the framework of Derandomized Smoothing (DS).
For efficient inference and deployment in the real world, we reconstruct the global self-attention structure of the original ViT into isolated band unit self-attention.
arXiv Detail & Related papers (2022-03-16T10:39:18Z)
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181]
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains more than a 2x improvement in efficiency compared to state-of-the-art vision transformers, with only a 0.8% drop in accuracy.
arXiv Detail & Related papers (2021-11-30T18:57:02Z)
- PatchCensor: Patch Robustness Certification for Transformers via Exhaustive Testing [7.88628640954152]
Vision Transformer (ViT) is known to be highly nonlinear like other classical neural networks and could be easily fooled by both natural and adversarial patch perturbations.
This limitation could pose a threat to the deployment of ViT in the real industrial environment, especially in safety-critical scenarios.
We propose PatchCensor, aiming to certify the patch robustness of ViT by applying exhaustive testing.
arXiv Detail & Related papers (2021-11-19T23:45:23Z)
- Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation [29.08732248577141]
We investigate the robustness of vision transformers (ViTs) through the lens of their special patch-based architectural structure.
We find that ViTs are surprisingly insensitive to patch-based transformations, even when the transformation largely destroys the original semantics.
We show that patch-based negative augmentation consistently improves robustness of ViTs across a wide set of ImageNet-based robustness benchmarks (a toy patch-shuffling sketch follows this list).
arXiv Detail & Related papers (2021-10-15T04:53:18Z)
- Exploring and Improving Mobile Level Vision Transformers [81.7741384218121]
In this paper, we study the vision transformer structure at the mobile level and find a dramatic performance drop.
We propose a novel irregular patch embedding module and adaptive patch fusion module to improve the performance.
arXiv Detail & Related papers (2021-08-30T06:42:49Z)
- Improve Vision Transformers Training by Suppressing Over-smoothing [28.171262066145612]
Introducing the transformer structure into computer vision tasks holds the promise of yielding a better speed-accuracy trade-off than traditional convolutional networks.
However, directly training vanilla transformers on vision tasks has been shown to yield unstable and sub-optimal results.
Recent works propose to modify transformer structures by incorporating convolutional layers to improve the performance on vision tasks.
arXiv Detail & Related papers (2021-04-26T17:43:04Z)
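As referenced in the patch-based negative augmentation entry above, the transformations of interest preserve local patch content while destroying global semantics. The following toy sketch is an illustrative assumption, not that paper's code: it only shuffles the non-overlapping patches of an image, and how such negatively augmented images enter the training objective is left to the comments rather than reproduced.

```python
# Toy patch-shuffle transform of the kind used as "negative augmentation":
# patch content is preserved but the image's global semantics are destroyed.
# How such images are penalized during training (e.g. pushing their predictions
# away from the original label) is paraphrased, not the paper's exact objective.
import torch

def shuffle_patches(x, patch_size):
    """Randomly permute the non-overlapping patches of an image tensor (C x H x W)."""
    c, h, w = x.shape
    gh, gw = h // patch_size, w // patch_size
    # Split into a grid of patches: (gh * gw, C, patch, patch)
    patches = (x.unfold(1, patch_size, patch_size)
                 .unfold(2, patch_size, patch_size)
                 .permute(1, 2, 0, 3, 4)
                 .reshape(gh * gw, c, patch_size, patch_size))
    patches = patches[torch.randperm(gh * gw)]
    # Reassemble the shuffled patches back into an image
    grid = patches.reshape(gh, gw, c, patch_size, patch_size).permute(2, 0, 3, 1, 4)
    return grid.reshape(c, gh * patch_size, gw * patch_size)
```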