CertViT: Certified Robustness of Pre-Trained Vision Transformers
- URL: http://arxiv.org/abs/2302.10287v1
- Date: Wed, 1 Feb 2023 06:09:19 GMT
- Title: CertViT: Certified Robustness of Pre-Trained Vision Transformers
- Authors: Kavya Gupta and Sagar Verma
- Abstract summary: Lipschitz bounded neural networks are certifiably robust and have a good trade-off between clean and certified accuracy.
Existing Lipschitz bounding methods train from scratch and are limited to moderately sized networks.
We show that CertViT networks have better certified accuracy than state-of-the-art Lipschitz trained networks.
- Score: 11.880271015435582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lipschitz bounded neural networks are certifiably robust and have a good
trade-off between clean and certified accuracy. Existing Lipschitz bounding
methods train from scratch and are limited to moderately sized networks (< 6M
parameters). They require a fair amount of hyper-parameter tuning and are
computationally prohibitive for large networks like Vision Transformers (5M to
660M parameters). Obtaining certified robustness of transformers is not
feasible due to the non-scalability and inflexibility of the current methods.
This work presents CertViT, a two-step proximal-projection method to achieve
certified robustness from pre-trained weights. The proximal step tries to lower
the Lipschitz bound and the projection step tries to maintain the clean
accuracy of pre-trained weights. We show that CertViT networks have better
certified accuracy than state-of-the-art Lipschitz trained networks. We apply
CertViT on several variants of pre-trained vision transformers and show
adversarial robustness using standard attacks. Code:
https://github.com/sagarverma/transformer-lipschitz
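A minimal sketch of the two-step idea described in the abstract, assuming standard power iteration for the spectral norm; the function and hyper-parameter names (prox_project, tau, rho) are illustrative and are not taken from the linked repository:

```python
# Minimal sketch of a proximal-projection pass over a pre-trained weight matrix:
# (i) a proximal step that lowers the layer's Lipschitz (spectral-norm) bound and
# (ii) a projection step that keeps the result close to the pre-trained weights.
import torch

def spectral_norm(W: torch.Tensor, iters: int = 20) -> torch.Tensor:
    """Estimate the largest singular value of W by power iteration."""
    v = torch.randn(W.shape[1])
    for _ in range(iters):
        u = torch.nn.functional.normalize(W @ v, dim=0)
        v = torch.nn.functional.normalize(W.t() @ u, dim=0)
    return u @ W @ v

def prox_project(W_pre: torch.Tensor, tau: float = 1.0, rho: float = 0.1) -> torch.Tensor:
    """Proximal step: rescale W so its spectral norm is at most tau.
    Projection step: pull the result back toward the pre-trained weights."""
    sigma = spectral_norm(W_pre)
    W_prox = W_pre * torch.clamp(tau / sigma, max=1.0)   # lower the Lipschitz bound
    return (1 - rho) * W_prox + rho * W_pre              # stay near pre-trained weights

W = torch.randn(768, 768)          # stand-in for a pre-trained attention/MLP weight
W_new = prox_project(W)
print(spectral_norm(W).item(), spectral_norm(W_new).item())
```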
Related papers
- Training Transformers with Enforced Lipschitz Constants [25.42378506132261]
We train neural networks with Lipschitz bounds enforced throughout training.
We find that switching from AdamW to Muon improves standard methods.
Inspired by Muon's update having a fixed spectral norm, we co-design a weight constraint method that improves the Lipschitz vs. performance tradeoff.
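A minimal sketch of one generic way to enforce a Lipschitz bound throughout training, by capping each linear layer's spectral norm after every optimizer step; this is not the paper's Muon-based co-design, and CAP is an illustrative hyper-parameter:

```python
# Cap every linear layer's spectral norm after each optimizer step, which keeps
# a per-layer Lipschitz bound throughout training (generic constraint, not the
# paper's exact method).
import torch

CAP = 1.0  # maximum allowed spectral norm per linear layer

@torch.no_grad()
def enforce_lipschitz(model: torch.nn.Module) -> None:
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            sigma = torch.linalg.matrix_norm(module.weight, ord=2)
            if sigma > CAP:
                module.weight.mul_(CAP / sigma)

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 16), torch.randint(0, 10, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
enforce_lipschitz(model)  # re-project the weights after the update
```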
arXiv Detail & Related papers (2025-07-17T17:55:00Z) - LipShiFT: A Certifiably Robust Shift-based Vision Transformer [46.7028906678548]
Lipschitz-based margin training acts as a strong regularizer while restricting weights in successive layers of the model.
We provide an upper-bound estimate for the Lipschitz constant of this model using the $l$ norm on common image classification datasets.
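A minimal sketch of the standard product-of-layer-norms Lipschitz upper bound that such estimates build on; the choice of the spectral ($l_2$) norm and the toy model are assumptions, not details from the paper:

```python
# Network-wide Lipschitz upper bound for a feed-forward stack: each linear
# layer contributes its operator (spectral) norm, and 1-Lipschitz activations
# such as ReLU contribute a factor of 1.
import torch

def lipschitz_upper_bound(model: torch.nn.Sequential) -> float:
    bound = 1.0
    for module in model:
        if isinstance(module, torch.nn.Linear):
            bound *= torch.linalg.matrix_norm(module.weight, ord=2).item()
        # ReLU is 1-Lipschitz, so it leaves the bound unchanged
    return bound

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
print(lipschitz_upper_bound(model))
```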
arXiv Detail & Related papers (2025-03-18T21:38:18Z) - Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness [21.394217131341932]
We introduce a novel certifying adapters framework (CAF) that enables and enhances the certification of adversarial robustness.
CAF achieves improved certified accuracies when compared to methods based on randomized or denoised smoothing.
An ensemble of adapters enables a single pre-trained feature extractor to defend against a range of noise perturbation scales.
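An illustrative structure, not the paper's code, for pairing a frozen feature extractor with one lightweight adapter per smoothing noise scale; the class name AdapterEnsemble and the noise scales are hypothetical:

```python
# Frozen backbone plus an ensemble of adapters, one per noise scale, so a
# single pre-trained feature extractor can serve several perturbation scales.
import torch
import torch.nn as nn

class AdapterEnsemble(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, sigmas=(0.25, 0.5, 1.0)):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)          # the pre-trained extractor stays frozen
        self.adapters = nn.ModuleDict({
            str(s): nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, num_classes))
            for s in sigmas
        })

    def forward(self, x: torch.Tensor, sigma: float) -> torch.Tensor:
        feats = self.backbone(x + sigma * torch.randn_like(x))  # noisy input for smoothing
        return self.adapters[str(sigma)](feats)                 # adapter matched to this noise scale

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # stand-in feature extractor
model = AdapterEnsemble(backbone, feat_dim=128, num_classes=10)
logits = model(torch.randn(4, 3, 32, 32), sigma=0.5)
```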
arXiv Detail & Related papers (2024-05-25T03:18:52Z) - PriViT: Vision Transformers for Fast Private Inference [55.36478271911595]
Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications.
ViTs are ill-suited for private inference using secure multi-party protocols, due to the large number of non-polynomial operations.
We propose PriViT, an algorithm to selectively "Taylorize" nonlinearities in ViTs while maintaining their prediction accuracy.
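A minimal sketch of what "Taylorizing" a nonlinearity can look like: GELU replaced by its second-order Taylor expansion around zero, leaving only polynomial operations; PriViT's mechanism for deciding which units to Taylorize is not reproduced here:

```python
# GELU(x) = x * Phi(x); around 0 this expands to roughly 0.5*x + x^2 / sqrt(2*pi),
# a polynomial surrogate that is friendly to secure multi-party protocols.
import math
import torch
import torch.nn as nn

class TaylorGELU(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 0.5 * x + x.pow(2) / math.sqrt(2 * math.pi)

x = torch.linspace(-1, 1, 5)
print(nn.GELU()(x))      # exact GELU
print(TaylorGELU()(x))   # polynomial surrogate (accurate near 0)
```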
arXiv Detail & Related papers (2023-10-06T21:45:05Z) - LipsFormer: Introducing Lipschitz Continuity to Vision Transformers [15.568629066375971]
We present a Lipschitz continuous Transformer, called LipsFormer, to pursue training stability for Transformer-based models.
Our experiments show that LipsFormer allows stable training of deep Transformer architectures without the need for careful learning-rate tuning.
LipsFormer-CSwin-Tiny, based on CSwin and trained for 300 epochs, achieves a top-1 accuracy of 83.5% with 4.7G FLOPs and 24M parameters.
arXiv Detail & Related papers (2023-04-19T17:59:39Z) - Rethinking Lipschitz Neural Networks for Certified L-infinity Robustness [33.72713778392896]
We study certified $\ell_\infty$ robustness from a novel perspective of representing Boolean functions.
We develop a unified Lipschitz network that generalizes prior works, and design a practical version that can be efficiently trained.
arXiv Detail & Related papers (2022-10-04T17:55:27Z) - Semi-supervised Vision Transformers at Scale [93.0621675558895]
We study semi-supervised learning (SSL) for vision transformers (ViT).
We propose a new SSL pipeline, consisting of first un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning.
Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting.
arXiv Detail & Related papers (2022-08-11T08:11:54Z) - Smooth-Reduce: Leveraging Patches for Improved Certified Robustness [100.28947222215463]
We propose a training-free, modified smoothing approach, Smooth-Reduce.
Our algorithm classifies overlapping patches extracted from an input image, and aggregates the predicted logits to certify a larger radius around the input.
We provide theoretical guarantees for such certificates, and empirically show significant improvements over other randomized smoothing methods.
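A minimal sketch of the aggregation step described above: classify overlapping crops of the input and average their logits; the crop size, stride, and mean aggregation are illustrative choices:

```python
# Extract overlapping crops from one image, classify each, and aggregate the
# predicted logits into a single smoothed prediction.
import torch
import torch.nn as nn

def smooth_reduce_logits(classifier: nn.Module, image: torch.Tensor, crop: int = 24, stride: int = 4) -> torch.Tensor:
    # image: (C, H, W); slide a crop x crop window over the image
    patches = image.unfold(1, crop, stride).unfold(2, crop, stride)     # (C, nH, nW, crop, crop)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, image.shape[0], crop, crop)
    with torch.no_grad():
        logits = classifier(patches)                                    # one prediction per crop
    return logits.mean(dim=0)                                           # aggregate over crops

classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 24 * 24, 10))    # stand-in classifier
print(smooth_reduce_logits(classifier, torch.randn(3, 32, 32)))
```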
arXiv Detail & Related papers (2022-05-12T15:26:20Z) - Towards Practical Certifiable Patch Defense with Vision Transformer [34.00374565048962]
We introduce Vision Transformer (ViT) into the framework of Derandomized Smoothing (DS).
For efficient inference and deployment in the real world, we innovatively reconstruct the global self-attention structure of the original ViT into isolated band unit self-attention.
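A minimal sketch of derandomized smoothing with column ablations, where the classifier sees one vertical band at a time and the final label is a plurality vote; the band width and stand-in classifier are illustrative, and the paper's isolated band-unit self-attention is not reproduced:

```python
# Classify column-ablated copies of the image (only a narrow band of columns
# is kept each time) and take the plurality vote over all band positions.
import torch
import torch.nn as nn

def column_smoothed_predict(classifier: nn.Module, image: torch.Tensor, band: int = 4) -> int:
    C, H, W = image.shape
    votes = torch.zeros(10, dtype=torch.long)
    for start in range(W):
        ablated = torch.zeros_like(image)
        cols = [(start + j) % W for j in range(band)]      # keep `band` columns, wrapping around
        ablated[:, :, cols] = image[:, :, cols]
        with torch.no_grad():
            pred = classifier(ablated.unsqueeze(0)).argmax(dim=1).item()
        votes[pred] += 1
    return int(votes.argmax())                              # plurality vote over band positions

classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # stand-in classifier
print(column_smoothed_predict(classifier, torch.randn(3, 32, 32)))
```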
arXiv Detail & Related papers (2022-03-16T10:39:18Z) - ViT-P: Rethinking Data-efficient Vision Transformers from Locality [9.515925867530262]
We make vision transformers as data-efficient as convolutional neural networks by introducing multi-focal attention bias.
Inspired by the attention distance in a well-trained ViT, we constrain the self-attention of ViT to have a multi-scale localized receptive field.
On CIFAR-100, our ViT-P Base model achieves state-of-the-art accuracy (83.16%) when trained from scratch.
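A minimal sketch of a multi-scale localized attention constraint: attention logits between patches farther apart than a per-head radius are masked out; the grid size and radii are illustrative, not the paper's exact multi-focal attention bias:

```python
# Build one locality mask per attention head on a patch grid; heads with a
# small radius attend locally, a very large radius leaves a head global.
import torch

def multi_focal_mask(grid: int = 14, radii=(1, 2, 4, 100)) -> torch.Tensor:
    ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()   # (N, 2) patch coordinates
    dist = torch.cdist(coords, coords)                                  # pairwise patch distances
    return torch.stack([dist <= r for r in radii])                      # (heads, N, N) boolean masks

mask = multi_focal_mask()
attn_logits = torch.randn(len(mask), 14 * 14, 14 * 14)                  # (heads, N, N) attention scores
attn_logits = attn_logits.masked_fill(~mask, float("-inf"))             # bias out distant patches
attn = attn_logits.softmax(dim=-1)
```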
arXiv Detail & Related papers (2022-03-04T14:49:48Z) - Training Certifiably Robust Neural Networks with Efficient Local Lipschitz Bounds [99.23098204458336]
Certified robustness is a desirable property for deep neural networks in safety-critical applications.
We show that our method consistently outperforms state-of-the-art methods on MNIST and TinyImageNet datasets.
arXiv Detail & Related papers (2021-11-02T06:44:10Z) - Certified Patch Robustness via Smoothed Vision Transformers [77.30663719482924]
We show how using vision transformers enables significantly better certified patch robustness.
These improvements stem from the inherent ability of the vision transformer to gracefully handle largely masked images.
arXiv Detail & Related papers (2021-10-11T17:44:05Z) - Second-Order Provable Defenses against Adversarial Attacks [63.34032156196848]
We show that if the eigenvalues of the Hessian of the network are bounded, we can compute a certificate in the $l_2$ norm efficiently using convex optimization.
We achieve certified accuracies of 5.78%, 44.96%, and 43.19%, compared with IBP-based methods, respectively.
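A worked example of a generic second-order certificate in this spirit: if the class margin is m, its gradient norm is g, and the Hessian eigenvalues are bounded by K, then the margin stays positive for perturbations of $l_2$ norm up to r = (-g + sqrt(g^2 + 2Km)) / K; the numbers below are illustrative:

```python
# Second-order (curvature-based) l2 radius: the quadratic lower bound
# m - g*r - (K/2)*r**2 on the margin stays non-negative up to its positive root.
import math

def certified_radius(margin: float, grad_norm: float, curvature_bound: float) -> float:
    g, k, m = grad_norm, curvature_bound, margin
    return (-g + math.sqrt(g * g + 2.0 * k * m)) / k

print(certified_radius(margin=1.2, grad_norm=0.8, curvature_bound=0.5))
```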
arXiv Detail & Related papers (2020-06-01T05:55:18Z) - Robustness Verification for Transformers [165.25112192811764]
We develop the first robustness verification algorithm for Transformers.
The certified robustness bounds computed by our method are significantly tighter than those by naive Interval Bound Propagation.
These bounds also shed light on interpreting Transformers as they consistently reflect the importance of different words in sentiment analysis.
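A minimal sketch of the naive Interval Bound Propagation baseline mentioned above, for a single linear layer in center/radius form (output center W c + b, output radius |W| r):

```python
# Propagate an axis-aligned box through one linear layer: the output center is
# W @ c + b and the output radius is |W| @ r, giving elementwise bounds that
# hold for every input inside the box.
import torch

def ibp_linear(W: torch.Tensor, b: torch.Tensor, lower: torch.Tensor, upper: torch.Tensor):
    center = (lower + upper) / 2
    radius = (upper - lower) / 2
    out_center = W @ center + b
    out_radius = W.abs() @ radius
    return out_center - out_radius, out_center + out_radius

W, b = torch.randn(10, 16), torch.randn(10)
x = torch.randn(16)
lo, hi = ibp_linear(W, b, x - 0.1, x + 0.1)   # bounds for an l_inf ball of radius 0.1
assert torch.all(lo <= W @ x + b) and torch.all(W @ x + b <= hi)
```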
arXiv Detail & Related papers (2020-02-16T17:16:31Z)