Robustness Verification for Transformers
- URL: http://arxiv.org/abs/2002.06622v2
- Date: Wed, 23 Dec 2020 12:36:47 GMT
- Title: Robustness Verification for Transformers
- Authors: Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh
- Abstract summary: We develop the first robustness verification algorithm for Transformers.
The certified robustness bounds computed by our method are significantly tighter than those by naive Interval Bound Propagation.
These bounds also shed light on interpreting Transformers as they consistently reflect the importance of different words in sentiment analysis.
- Score: 165.25112192811764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness verification that aims to formally certify the prediction behavior
of neural networks has become an important tool for understanding model
behavior and obtaining safety guarantees. However, previous methods can usually
only handle neural networks with relatively simple architectures. In this
paper, we consider the robustness verification problem for Transformers.
Transformers have complex self-attention layers that pose many challenges for
verification, including cross-nonlinearity and cross-position dependency, which
have not been discussed in previous works. We resolve these challenges and
develop the first robustness verification algorithm for Transformers. The
certified robustness bounds computed by our method are significantly tighter
than those by naive Interval Bound Propagation. These bounds also shed light on
interpreting Transformers as they consistently reflect the importance of
different words in sentiment analysis.
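For reference, the abstract compares against naive Interval Bound Propagation (IBP). The following is a minimal NumPy sketch of IBP through an affine layer and a ReLU, not the paper's verification algorithm for self-attention; the function names (ibp_affine, ibp_relu) are illustrative only.

```python
# Minimal sketch of naive Interval Bound Propagation (IBP): elementwise input
# intervals [l, u] are pushed through each layer independently.
import numpy as np

def ibp_affine(l, u, W, b):
    """Propagate interval bounds [l, u] through x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    lower = W_pos @ l + W_neg @ u + b
    upper = W_pos @ u + W_neg @ l + b
    return lower, upper

def ibp_relu(l, u):
    """ReLU is monotone, so bounds pass through elementwise."""
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=8)           # a toy "embedding"
    eps = 0.1                        # l_inf perturbation radius
    l, u = x - eps, x + eps

    W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
    W2, b2 = rng.normal(size=(2, 16)), rng.normal(size=2)

    l, u = ibp_relu(*ibp_affine(l, u, W1, b1))
    l, u = ibp_affine(l, u, W2, b2)

    # If the lower bound of the true-class logit exceeds the upper bound of
    # every other logit, the prediction is certified within eps.
    print("logit lower bounds:", l)
    print("logit upper bounds:", u)
```

Because each interval is propagated independently, dependencies between positions and the multiplicative interactions inside self-attention are lost, which is why such naive interval bounds become loose for Transformers.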
Related papers
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms based on low-rank computation perform impressively for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing [40.38555458216436]
We propose a unified theoretical framework for certifying robustness against general semantic transformations.
Under the GSmooth framework, we present a scalable algorithm that uses a surrogate image-to-image network to approximate the complex transformation.
arXiv Detail & Related papers (2022-06-09T07:12:17Z)
- XAI for Transformers: Better Explanations through Conservative Propagation [60.67748036747221]
We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction.
Our proposal can be seen as a proper extension of the well-established LRP method to Transformers.
arXiv Detail & Related papers (2022-02-15T10:47:11Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Translational Equivariance in Kernelizable Attention [3.236198583140341]
We show how translational equivariance can be implemented in efficient Transformers based on kernelizable attention.
Our experiments highlight that the devised approach significantly improves the robustness of Performers to shifts of input images.
arXiv Detail & Related papers (2021-02-15T17:14:15Z)
- Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
- Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers [42.93754828584075]
We present a new Transformer architecture, Performer, based on Fast Attention Via Orthogonal Random features (FAVOR).
Our mechanism scales linearly rather than quadratically in the number of tokens in the sequence, is characterized by sub-quadratic space complexity and does not incorporate any sparsity pattern priors.
It provides strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence.
arXiv Detail & Related papers (2020-06-05T17:09:16Z)
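The Performer entry above describes attention that scales linearly in sequence length via random features. Below is a minimal NumPy sketch in that spirit (positive random features plus the phi(Q) (phi(K)^T V) reordering); it is not the official Performer implementation, which additionally uses orthogonal random features and numerical stabilization, and the function names here are illustrative only.

```python
# Minimal sketch of kernelized linear attention: approximate softmax attention
# with random features so the n x n attention matrix is never formed.
import numpy as np

def positive_random_features(X, W):
    """phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m): a positive random-feature
    map whose dot products are unbiased estimates of exp(q . k)."""
    m = W.shape[0]
    proj = X @ W.T                                    # (n, m)
    sq_norm = 0.5 * np.sum(X ** 2, axis=-1, keepdims=True)
    return np.exp(proj - sq_norm) / np.sqrt(m)

def linear_attention(Q, K, V, num_features=256, seed=0):
    """Approximate softmax(Q K^T / sqrt(d)) V by computing phi(K)^T V first,
    so time and memory grow linearly with the number of tokens."""
    d = Q.shape[-1]
    W = np.random.default_rng(seed).normal(size=(num_features, d))
    q = positive_random_features(Q / d ** 0.25, W)    # fold in the 1/sqrt(d)
    k = positive_random_features(K / d ** 0.25, W)    # temperature
    kv = k.T @ V                                      # (m, d_v)
    normalizer = q @ k.sum(axis=0)                    # row sums of phi(Q) phi(K)^T
    return (q @ kv) / normalizer[:, None]

if __name__ == "__main__":
    n, d = 512, 64
    rng = np.random.default_rng(1)
    # Small-magnitude queries/keys keep the random-feature variance modest
    # for this toy accuracy check.
    Q, K, V = [0.3 * rng.normal(size=(n, d)) for _ in range(3)]
    scores = np.exp(Q @ K.T / np.sqrt(d))
    exact = (scores / scores.sum(-1, keepdims=True)) @ V
    approx = linear_attention(Q, K, V)
    rel_err = np.abs(exact - approx).mean() / np.abs(exact).mean()
    print(f"mean relative error of the approximation: {rel_err:.3f}")
```

Because phi(K)^T V is only an (m x d) matrix, the cost scales with the number of tokens rather than its square, matching the linear-scaling claim in the entry above.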