LipShiFT: A Certifiably Robust Shift-based Vision Transformer
- URL: http://arxiv.org/abs/2503.14751v1
- Date: Tue, 18 Mar 2025 21:38:18 GMT
- Title: LipShiFT: A Certifiably Robust Shift-based Vision Transformer
- Authors: Rohan Menon, Nicola Franco, Stephan Günnemann
- Abstract summary: Lipschitz-based margin training acts as a strong regularizer while restricting weights in successive layers of the model. We provide an upper bound estimate for the Lipschitz constants of this model using the $l_2$ norm on common image classification datasets.
- Score: 46.7028906678548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deriving tight Lipschitz bounds for transformer-based architectures presents a significant challenge. The large input sizes and high-dimensional attention modules typically prove to be crucial bottlenecks during the training process and lead to sub-optimal results. Our research highlights practical constraints of these methods in vision tasks. We find that Lipschitz-based margin training acts as a strong regularizer while restricting weights in successive layers of the model. Focusing on a Lipschitz continuous variant of the ShiftViT model, we address significant training challenges for transformer-based architectures under a norm-constrained input setting. We provide an upper bound estimate for the Lipschitz constants of this model using the $l_2$ norm on common image classification datasets. Ultimately, we demonstrate that our method scales to larger models and advances the state-of-the-art in certified robustness for transformer-based architectures.
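The abstract leaves the composition of the bound implicit; the standard recipe for an $l_2$ Lipschitz upper bound multiplies per-layer Lipschitz constants, with each linear layer's constant given by its spectral norm. Below is a minimal sketch of that recipe, assuming PyTorch and power iteration; the layer handling and the 1-Lipschitz treatment of activations are illustrative assumptions, not the authors' exact procedure:

```python
import torch
import torch.nn as nn

def spectral_norm_estimate(weight: torch.Tensor, n_iters: int = 50) -> float:
    """Estimate the largest singular value of a weight tensor via power iteration."""
    w = weight.detach().reshape(weight.shape[0], -1)
    v = torch.randn(w.shape[1])
    for _ in range(n_iters):
        u = w @ v
        u = u / u.norm()
        v = w.t() @ u
        v = v / v.norm()
    return float(u @ (w @ v))

def lipschitz_upper_bound(model: nn.Module) -> float:
    """Product of per-layer spectral norms: an l2 Lipschitz upper bound,
    assuming 1-Lipschitz activations (e.g. ReLU) and plain sequential
    structure. For Conv2d the flattened matrix is only a crude proxy for
    the true operator norm, and residual/attention blocks need the kind
    of dedicated treatment the paper addresses."""
    bound = 1.0
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            bound *= spectral_norm_estimate(module.weight)
    return bound
```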
Related papers
- Revisiting Kernel Attention with Correlated Gaussian Process Representation [6.857174439487293]
We propose a new class of transformers whose self-attention units are modeled as cross-covariance between two correlated GPs (CGPs). This allows asymmetries in attention and can enhance the representation capacity of GP-based transformers. Our empirical studies show that both CGP-based and sparse CGP-based transformers achieve better performance than state-of-the-art GP-based transformers.
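A rough way to see how two distinct projections yield the asymmetric attention the summary highlights; the kernel choice and row normalization below are assumptions, not the paper's cross-covariance construction:

```python
import numpy as np

def rbf(a: np.ndarray, b: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    """RBF kernel matrix between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale**2))

def kernel_attention(x, w_q, w_k, w_v):
    """Kernel attention with separate query/key projections. Because
    w_q != w_k in general, rbf(q, k) is not symmetric in the token
    indices, loosely mirroring a cross-covariance between two processes."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = rbf(q, k)                                 # (n, n), asymmetric
    weights = scores / scores.sum(-1, keepdims=True)   # row-normalize
    return weights @ v
```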
arXiv Detail & Related papers (2025-02-27T21:21:48Z)
- Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization [88.5582111768376]
We study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model.
Our results establish a sharp condition that can distinguish between the small test error phase and the large test error regime, based on the signal-to-noise ratio in the data model.
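For context, the signal-to-noise ratio in this line of work is usually defined against a signal-plus-noise data model; the following is one common formalization, not necessarily the paper's exact setup:

```latex
% One common data model in the benign-overfitting literature
% (illustrative; the paper's exact model may differ):
\[
  y \sim \mathrm{Unif}\{\pm 1\}, \qquad
  x = y\,\mu + \xi, \qquad \xi \sim \mathcal{N}(0,\sigma^2 I_d),
\]
\[
  \mathrm{SNR} = \frac{\lVert \mu \rVert_2}{\sigma\sqrt{d}},
\]
% with small test error above a threshold SNR and large test error below it.
```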
arXiv Detail & Related papers (2024-09-28T13:24:11Z)
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical methods such as low-rank computation perform well for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
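A minimal sketch of the magnitude-based pruning operation itself (the paper's one-layer analysis is not reproduced; the tie-breaking at the threshold is illustrative):

```python
import numpy as np

def magnitude_prune(weight: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the smallest-magnitude entries, keeping roughly a
    (1 - sparsity) fraction of the weights; ties at the threshold
    may prune slightly more."""
    k = int(weight.size * sparsity)
    if k == 0:
        return weight.copy()
    threshold = np.partition(np.abs(weight).ravel(), k - 1)[k - 1]
    pruned = weight.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned
```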
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
- Linearizing Large Language Models [26.94551511277412]
We present a method to uptrain existing large pre-trained transformers into Recurrent Neural Networks (RNNs) with a modest compute budget.
We find that our linearization technique leads to competitive performance on standard benchmarks, but we identify persistent in-context learning and long-context modeling shortfalls for even the largest linear models.
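The summary does not say which recurrent form the uptraining targets; a common one is linear attention, whose recurrent state replaces the growing key-value cache. A minimal sketch, assuming the standard ELU+1 feature map (a conventional choice, not necessarily the paper's):

```python
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    """ELU + 1 feature map; keeps features positive."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_rnn(qs, ks, vs):
    """Linear attention in recurrent form: a (d_k, d_v) state S and a
    (d_k,) normalizer z are updated per step, so memory is constant in
    sequence length instead of growing like a softmax KV cache."""
    d_k, d_v = qs.shape[1], vs.shape[1]
    S = np.zeros((d_k, d_v))
    z = np.zeros(d_k)
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        S += np.outer(fk, v)
        z += fk
        fq = phi(q)
        outputs.append((fq @ S) / (fq @ z + 1e-6))
    return np.stack(outputs)
```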
arXiv Detail & Related papers (2024-05-10T17:59:08Z)
- Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption [45.00129952368691]
Homomorphic Encryption (HE) has emerged as one of the most promising approaches for privacy-preserving deep learning.
We introduce the first polynomial transformer, providing the first demonstration of secure inference over HE with transformers.
Our models yield results comparable to traditional methods, bridging the performance gap with transformers of similar scale and underscoring the viability of HE for state-of-the-art applications.
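HE schemes evaluate only additions and multiplications, so the non-polynomial pieces of a transformer (GELU, softmax, LayerNorm) must be replaced by polynomial surrogates. A minimal sketch of one such surrogate, fitting a polynomial to GELU on a bounded interval; the degree and range are illustrative, and the paper's full conversion is considerably more involved:

```python
import numpy as np

def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Fit a degree-8 polynomial on [-5, 5]; an HE scheme (e.g. CKKS) can then
# evaluate the activation using multiplications and additions alone.
xs = np.linspace(-5.0, 5.0, 2001)
poly_gelu = np.poly1d(np.polyfit(xs, gelu(xs), deg=8))

max_err = np.max(np.abs(poly_gelu(xs) - gelu(xs)))
print(f"max abs error on [-5, 5]: {max_err:.4f}")
```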
arXiv Detail & Related papers (2023-11-15T00:23:58Z)
- Client: Cross-variable Linear Integrated Enhanced Transformer for Multivariate Long-Term Time Series Forecasting [4.004869317957185]
"Cross-variable Linear Integrated ENhanced Transformer for Multivariable Long-Term Time Series Forecasting" (Client) is an advanced model that outperforms both traditional Transformer-based models and linear models.
Client incorporates non-linearity and cross-variable dependencies, which sets it apart from conventional linear models and Transformer-based models.
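A very loose sketch of the two ingredients the summary names, a linear map along the time axis plus attention across variables; the shapes and their composition are assumptions for illustration, not the Client architecture itself:

```python
import numpy as np

def client_style_forecast(x, w_time, w_q, w_k):
    """x: (T, C) history -> (H, C) forecast.
    w_time: (H, T) linear projection over time, applied per variable;
    w_q, w_k: (H, d) maps producing per-variable embeddings for a
    (C, C) cross-variable attention that mixes the linear forecasts."""
    z = w_time @ x                                    # (H, C)
    q, k = z.T @ w_q, z.T @ w_k                       # (C, d)
    scores = q @ k.T / np.sqrt(q.shape[1])            # (C, C)
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn = attn / attn.sum(-1, keepdims=True)
    return z + (attn @ z.T).T                         # residual cross-variable mix
```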
arXiv Detail & Related papers (2023-05-30T08:31:22Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Decision Transformer: Reinforcement Learning via Sequence Modeling [102.86873656751489]
We present Decision Transformer, an architecture that abstracts Reinforcement Learning (RL) as a conditional sequence modeling problem.
Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
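The device that makes RL look like sequence modeling is conditioning on the return-to-go; a minimal sketch of how a trajectory becomes such a token sequence (the tuple layout is illustrative):

```python
import numpy as np

def returns_to_go(rewards: np.ndarray) -> np.ndarray:
    """Suffix sums: R_t = r_t + r_{t+1} + ... + r_T."""
    return np.cumsum(rewards[::-1])[::-1]

def to_dt_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep, the input
    layout Decision Transformer trains on; at test time the first
    return-to-go token is set to the desired target return."""
    rtg = returns_to_go(np.asarray(rewards, dtype=float))
    seq = []
    for t in range(len(rewards)):
        seq += [("rtg", rtg[t]), ("state", states[t]), ("action", actions[t])]
    return seq
```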
arXiv Detail & Related papers (2021-06-02T17:53:39Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, abbreviated from "Vision-friendly Transformer".
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.