Related papers: Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

URL: http://arxiv.org/abs/2411.05189v1
Date: Thu, 07 Nov 2024 21:25:58 GMT
Title: Adversarial Robustness of In-Context Learning in Transformers for Linear Regression
Authors: Usman Anwar, Johannes Von Oswald, Louis Kirsch, David Krueger, Spencer Frei,
Abstract summary: This work investigates the vulnerability of in-context learning in transformers to textithijacking attacks focusing on the setting of linear regression tasks. We first prove that single-layer linear transformers, known to implement gradient descent in-context, are non-robust and can be manipulated to output arbitrary predictions. We then demonstrate that adversarial training enhances transformers' robustness against hijacking attacks, even when just applied during finetuning.
Score: 23.737606860443705
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers have demonstrated remarkable in-context learning capabilities across various domains, including statistical learning tasks. While previous work has shown that transformers can implement common learning algorithms, the adversarial robustness of these learned algorithms remains unexplored. This work investigates the vulnerability of in-context learning in transformers to \textit{hijacking attacks} focusing on the setting of linear regression tasks. Hijacking attacks are prompt-manipulation attacks in which the adversary's goal is to manipulate the prompt to force the transformer to generate a specific output. We first prove that single-layer linear transformers, known to implement gradient descent in-context, are non-robust and can be manipulated to output arbitrary predictions by perturbing a single example in the in-context training set. While our experiments show these attacks succeed on linear transformers, we find they do not transfer to more complex transformers with GPT-2 architectures. Nonetheless, we show that these transformers can be hijacked using gradient-based adversarial attacks. We then demonstrate that adversarial training enhances transformers' robustness against hijacking attacks, even when just applied during finetuning. Additionally, we find that in some settings, adversarial training against a weaker attack model can lead to robustness to a stronger attack model. Lastly, we investigate the transferability of hijacking attacks across transformers of varying scales and initialization seeds, as well as between transformers and ordinary least squares (OLS). We find that while attacks transfer effectively between small-scale transformers, they show poor transferability in other scenarios (small-to-large scale, large-to-large scale, and between transformers and OLS).

Related papers

Transformer Learns Optimal Variable Selection in Group-Sparse Classification [14.760685658938787]
We give a case study of how transformers can be trained to learn a classic statistical model with "group sparsity" We theoretically demonstrate that, a one-layer transformer trained by gradient descent can correctly leverage the attention mechanism to select variables. We also demonstrate that a well-pretrained one-layer transformer can be adapted to new downstream tasks to achieve good prediction accuracy with a limited number of samples.
arXiv Detail & Related papers (2025-04-11T15:39:44Z)
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context [48.4979348643494]
We study the capability of one-layer transformers learning the one-nearest neighbor rule. A single softmax attention layer can successfully learn to behave like a one-nearest neighbor.
arXiv Detail & Related papers (2024-11-16T16:12:42Z)
Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.22517830759193]
This paper studies the transferability of such an adversarial vulnerability from a pre-trained ViT model to downstream tasks. We show that DTA achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a huge margin.
arXiv Detail & Related papers (2024-08-03T08:07:03Z)
The Efficacy of Transformer-based Adversarial Attacks in Security Domains [0.7156877824959499]
We evaluate the robustness of transformers to adversarial samples for system defenders and their adversarial strength for system attackers. Our work emphasizes the importance of studying transformer architectures for attacking and defending models in security domains.
arXiv Detail & Related papers (2023-10-17T21:45:23Z)
Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches. This is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
DBIA: Data-free Backdoor Injection Attack against Transformer Networks [6.969019759456717]
We propose DBIA, a data-free backdoor attack against the CV-oriented transformer networks. Our approach can embed backdoors with a high success rate and a low impact on the performance of the victim transformers.
arXiv Detail & Related papers (2021-11-22T08:13:51Z)
Scalable Transformers for Neural Machine Translation [86.4530299266897]
Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation. We propose a novel scalable Transformers, which naturally contains sub-Transformers of different scales and have shared parameters. A three-stage training scheme is proposed to tackle the difficulty of training the scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z)
Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks. We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.