Prompting a Pretrained Transformer Can Be a Universal Approximator
- URL: http://arxiv.org/abs/2402.14753v1
- Date: Thu, 22 Feb 2024 18:12:48 GMT
- Title: Prompting a Pretrained Transformer Can Be a Universal Approximator
- Authors: Aleksandar Petrov, Philip H.S. Torr, Adel Bibi
- Abstract summary: We show that much smaller pretrained models than previously thought can be universal approximators when prefixed.
We also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.
- Score: 105.59562522323274
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the widespread adoption of prompting, prompt tuning, and prefix-tuning
of transformer models, our theoretical understanding of these fine-tuning methods remains
limited. A key question is whether one can arbitrarily modify the behavior of a pretrained
model by prompting or prefix-tuning it; formally, whether prompting and prefix-tuning a
pretrained model can universally approximate sequence-to-sequence functions. This paper
answers in the affirmative and demonstrates that much smaller pretrained models than
previously thought can be universal approximators when prefixed. In fact, the attention
mechanism is uniquely suited for universal approximation with prefix-tuning: prefix-tuning
a single attention head is sufficient to approximate any continuous function. Moreover, any
sequence-to-sequence function can be approximated by prefixing a transformer with depth
linear in the sequence length. Beyond these density-type results, we also offer Jackson-type
bounds on the length of the prefix needed to approximate a function to a desired precision.
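
The mechanism at the heart of the result is easy to state concretely: prefix-tuning leaves the pretrained weights frozen and only prepends trainable vectors to the keys and values that an attention head attends over. The following is a minimal sketch of that setup in PyTorch, not the paper's construction; the module name `PrefixedAttentionHead`, the dimensions, and the initialization are assumptions made for illustration.

```python
# Minimal sketch (not the paper's construction): prefix-tuning a single
# attention head. The pretrained projections are frozen; only the prefix
# keys/values are trainable. Names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrefixedAttentionHead(nn.Module):
    def __init__(self, d_model: int, prefix_len: int):
        super().__init__()
        # Stand-ins for pretrained, frozen projection weights.
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        for proj in (self.q_proj, self.k_proj, self.v_proj):
            proj.weight.requires_grad_(False)
        # The only trainable parameters: the prefix's keys and values.
        self.prefix_k = nn.Parameter(0.02 * torch.randn(prefix_len, d_model))
        self.prefix_v = nn.Parameter(0.02 * torch.randn(prefix_len, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Prepend the prefix to keys and values only; queries come from the
        # real input, so the output keeps the input's sequence length.
        batch = x.shape[0]
        k = torch.cat([self.prefix_k.expand(batch, -1, -1), k], dim=1)
        v = torch.cat([self.prefix_v.expand(batch, -1, -1), v], dim=1)
        scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
        return F.softmax(scores, dim=-1) @ v


head = PrefixedAttentionHead(d_model=16, prefix_len=8)
out = head(torch.randn(2, 5, 16))  # shape: (2, 5, 16)
```

In this sketch only `prefix_k` and `prefix_v` would receive gradients when the prefix is tuned to imitate a target function; everything standing in for the pretrained model stays fixed.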
Related papers
- Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers [0.0]
We study elementary edit functions using a defined set of error indicators to interpret the behaviour of the sequence-to-sequence Transformer.
We show that generalization to shorter sequences is often possible, but confirm that longer sequences are highly problematic.
arXiv Detail & Related papers (2024-10-17T17:39:46Z) - Transformers As Approximations of Solomonoff Induction [7.890110890837779]
Solomonoff Induction is an optimal-in-the-limit algorithm for sequence prediction.
Being an optimal form of computational sequence prediction, it seems plausible that it may be used as a model against which other methods of sequence prediction might be compared.
We put forth and explore the hypothesis that Transformer models approximate Solomonoff Induction better than any other extant sequence prediction method.
arXiv Detail & Related papers (2024-08-22T02:05:44Z) - Universality and Limitations of Prompt Tuning [65.8354898840308]
We take one of the first steps toward understanding the role of soft-prompt tuning for transformer-based architectures.
We analyze prompt tuning through the lens of universality and limitations, for finite-depth pretrained transformers on continuous-valued functions.
Our result guarantees the existence of a strong transformer with a prompt to approximate any sequence-to-sequence function in the set of Lipschitz functions (a minimal soft-prompt sketch follows this list).
arXiv Detail & Related papers (2023-05-30T06:47:07Z) - Sampled Transformer for Point Sets [80.66097006145999]
The sparse transformer can reduce the computational complexity of the self-attention layers to $O(n)$, whilst still being a universal approximator of continuous sequence-to-sequence functions.
We propose an $O(n)$ complexity sampled transformer that can process point set elements directly without any additional inductive bias.
arXiv Detail & Related papers (2023-02-28T06:38:05Z) - Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning [53.72897232951918]
We suggest a new variant of prefix-tuning -- inducer-tuning -- which shares the exact mechanism with prefix-tuning while leveraging the residual form found in adapter-tuning.
We show that inducer-tuning can close the performance gap between prefix-tuning and fine-tuning.
arXiv Detail & Related papers (2022-10-26T04:39:42Z) - Alleviate Exposure Bias in Sequence Prediction with Recurrent Neural Networks [47.52214243454995]
A popular strategy to train recurrent neural networks (RNNs) is to take the ground truth as input at each time step.
We propose a fully differentiable training algorithm for RNNs to better capture long-term dependencies.
arXiv Detail & Related papers (2021-03-22T06:15:22Z) - Pretrained Transformers as Universal Computation Engines [105.00539596788127]
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning.
We study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction.
We find that such pretraining enables the frozen pretrained transformer (FPT) to generalize zero-shot to these modalities, matching the performance of a transformer fully trained on these tasks.
arXiv Detail & Related papers (2021-03-09T06:39:56Z) - Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech
Recognition [66.47000813920617]
We propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition.
The proposed model can accurately predict the length of the target sequence and achieve a competitive performance.
The model even achieves a real-time factor of 0.0056, which outpaces all mainstream speech recognition models.
arXiv Detail & Related papers (2020-05-16T08:27:20Z)
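
For contrast with prefix-tuning, the entry on "Universality and Limitations of Prompt Tuning" above concerns soft-prompt tuning, where trainable embeddings are prepended only at the input and the pretrained transformer itself stays frozen. The sketch below is a minimal illustration under those assumptions; the `SoftPrompt` module and all names and dimensions are hypothetical, not taken from that paper.

```python
# Minimal sketch of soft-prompt tuning: trainable embeddings are prepended
# to the input sequence, and the pretrained transformer stays frozen.
# The module and all names/sizes are illustrative assumptions.
import torch
import torch.nn as nn


class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int, d_model: int):
        super().__init__()
        self.prompt = nn.Parameter(0.02 * torch.randn(prompt_len, d_model))

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, d_model) token embeddings of the input.
        batch = embeddings.shape[0]
        return torch.cat([self.prompt.expand(batch, -1, -1), embeddings], dim=1)


# A frozen encoder standing in for a pretrained transformer.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True),
    num_layers=2,
)
for p in encoder.parameters():
    p.requires_grad_(False)

soft_prompt = SoftPrompt(prompt_len=8, d_model=16)
x = torch.randn(2, 5, 16)      # stand-in token embeddings
out = encoder(soft_prompt(x))  # shape: (2, 13, 16)
```

The design difference from the prefix sketch above is simply where the trainable parameters enter: at the input embeddings here, versus inside the attention computation there.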
This list is automatically generated from the titles and abstracts of the papers on this site.