Analyzing Transformers in Embedding Space
- URL: http://arxiv.org/abs/2209.02535v3
- Date: Sun, 24 Dec 2023 23:11:19 GMT
- Title: Analyzing Transformers in Embedding Space
- Authors: Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant
- Abstract summary: We present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space.
We show that parameters of both pretrained and fine-tuned models can be interpreted in embedding space.
Our findings open the door to interpretation methods that, at least in part, abstract away from model specifics and operate in the embedding space only.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding Transformer-based models has attracted significant attention,
as they lie at the heart of recent technological advances across machine
learning. While most interpretability methods rely on running models over
inputs, recent work has shown that a zero-pass approach, where parameters are
interpreted directly without a forward/backward pass, is feasible for some
Transformer parameters and for two-layer attention networks. In this work, we
present a theoretical analysis where all parameters of a trained Transformer
are interpreted by projecting them into the embedding space, that is, the space
of vocabulary items they operate on. We derive a simple theoretical framework
to support our arguments and provide ample evidence for its validity. First, we
present an empirical analysis showing that parameters of both pretrained and
fine-tuned models can be interpreted in embedding space. Second, we present two
applications of our framework: (a) aligning the parameters of different models
that share a vocabulary, and (b) constructing a classifier without training by
"translating" the parameters of a fine-tuned classifier to parameters of a
different model that was only pretrained. Overall, our findings open the door
to interpretation methods that, at least in part, abstract away from model
specifics and operate in the embedding space only.
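To make the core operation concrete, below is a minimal sketch of zero-pass interpretation in embedding space. This is not the authors' released code: it assumes GPT-2 loaded via Hugging Face transformers, and the layer and row indices are arbitrary illustrations. A parameter vector w living in the model's hidden space is projected onto the vocabulary by multiplying with the embedding matrix E, and its top-scoring tokens serve as its interpretation:

```python
# Minimal sketch of zero-pass interpretation in embedding space.
# Assumptions (not from the paper's code): GPT-2 via Hugging Face
# transformers; layer 10 / row 42 are arbitrary illustrative indices.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

E = model.transformer.wte.weight  # embedding matrix, shape (vocab_size, d_model)

def top_tokens(w: torch.Tensor, k: int = 10) -> list[str]:
    """Project a d_model-dim parameter vector onto the vocabulary (w @ E^T)
    and return its top-k tokens as the vector's interpretation."""
    with torch.no_grad():
        logits = w @ E.T                     # scores over all vocabulary items
        ids = torch.topk(logits, k).indices  # highest-scoring token ids
    return tokenizer.convert_ids_to_tokens(ids.tolist())

# Interpret one feed-forward "value" vector: row 42 of the second FF matrix
# in layer 10. No forward or backward pass over inputs is required.
ff_value = model.transformer.h[10].mlp.c_proj.weight[42]
print(top_tokens(ff_value))
```

In the paper's framework the same kind of projection extends, with suitable handling, to attention parameters as well, which is what underlies the model-alignment and classifier-translation applications described in the abstract.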
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach (arXiv, 2024-07-09)
  We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
  We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
  Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
- In-Context Convergence of Transformers (arXiv, 2023-10-08)
  We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
  For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
- All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations (arXiv, 2023-05-23)
  We propose a model based on invertible neural networks, BERT-INN, to learn the Bijection Hypothesis.
  We show the advantage of BERT-INN both theoretically and through extensive experiments.
- On the Role of Bidirectionality in Language Model Pre-Training (arXiv, 2022-05-24)
  We study the role of bidirectionality in next token prediction, text infilling, zero-shot priming and fine-tuning.
  We train models with up to 6.7B parameters, and find differences to remain consistent at scale.
- Towards a Unified View of Parameter-Efficient Transfer Learning (arXiv, 2021-10-08)
  Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP.
  Recent work has proposed a variety of parameter-efficient transfer learning methods that fine-tune only a small number of (extra) parameters to attain strong performance.
  We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
- On the Lack of Robust Interpretability of Neural Text Classifiers (arXiv, 2021-06-08)
  We assess the robustness of interpretations of neural text classifiers based on pretrained Transformer encoders.
  Both tests show surprising deviations from expected behavior, raising questions about the extent of insights that practitioners may draw from interpretations.