Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
- URL: http://arxiv.org/abs/2203.14680v1
- Date: Mon, 28 Mar 2022 12:26:00 GMT
- Title: Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
- Authors: Mor Geva, Avi Caciularu, Kevin Ro Wang, Yoav Goldberg
- Abstract summary: Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood.
We make a substantial step towards unveiling this underlying prediction process, by reverse-engineering the operation of the feed-forward network (FFN) layers.
- Score: 49.029910567673824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based language models (LMs) are at the core of modern NLP, but
their internal prediction construction process is opaque and largely not
understood. In this work, we make a substantial step towards unveiling this
underlying prediction process, by reverse-engineering the operation of the
feed-forward network (FFN) layers, one of the building blocks of transformer
models. We view the token representation as a changing distribution over the
vocabulary, and the output from each FFN layer as an additive update to that
distribution. Then, we analyze the FFN updates in the vocabulary space, showing
that each update can be decomposed into sub-updates corresponding to single FFN
parameter vectors, each promoting concepts that are often human-interpretable.
We then leverage these findings for controlling LM predictions, where we reduce
the toxicity of GPT2 by almost 50%, and for improving computation efficiency
with a simple early exit rule, saving 20% of computation on average.
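The decomposition described above can be reproduced in a few lines of PyTorch: each row of an FFN layer's second weight matrix is one "value" vector, and reading it through GPT-2's tied unembedding matrix reveals which tokens it promotes. A minimal sketch, assuming the Hugging Face transformers package; the layer and vector indices are arbitrary illustrative choices, not taken from the paper.

```python
# Inspect one FFN sub-update of GPT-2 in vocabulary space.
# Assumes: pip install torch transformers. Indices below are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

layer_idx, vec_idx = 10, 42  # which FFN layer / which parameter vector to inspect

# In GPT-2, mlp.c_proj.weight has shape (d_ff, d_model) = (3072, 768):
# each row is one FFN "value" vector, and the layer's output is a weighted
# sum of these rows (the paper's sub-updates).
value_vectors = model.transformer.h[layer_idx].mlp.c_proj.weight
unembedding = model.transformer.wte.weight  # (50257, 768), tied with the LM head

with torch.no_grad():
    # Read the sub-update in vocabulary space: the logits it adds per token.
    logits = value_vectors[vec_idx] @ unembedding.T  # (50257,)
    top = torch.topk(logits, k=10).indices

print(tokenizer.convert_ids_to_tokens(top.tolist()))
```

Scanning many (layer, vector) pairs this way surfaces the human-interpretable concepts the abstract refers to.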
Related papers
- Partially Rewriting a Transformer in Natural Language [0.7234862895932991]
We attempt to partially rewrite a large language model using simple natural language explanations.
We first approximate one of the model's feed-forward layers with a wider, sparsely activating MLP, then replace the first layer of this sparse MLP with an LLM-based simulator, which predicts the activation of each neuron.
We measure the degree to which these modifications distort the model's final output.
arXiv Detail & Related papers (2025-01-31T01:12:50Z)
- Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts [33.677249894085186]
We explore the potential of Concept Bottleneck Models for transforming complex, non-interpretable foundation models into interpretable decision-making pipelines.
Specifically, we focus on the test-time deployment of such an interpretable CBM pipeline "in the wild".
Our adaptation method produces concept-based interpretations better aligned with the test data and boosts post-deployment accuracy by up to 28%.
arXiv Detail & Related papers (2024-12-18T17:47:46Z)
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Jump to Conclusions: Short-Cutting Transformers With Linear Transformations [60.37563766047492]
Transformer-based language models create hidden representations of their inputs at every layer, but only use final-layer representations for prediction.
This obscures the internal decision-making process of the model and the utility of its intermediate representations.
We suggest a simple method for casting these intermediate representations into the final-layer space using linear transformations; a sketch of the idea follows this entry.
arXiv Detail & Related papers (2023-03-16T16:10:16Z)
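The casting idea lends itself to a compact experiment: fit a least-squares linear map from layer-l hidden states to final-layer hidden states and decode through it. A minimal sketch of that idea, not the paper's exact procedure; it assumes the Hugging Face transformers package, and the layer index and calibration sentence are illustrative (a real fit would use many more calibration tokens).

```python
# Fit a linear map from layer-l hidden states to final-layer hidden states,
# then decode an intermediate state through it ("short-cutting" the model).
# Assumes: pip install torch transformers.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
layer = 6  # early layer to short-cut from (illustrative)

batch = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    hs = model(**batch, output_hidden_states=True).hidden_states
H_mid, H_final = hs[layer][0], hs[-1][0]  # (seq_len, d_model) each

# Least-squares solve: H_mid @ A ~= H_final.
A = torch.linalg.lstsq(H_mid, H_final).solution  # (d_model, d_model)

with torch.no_grad():
    early_logits = (H_mid @ A) @ model.transformer.wte.weight.T
    guess = early_logits[-1].argmax().item()
print(tokenizer.decode([guess]))  # early-exit guess at the next token
```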
- Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer [1.8594711725515678]
In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix.
Previous work on interpretability in NMT has focused solely on source-sentence token attributions.
We propose an interpretability method that tracks complete input token attributions.
arXiv Detail & Related papers (2022-05-23T20:59:14Z)
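To make "complete input token attributions" concrete, the sketch below scores one next-token prediction against both the source tokens and the target prefix with a gradient-times-input heuristic. The paper proposes its own attribution method; this gradient-based version is only a generic stand-in, and the Marian model name and sentences are illustrative assumptions.

```python
# Attribute one next-token prediction to BOTH the source sentence and the
# target prefix via gradient x input. Generic stand-in, not the paper's method.
# Assumes: pip install torch transformers sentencepiece.
import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name).eval()

src = tokenizer("The cat sat on the mat.", return_tensors="pt")
tgt = tokenizer(text_target="Die Katze saß", return_tensors="pt",
                add_special_tokens=False)

emb = model.get_input_embeddings()
scale = model.model.encoder.embed_scale  # Marian scales embeddings internally

src_embeds = (emb(src["input_ids"]) * scale).detach().requires_grad_(True)
start = torch.tensor([[model.config.decoder_start_token_id]])
dec_ids = torch.cat([start, tgt["input_ids"]], dim=1)
dec_embeds = (emb(dec_ids) * scale).detach().requires_grad_(True)

out = model(inputs_embeds=src_embeds, attention_mask=src["attention_mask"],
            decoder_inputs_embeds=dec_embeds)
next_logits = out.logits[0, -1]              # prediction after the prefix
next_logits[next_logits.argmax()].backward()

# Gradient x input, summed over the embedding dim: one score per token.
src_attr = (src_embeds.grad * src_embeds).sum(-1)[0].abs()
tgt_attr = (dec_embeds.grad * dec_embeds).sum(-1)[0].abs()
print("source attributions:", src_attr)
print("target-prefix attributions:", tgt_attr)
```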
- Transkimmer: Transformer Learns to Layer-wise Skim [17.188613474427054]
One of the major computational inefficiencies of Transformer-based models is that they spend an identical amount of computation throughout all layers.
We propose Transkimmer architecture, which learns to identify hidden state tokens that are not required by each layer.
The skimmed tokens are then forwarded directly to the final output, thus reducing the computation of the successive layers.
arXiv Detail & Related papers (2022-05-15T16:23:30Z)
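The layer-wise skimming mechanism can be illustrated with a toy encoder in which a small gate per layer decides which tokens continue to be updated; skimmed tokens carry their current hidden state straight through to the output. A minimal, untrained sketch of the idea only: the real Transkimmer trains its gates with a reparameterization trick and a skim loss, and actually removes skimmed tokens to save compute, whereas here they are merely masked.

```python
# Toy layer-wise skimming: per-layer gates freeze tokens they deem finished.
import torch
import torch.nn as nn

class SkimEncoder(nn.Module):
    def __init__(self, d_model=64, n_layers=4, n_heads=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers))
        # One tiny gate per layer: probability that a token still needs compute.
        self.gates = nn.ModuleList(nn.Linear(d_model, 1) for _ in range(n_layers))

    def forward(self, x):                                   # x: (batch, seq, d)
        active = torch.ones(x.shape[:2], dtype=torch.bool)  # tokens still computing
        for layer, gate in zip(self.layers, self.gates):
            keep = torch.sigmoid(gate(x)).squeeze(-1) > 0.5  # hard skim decision
            active = active & keep
            updated = layer(x)  # a real implementation would only run active tokens
            # Skimmed tokens keep their current state; active ones are updated.
            x = torch.where(active.unsqueeze(-1), updated, x)
        return x  # skimmed states were forwarded to the output unchanged

enc = SkimEncoder()
print(enc(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```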
- Consistent Accelerated Inference via Confident Adaptive Transformers [29.034390810078172]
We develop a novel approach for confidently accelerating inference in large and expensive multilayer Transformers.
We simultaneously increase computational efficiency, while guaranteeing a specifiable degree of consistency with the original model with high confidence.
We demonstrate the effectiveness of this approach on four classification and regression tasks.
arXiv Detail & Related papers (2021-04-18T10:22:28Z)
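The inference-time rule behind such adaptive Transformers is a per-layer classifier plus a confidence threshold. The sketch below shows only that rule; the paper's contribution is calibrating the threshold so that early predictions agree with the full model at a user-specified rate, which is not reproduced here. All sizes and the threshold are illustrative.

```python
# Toy confidence-based early exit for a multi-exit Transformer classifier.
import torch
import torch.nn as nn

class EarlyExitClassifier(nn.Module):
    def __init__(self, d_model=64, n_layers=6, n_classes=3, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
            for _ in range(n_layers))
        self.heads = nn.ModuleList(nn.Linear(d_model, n_classes)
                                   for _ in range(n_layers))
        self.threshold = threshold

    def forward(self, x):  # x: (1, seq, d_model) -- batch of one for simplicity
        for i, (layer, head) in enumerate(zip(self.layers, self.heads)):
            x = layer(x)
            probs = head(x.mean(dim=1)).softmax(-1)  # pool tokens, classify
            conf, pred = probs.max(-1)
            if conf.item() >= self.threshold:        # confident enough: stop
                return pred, i + 1                   # prediction, layers used
        return pred, len(self.layers)                # fell through to the end

model = EarlyExitClassifier()
pred, depth = model(torch.randn(1, 12, 64))
print(pred.item(), "after", depth, "layers")
```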
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
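One building block of Bayesian estimation for such LMs is replacing point-estimate weights with a distribution. The sketch below shows a linear layer with a mean-field Gaussian posterior sampled via the reparameterization trick, plus the KL term added to the training loss; it illustrates the weight-uncertainty idea only, not the paper's full framework.

```python
# Linear layer with a mean-field Gaussian posterior over its weights.
import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        # Sample W ~ N(mu, sigma^2) differentiably: W = mu + sigma * eps.
        sigma = self.log_sigma.exp()
        weight = self.mu + sigma * torch.randn_like(sigma)
        return x @ weight.T

    def kl_to_standard_normal(self):
        # KL(q(W) || N(0, I)), added to the training loss.
        sigma2 = (2 * self.log_sigma).exp()
        return 0.5 * (sigma2 + self.mu**2 - 1 - 2 * self.log_sigma).sum()

layer = BayesianLinear(16, 4)
x = torch.randn(2, 16)
print(layer(x), layer(x))  # two forward passes draw different weights
```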
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
We propose Funnel-Transformer, which gradually compresses the sequence of hidden states to a shorter one.
With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
arXiv Detail & Related papers (2020-06-05T05:16:23Z)
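The compression scheme amounts to pooling the hidden-state sequence between blocks so that deeper layers attend over fewer positions. A toy sketch under assumed sizes; the real model pools only the attention queries and can re-expand the sequence for token-level tasks.

```python
# Toy funnel encoder: halve the sequence length between blocks.
import torch
import torch.nn as nn

class FunnelEncoder(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_blocks=3, layers_per_block=2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.ModuleList(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                for _ in range(layers_per_block))
            for _ in range(n_blocks))
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)

    def forward(self, x):                 # x: (batch, seq, d_model)
        for i, block in enumerate(self.blocks):
            if i > 0:                     # compress before every block but the first
                x = self.pool(x.transpose(1, 2)).transpose(1, 2)
            for layer in block:
                x = layer(x)
        return x                          # (batch, seq / 2**(n_blocks-1), d_model)

enc = FunnelEncoder()
print(enc(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 4, 64])
```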