Transformer visualization via dictionary learning: contextualized
embedding as a linear superposition of transformer factors
- URL: http://arxiv.org/abs/2103.15949v2
- Date: Tue, 4 Apr 2023 06:43:19 GMT
- Title: Transformer visualization via dictionary learning: contextualized
embedding as a linear superposition of transformer factors
- Authors: Zeyu Yun, Yubei Chen, Bruno A Olshausen, Yann LeCun
- Abstract summary: We propose to use dictionary learning to open up "black boxes" as linear superpositions of transformer factors.
Through visualization, we demonstrate the hierarchical semantic structures captured by the transformer factors.
We hope this visualization tool can bring further knowledge and a better understanding of how transformer networks work.
- Score: 15.348047288817478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer networks have revolutionized NLP representation learning since
they were introduced. Though great effort has been made to explain the
representations in transformers, it is widely recognized that our understanding
is not sufficient. One important reason is the lack of visualization tools
suitable for detailed analysis. In this paper, we propose to use dictionary
learning to open up these "black boxes" as linear superpositions of transformer
factors. Through visualization, we demonstrate the hierarchical semantic
structures captured by the transformer factors, e.g., word-level polysemy
disambiguation, sentence-level pattern formation, and long-range dependency.
While some of these patterns confirm conventional prior linguistic knowledge,
others are relatively unexpected and may provide new insights.
We hope this visualization tool can bring further knowledge and a better
understanding of how transformer networks work. The code is available at
https://github.com/zeyuyun1/TransformerVis
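To make the method concrete, here is a minimal sketch of the pipeline the abstract describes: collect one layer's contextualized embeddings over a corpus, learn an overcomplete dictionary by sparse coding, and inspect which tokens load on each factor. This is not the authors' code (see the repository above); the model name, layer index, and hyperparameters are illustrative, and a real run needs far more than two sentences.

```python
# Minimal sketch, not the paper's exact pipeline: decompose hidden states
# into a sparse linear superposition over a learned dictionary.
import numpy as np
import torch
from sklearn.decomposition import MiniBatchDictionaryLearning
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

# Illustrative corpus; the paper uses a large text corpus.
sentences = ["The bank raised interest rates.", "They sat on the river bank."]
layer = 6  # which layer's embeddings to decompose (illustrative choice)

states = []
with torch.no_grad():
    for s in sentences:
        inputs = tokenizer(s, return_tensors="pt")
        hidden = model(**inputs).hidden_states[layer][0]  # (seq_len, 768)
        states.append(hidden.numpy())
X = np.concatenate(states)  # one row per token occurrence

# Learn dictionary Phi so that each embedding x ~= Phi^T a with sparse a.
dico = MiniBatchDictionaryLearning(
    n_components=512, alpha=1.0, batch_size=16,
    transform_algorithm="lasso_lars", transform_alpha=1.0,
)
codes = dico.fit(X).transform(X)  # sparse coefficients, (n_tokens, 512)

# Token occurrences with the largest coefficient on a factor hint at
# what that factor encodes.
factor = 0
top = np.argsort(-np.abs(codes[:, factor]))[:5]
print("top token rows for factor", factor, ":", top)
```

Each learned dictionary atom plays the role of one transformer factor; visualizing the top-activating tokens per atom is what surfaces the word-level and sentence-level patterns the abstract mentions.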
Related papers
- The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers [0.0]
The transformer neural network architecture allows for autoregressive sequence-to-sequence modeling.
Transformers have also been applied across a wide variety of pattern recognition tasks, particularly in computer vision.
arXiv Detail & Related papers (2024-06-24T16:45:28Z)
- Transformers are Expressive, But Are They Expressive Enough for Regression? [38.369337945109855]
We show that Transformers struggle to reliably approximate smooth functions, relying on piecewise constant approximations with sizable intervals.
By shedding light on these challenges, we advocate a refined understanding of Transformers' capabilities.
arXiv Detail & Related papers (2024-02-23T18:12:53Z)
- Interpretation of the Transformer and Improvement of the Extractor [3.9693969407364427]
It has been over six years since the Transformer architecture was put forward.
Surprisingly, the vanilla Transformer architecture is still widely used today.
The lack of deep understanding and comprehensive interpretation of the Transformer architecture makes it more challenging to improve the Transformer architecture.
arXiv Detail & Related papers (2023-11-21T15:36:20Z)
- Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles [65.54857068975068]
In this paper, we argue that this additional bulk is unnecessary.
By pretraining with a strong visual pretext task (MAE), we can strip out all the bells-and-whistles from a state-of-the-art multi-stage vision transformer.
We create Hiera, an extremely simple hierarchical vision transformer that is more accurate than previous models.
arXiv Detail & Related papers (2023-06-01T17:59:58Z)
- An Introduction to Transformers [23.915718146956355]
The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points.
In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.
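As a pointer to what such a description covers, here is a minimal NumPy sketch (ours, not the note's notation) of the scaled dot-product self-attention at the heart of the architecture, softmax(QK^T / sqrt(d_k)) V; shapes and random weights are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                 # 5 tokens, model width 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (5, 16)
```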
arXiv Detail & Related papers (2023-04-20T14:54:19Z)
- What Makes for Good Tokenizers in Vision Transformer? [62.44987486771936]
Transformers are capable of extracting pairwise relationships among tokens using self-attention.
What makes for a good tokenizer has not been well understood in computer vision.
The proposed Modulation across Tokens (MoTo) incorporates inter-token modeling capability through normalization, and the regularization objective TokenProp is adopted in the standard training regime.
arXiv Detail & Related papers (2022-12-21T15:51:43Z)
- Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass.
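To see the mesa-optimizer claim in its simplest form, here is a hedged NumPy sketch (ours, not the paper's full construction): for linear regression with weights initialized at zero, one gradient step on the in-context examples yields exactly the prediction of an unnormalized linear self-attention layer whose keys and values are the context pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 4, 32
X = rng.normal(size=(N, d))      # in-context example inputs x_i
y = X @ rng.normal(size=d)       # in-context targets y_i
x_q = rng.normal(size=d)         # query input
eta = 0.5                        # learning rate

# One gradient step on L(w) = 1/(2N) * sum_i (w.x_i - y_i)^2, from w = 0
w = np.zeros(d)
w = w - eta * (X.T @ (X @ w - y)) / N
pred_gd = w @ x_q

# Unnormalized linear self-attention over the context:
# query = x_q, keys = x_i, values = y_i
pred_attn = eta / N * np.sum(y * (X @ x_q))

print(np.isclose(pred_gd, pred_attn))  # True: the two predictions match
```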
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
- On the Power of Saturated Transformers: A View from Circuit Complexity [87.20342701232869]
We show that saturated transformers transcend the limitations of hard-attention transformers.
The jump from hard to saturated attention can be understood as increasing the transformer's effective circuit depth by a factor of $O(\log n)$.
arXiv Detail & Related papers (2021-06-30T17:09:47Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
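For a flavor of RASP, here is a plain-Python emulation (ours, not from the paper) of its well-known histogram program, which counts how often each position's token occurs in the sequence.

```python
def rasp_histogram(tokens):
    # Emulates the RASP program select(tokens, tokens, ==) followed by
    # selector_width: each position attends to every position holding the
    # same token, so the selector's width is that token's count.
    return [sum(t == u for u in tokens) for t in tokens]

print(rasp_histogram(list("hello")))  # [1, 1, 2, 2, 1]
```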
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
- Position Information in Transformers: An Overview [6.284464997330884]
This paper provides an overview of common methods to incorporate position information into Transformer models.
The objective of this survey is to showcase that position information in Transformers is a vibrant and extensive research area.
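Among the methods such an overview covers, the fixed sinusoidal encoding of the original transformer is the canonical example; a minimal NumPy sketch (function name ours, even d_model assumed):

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    # Fixed encodings from "Attention Is All You Need":
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2.0 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(sinusoidal_positions(50, 8).shape)  # (50, 8)
```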
arXiv Detail & Related papers (2021-02-22T15:03:23Z)
- A Survey on Visual Transformer [126.56860258176324]
The transformer is a type of deep neural network based mainly on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
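Vision transformers first tokenize an image into a sequence of flattened patches before any attention is applied; a minimal NumPy sketch of that step (names and sizes illustrative):

```python
import numpy as np

def patchify(image, patch):
    # Split an (H, W, C) image into flattened non-overlapping patches:
    # the token sequence a vision transformer attends over. A learned
    # linear projection would then map each patch to the model width.
    H, W, C = image.shape
    rows, cols = H // patch, W // patch
    x = image[: rows * patch, : cols * patch]
    x = x.reshape(rows, patch, cols, patch, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(rows * cols, patch * patch * C)

print(patchify(np.zeros((224, 224, 3)), 16).shape)  # (196, 768)
```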
arXiv Detail & Related papers (2020-12-23T09:37:54Z)