Introduction to Transformers: an NLP Perspective
- URL: http://arxiv.org/abs/2311.17633v1
- Date: Wed, 29 Nov 2023 13:51:04 GMT
- Title: Introduction to Transformers: an NLP Perspective
- Authors: Tong Xiao, Jingbo Zhu
- Abstract summary: We introduce basic concepts of Transformers and present key techniques that form the recent advances of these models.
This includes a description of the standard Transformer architecture, a series of model refinements, and common applications.
- Score: 59.0241868728732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have dominated empirical machine learning models of natural
language processing. In this paper, we introduce basic concepts of Transformers
and present key techniques that form the recent advances of these models. This
includes a description of the standard Transformer architecture, a series of
model refinements, and common applications. Given that Transformers and related
deep learning techniques might be evolving in ways we have never seen, we
cannot dive into all the model details or cover all the technical areas.
Instead, we focus on just those concepts that are helpful for gaining a good
understanding of Transformers and their variants. We also summarize the key
ideas that impact this field, thereby yielding some insights into the strengths
and limitations of these models.
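As a concrete anchor for the "standard Transformer architecture" the abstract refers to, here is a minimal sketch of single-head scaled dot-product self-attention, the mechanism at the core of that architecture. This is an illustrative sketch only; the array shapes, weight names, and toy dimensions are assumptions for this note, not code from the paper.

```python
# Minimal single-head scaled dot-product self-attention (illustrative sketch).
# Names (X, Wq, Wk, Wv) and dimensions are assumptions, not from the paper.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, applied to one sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens to queries/keys/values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # pairwise token-token similarities
    weights = softmax(scores, axis=-1)        # each row is a distribution over tokens
    return weights @ V                        # mix value vectors by attention weights

# Toy usage: 5 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one contextualized vector per input token
```

A full Transformer layer stacks several such heads in parallel (multi-head attention) and follows them with a position-wise feed-forward network, residual connections, and layer normalization.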
Related papers
- Survey: Transformer-based Models in Data Modality Conversion [0.8136541584281987]
Modality Conversion involves the transformation of data from one form of representation to another, mimicking the way humans integrate and interpret sensory information.
This paper provides a comprehensive review of transformer-based models applied to the primary modalities of text, vision, and speech, discussing their architectures, conversion methodologies, and applications.
arXiv Detail & Related papers (2024-08-08T18:39:14Z)
- Transformer in Touch: A Survey [29.622771021984594]
The Transformer model, which initially achieved significant success in natural language processing, has recently shown great potential in tactile perception.
This review aims to comprehensively outline the application and development of Transformers in tactile technology.
arXiv Detail & Related papers (2024-05-21T13:26:27Z)
- An Introduction to Transformers [23.915718146956355]
The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points.
In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.
arXiv Detail & Related papers (2023-04-20T14:54:19Z)
- Holistically Explainable Vision Transformers [136.27303006772294]
We propose B-cos transformers, which inherently provide holistic explanations for their decisions.
Specifically, we formulate each model component - such as the multi-layer perceptrons, attention layers, and the tokenisation module - to be dynamic linear.
We apply our proposed design to Vision Transformers (ViTs) and show that the resulting models, dubbed Bcos-ViTs, are highly interpretable and perform competitively to baseline ViTs.
arXiv Detail & Related papers (2023-01-20T16:45:34Z)
- A Survey on Transformers in Reinforcement Learning [66.23773284875843]
The Transformer has been considered the dominant neural architecture in NLP and CV, mostly under supervised settings.
Recently, a similar surge of Transformer use has appeared in reinforcement learning (RL), where it faces unique design choices and challenges brought by the nature of RL.
This paper systematically reviews the motivations and progress of using Transformers in RL, provides a taxonomy of existing works, discusses each sub-field, and summarizes future prospects.
arXiv Detail & Related papers (2023-01-08T14:04:26Z)
- What Makes for Good Tokenizers in Vision Transformer? [62.44987486771936]
Transformers are capable of extracting pairwise relationships among tokens using self-attention.
What makes for a good tokenizer, however, has not been well understood in computer vision.
The proposed Modulation across Tokens (MoTo) incorporates inter-token modeling capability through normalization, and a regularization objective, TokenProp, is adopted in the standard training regime.
arXiv Detail & Related papers (2022-12-21T15:51:43Z)
- Transformers from an Optimization Perspective [24.78739299952529]
We study the problem of finding an energy function underlying the Transformer model.
By finding such a function, we can reinterpret Transformers as the unfolding of an interpretable optimization process.
This work contributes to our intuition and understanding of Transformers, while potentially laying the groundwork for new model designs.
arXiv Detail & Related papers (2022-05-27T10:45:15Z)
- A Survey on Visual Transformer [126.56860258176324]
The Transformer is a type of deep neural network mainly based on the self-attention mechanism.
In this paper, we review vision transformer models by categorizing them according to task and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
- Modifying Memories in Transformer Models [71.48657481835767]
We propose a new task of explicitly modifying specific factual knowledge in Transformer models.
This task is useful in many scenarios, such as updating stale knowledge, protecting privacy, and eliminating unintended biases stored in the models.
arXiv Detail & Related papers (2020-12-01T09:39:13Z)