Linearized Relative Positional Encoding
- URL: http://arxiv.org/abs/2307.09270v1
- Date: Tue, 18 Jul 2023 13:56:43 GMT
- Title: Linearized Relative Positional Encoding
- Authors: Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han,
Yuchao Dai, Lingpeng Kong, Yiran Zhong
- Abstract summary: Relative positional encoding is widely used in vanilla and linear transformers to represent positional information.
We put together a variety of existing linear relative positional encoding approaches under a canonical form.
We further propose a family of linear relative positional encoding algorithms via unitary transformation.
- Score: 43.898057545832366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relative positional encoding is widely used in vanilla and linear
transformers to represent positional information. However, existing encoding
methods of a vanilla transformer are not always directly applicable to a linear
transformer, because the latter requires a decomposition of the query and key
representations into separate kernel functions. Nevertheless, principles for
designing encoding methods suitable for linear transformers remain
understudied. In this work, we put together a variety of existing linear
relative positional encoding approaches under a canonical form and further
propose a family of linear relative positional encoding algorithms via unitary
transformation. Our formulation leads to a principled framework that can be
used to develop new relative positional encoding methods that preserve linear
space-time complexity. Equipped with different models, the proposed linearized
relative positional encoding (LRPE) family derives effective encoding for
various applications. Experiments show that compared with existing methods,
LRPE achieves state-of-the-art performance in language modeling, text
classification, and image classification. Meanwhile, it highlights a general
paradigm for designing a broader range of relative positional encoding methods
that are applicable to linear transformers. The code is available at
https://github.com/OpenNLPLab/Lrpe.
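The constraint the abstract describes can be sketched concretely: choose position-dependent unitary maps W_t such that the inner product of W_s q and W_t k depends only on the offset t - s, so queries and keys stay factored apart as linear attention requires. The rotary-style block rotation below is one member of such a unitary family. This is an illustrative NumPy sketch under our own naming, not the authors' LRPE implementation; softmax-free normalization is omitted for brevity.

```python
import numpy as np

def unitary_rpe(x, base=10000.0):
    """Rotary-style unitary positional map (one member of a unitary RPE family).

    Position t rotates each feature pair (i, i + d/2) by angle t * theta_i.
    Rotations are unitary, so <W_s q, W_t k> = <q, W_{t-s} k>: the similarity
    depends only on the relative offset t - s, while q and k remain separate
    factors, which is what linear attention needs.
    """
    n, d = x.shape
    assert d % 2 == 0, "feature dim must be even to form rotation pairs"
    half = d // 2
    theta = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    ang = np.outer(np.arange(n), theta)         # (n, half): angle t * theta_i
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

def linear_attention(q, k, v):
    """Unnormalized linear attention: q (k^T v) costs O(n d^2), not O(n^2 d)."""
    return q @ (k.T @ v)                        # (d, d_v) summary is linear in n

# Relative property: with identical content at every position, the rotated
# similarity between positions s and t depends only on t - s.
rng = np.random.default_rng(0)
x = np.tile(rng.standard_normal(8), (10, 1))    # same vector at 10 positions
r = unitary_rpe(x)
print(np.allclose(r[2] @ r[5], r[0] @ r[3]))    # both offsets equal 3 -> True
```

Because the rotation is applied independently to queries and keys, the encoded similarity never requires materializing the full n-by-n attention matrix, preserving linear space-time complexity.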
Related papers
- Improving Transformers using Faithful Positional Encoding [55.30212768657544]
We propose a new positional encoding method for a neural network architecture called the Transformer.
Unlike the standard sinusoidal positional encoding, our approach has a guarantee of not losing information about the positional order of the input sequence.
arXiv Detail & Related papers (2024-05-15T03:17:30Z) - PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models [0.0]
Orthogonal Polynomial Based Positional Encoding (PoPE) encodes positional information using orthogonal Legendre polynomials.
We show that transformer models with PoPE outperform baseline transformer models on the $Multi30k$ English-to-German translation task.
We will present novel theoretical perspectives on position encoding based on the superior performance of PoPE.
arXiv Detail & Related papers (2024-04-29T10:30:59Z) - Improving Position Encoding of Transformers for Multivariate Time Series
Classification [5.467400475482668]
We propose a new absolute position encoding method dedicated to time series data, called time Absolute Position Encoding (tAPE).
We then propose a novel multivariate time series classification (MTSC) model named ConvTran, which combines tAPE/eRPE with convolution-based input encoding to improve the position and data embedding of time series data.
arXiv Detail & Related papers (2023-05-26T05:30:04Z) - Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers [71.32827362323205]
We propose a new class of linear Transformers called Learner-Transformers (Learners).
They incorporate a wide range of relative positional encoding mechanisms (RPEs).
These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces.
arXiv Detail & Related papers (2023-02-03T18:57:17Z) - Error Correction Code Transformer [92.10654749898927]
We propose the first extension of the Transformer architecture to the soft decoding of linear codes at arbitrary block lengths.
We encode each channel output into a high-dimensional representation so that the bit information can be better represented and processed separately.
The proposed approach demonstrates the extreme power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z) - Learnable Fourier Features for Multi-Dimensional Spatial Positional
Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z) - Relative Positional Encoding for Transformers with Linear Complexity [30.48367640796256]
Relative positional encoding (RPE) was proposed as beneficial for classical Transformers.
RPE is not available for the recent linear-variants of the Transformer, because it requires the explicit computation of the attention matrix.
In this paper, we present a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE.
arXiv Detail & Related papers (2021-05-18T09:52:32Z) - Demystifying the Better Performance of Position Encoding Variants for
Transformer [12.503079503907989]
We show how to encode position and segment into Transformer models.
The proposed method performs on par with SOTA on GLUE, XTREME and WMT benchmarks while saving costs.
arXiv Detail & Related papers (2021-04-18T03:44:57Z) - Learning to Encode Position for Transformer with Continuous Dynamical
Model [88.69870971415591]
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models.
We model the evolution of encoded results along position index by such a dynamical system.
arXiv Detail & Related papers (2020-03-13T00:41:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.