Improving Transformers using Faithful Positional Encoding
- URL: http://arxiv.org/abs/2405.09061v2
- Date: Thu, 16 May 2024 06:26:43 GMT
- Title: Improving Transformers using Faithful Positional Encoding
- Authors: Tsuyoshi Idé, Jokin Labaien, Pin-Yu Chen
- Abstract summary: We propose a new positional encoding method for a neural network architecture called the Transformer.
Unlike the standard sinusoidal positional encoding, our approach has a guarantee of not losing information about the positional order of the input sequence.
- Score: 55.30212768657544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new positional encoding method for a neural network architecture called the Transformer. Unlike the standard sinusoidal positional encoding, our approach is based on solid mathematical grounds and has a guarantee of not losing information about the positional order of the input sequence. We show that the new encoding approach systematically improves the prediction performance in the time-series classification task.
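The abstract contrasts the proposed method with the standard sinusoidal positional encoding but does not spell out either construction. As a reference point only, here is a minimal NumPy sketch of the standard sinusoidal encoding (the baseline named above); the function name, the assumption of an even d_model, and the shapes are illustrative choices, and the paper's own faithful encoding is not reproduced here.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (the baseline the abstract refers to).

    Returns an array of shape (seq_len, d_model): even columns hold
    sin(pos / 10000^(2i/d_model)), odd columns the matching cosine.
    Assumes an even d_model.
    """
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)  # (seq_len, d_model // 2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encoding is typically added to the token (or time-step) embeddings:
# x = embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

The paper's claim is that this baseline can lose information about positional order, whereas its proposed encoding guarantees that the order remains recoverable; since the abstract gives no formula, the sketch above only shows the baseline being improved upon.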
Related papers
- Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary [1.4594704809280983]
Positional encoding is a high-dimensional representation of the time indices of input data.
RNNs can encode the temporal information of data points on their own, which makes their use of positional encoding seem redundant.
arXiv Detail & Related papers (2024-01-31T23:32:20Z)
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
- Cycle Encoding of a StyleGAN Encoder for Improved Reconstruction and Editability [76.6724135757723]
GAN inversion aims to invert an input image into the latent space of a pre-trained GAN.
Despite the recent advances in GAN inversion, there remain challenges to mitigate the tradeoff between distortion and editability.
We propose a two-step approach that first inverts the input image into a latent code, called the pivot code, and then tunes the generator so that the pivot code accurately reproduces the input image.
arXiv Detail & Related papers (2022-07-19T16:10:16Z)
- Transformer with Tree-order Encoding for Neural Program Generation [8.173517923612426]
We introduce a tree-based positional encoding and a shared natural-language subword vocabulary for Transformers.
Our findings suggest that employing a tree-based positional encoding in combination with a shared natural-language subword vocabulary improves generation performance over sequential positional encodings.
arXiv Detail & Related papers (2022-05-30T12:27:48Z)
- Error Correction Code Transformer [92.10654749898927]
We propose, for the first time, to extend the Transformer architecture to the soft decoding of linear codes at arbitrary block lengths.
Each channel output is embedded into a high-dimensional representation so that the information carried by each bit can be processed separately.
The proposed approach demonstrates the extreme power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z)
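Reading the Error Correction Code Transformer summary above, the key ingredient is lifting each scalar channel output (one value per code bit) into a high-dimensional embedding before a Transformer processes the bits jointly. The PyTorch sketch below illustrates that per-bit embedding under our own simplifying assumptions; the module name, dimensions, and the plain `nn.TransformerEncoder` stack are illustrative stand-ins, not the paper's exact decoder.

```python
import torch
import torch.nn as nn

class PerBitEmbeddingDecoder(nn.Module):
    """Illustrative sketch: embed each scalar channel output (e.g., an LLR per
    code bit) into a d_model-dimensional vector, process the bits jointly with
    a Transformer encoder, and read out a soft estimate per bit."""

    def __init__(self, d_model: int = 64, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.lift = nn.Linear(1, d_model)       # scalar channel output -> d_model
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.readout = nn.Linear(d_model, 1)    # soft bit estimate per position

    def forward(self, channel_output: torch.Tensor) -> torch.Tensor:
        # channel_output: (batch, block_length) of received values / LLRs
        x = self.lift(channel_output.unsqueeze(-1))   # (batch, block_length, d_model)
        x = self.encoder(x)
        return self.readout(x).squeeze(-1)            # (batch, block_length)

# Shape check on a batch of eight noisy length-32 "codewords" (random inputs).
decoder = PerBitEmbeddingDecoder()
soft_bits = decoder(torch.randn(8, 32))
```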
- Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z)
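The learnable-Fourier-feature entry above concerns positional encodings for multi-dimensional positions. A minimal PyTorch sketch of that general idea follows: positions are projected through a learnable frequency matrix, passed through sine and cosine, and mixed by a small MLP. The class name, layer sizes, and the GELU MLP are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LearnableFourierPE(nn.Module):
    """Sketch of a learnable Fourier-feature positional encoding for
    multi-dimensional positions (e.g., 2-D pixel coordinates)."""

    def __init__(self, pos_dim: int = 2, num_freqs: int = 32, d_model: int = 128):
        super().__init__()
        # Learnable frequency matrix: maps a position to num_freqs phases.
        self.freqs = nn.Linear(pos_dim, num_freqs, bias=False)
        # Small MLP that mixes the sin/cos features into the model dimension.
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_freqs, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # positions: (..., pos_dim), e.g., (num_tokens, 2) for a flattened grid
        phases = self.freqs(positions)                                     # (..., num_freqs)
        fourier = torch.cat([torch.sin(phases), torch.cos(phases)], dim=-1)
        return self.mlp(fourier)                                           # (..., d_model)

# Example: encode a flattened 4x4 grid of 2-D positions into 128-dim vectors.
pe = LearnableFourierPE()
ys, xs = torch.meshgrid(torch.arange(4.0), torch.arange(4.0), indexing="ij")
encodings = pe(torch.stack([ys, xs], dim=-1).reshape(-1, 2))  # (16, 128)
```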
- Demystifying the Better Performance of Position Encoding Variants for Transformer [12.503079503907989]
We show how to encode position and segment into Transformer models.
The proposed method performs on par with SOTA on the GLUE, XTREME and WMT benchmarks while reducing computational cost.
arXiv Detail & Related papers (2021-04-18T03:44:57Z)
- Context- and Sequence-Aware Convolutional Recurrent Encoder for Neural Machine Translation [2.729898906885749]
Existing models use recurrent neural networks to construct the encoder and decoder modules.
In an alternative line of research, the recurrent networks were replaced with convolutional neural networks to capture the syntactic structure of the input sentence.
We combine the strengths of both approaches by proposing a convolutional-recurrent encoder that captures context information.
arXiv Detail & Related papers (2021-01-11T17:03:52Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.