Improving Position Encoding of Transformers for Multivariate Time Series
Classification
- URL: http://arxiv.org/abs/2305.16642v1
- Date: Fri, 26 May 2023 05:30:04 GMT
- Title: Improving Position Encoding of Transformers for Multivariate Time Series
Classification
- Authors: Navid Mohammadi Foumani, Chang Wei Tan, Geoffrey I. Webb, Mahsa Salehi
- Abstract summary: We propose a new absolute position encoding method dedicated to time series data called time Absolute Position Encoding (tAPE).
We then propose ConvTran, a novel multivariate time series classification (MTSC) model that combines tAPE/eRPE with convolution-based input encoding to improve the position and data embedding of time series data.
- Score: 5.467400475482668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have demonstrated outstanding performance in many applications
of deep learning. When applied to time series data, transformers require
effective position encoding to capture the ordering of the time series data.
The efficacy of position encoding in time series analysis is not well-studied
and remains controversial, e.g., whether it is better to inject absolute
position encoding or relative position encoding, or a combination of them. In
order to clarify this, we first review existing absolute and relative position
encoding methods when applied in time series classification. We then propose a
new absolute position encoding method dedicated to time series data called time
Absolute Position Encoding (tAPE). Our new method incorporates the series
length and input embedding dimension in absolute position encoding.
Additionally, we propose a computationally Efficient implementation of Relative
Position Encoding (eRPE) to improve generalisability for time series. We then
propose a novel multivariate time series classification (MTSC) model combining
tAPE/eRPE and convolution-based input encoding named ConvTran to improve the
position and data embedding of time series data. The proposed absolute and
relative position encoding methods are simple and efficient. They can be easily
integrated into transformer blocks and used for downstream tasks such as
forecasting, extrinsic regression, and anomaly detection. Extensive experiments
on 32 multivariate time-series datasets show that our model is significantly
more accurate than state-of-the-art convolution and transformer-based models.
Code and models are open-sourced at
https://github.com/Navidfoumani/ConvTran.
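As a rough illustration of the two ideas above, the sketch below shows a tAPE-style sinusoidal encoding whose angles are rescaled using the series length and embedding dimension, and an eRPE-style attention layer with a learnable bias indexed by relative offset. It is a minimal sketch, assuming PyTorch; the exact rescaling factor, the placement of the relative bias, and the class and parameter names are illustrative assumptions, not the authors' ConvTran implementation (see the repository above for that).

```python
# Illustrative sketch only (assumed PyTorch, not the authors' ConvTran code):
# a tAPE-style absolute encoding and an eRPE-style relative bias.
import math
import torch
import torch.nn as nn


class TimeAbsolutePE(nn.Module):
    """Sinusoidal encoding whose angles are rescaled by d_model / max_len,
    one plausible reading of 'incorporates the series length and input
    embedding dimension'; the exact formula is in the paper and repository."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        assert d_model % 2 == 0, "even embedding dimension assumed"
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (L, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
        )
        scale = d_model / max_len          # assumed length/dimension-aware rescaling
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term * scale)
        pe[:, 1::2] = torch.cos(position * div_term * scale)
        self.register_buffer("pe", pe.unsqueeze(0))   # (1, L, d_model)

    def forward(self, x):                  # x: (batch, L, d_model)
        return x + self.pe[:, : x.size(1)]


class RelativeBiasAttention(nn.Module):
    """Single-head attention with one learnable scalar bias per relative
    offset, the kind of indexed lookup an efficient RPE can be built on.
    The bias is added after the softmax here; see the paper for the exact
    placement used by eRPE."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.max_len = max_len
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))  # offsets in [-(L-1), L-1]
        self.scale = d_model ** -0.5

    def forward(self, x):                  # x: (batch, L, d_model)
        _, L, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale               # (batch, L, L)
        idx = torch.arange(L, device=x.device)
        offsets = idx.unsqueeze(0) - idx.unsqueeze(1) + self.max_len - 1  # relative offset -> bias index
        attn = attn.softmax(dim=-1) + self.rel_bias[offsets]
        return attn @ v
```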
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - DAPE: Data-Adaptive Positional Encoding for Length Extrapolation [60.18239094672938]
Positional encoding plays a crucial role in transformers, significantly impacting model performance and generalization length.
We propose a Data-Adaptive Positional Encoding (DAPE) method, which enhances model performance in terms of trained length and length generalization.
We successfully train the model on sequence length 128 and achieve better performance at evaluation sequence length 8192, compared with other static positional encoding methods.
arXiv Detail & Related papers (2024-05-23T15:51:24Z) - Improving Transformers using Faithful Positional Encoding [55.30212768657544]
We propose a new positional encoding method for a neural network architecture called the Transformer.
Unlike the standard sinusoidal positional encoding, our approach has a guarantee of not losing information about the positional order of the input sequence.
arXiv Detail & Related papers (2024-05-15T03:17:30Z) - Take an Irregular Route: Enhance the Decoder of Time-Series Forecasting
Transformer [9.281993269355544]
We propose FPPformer to utilize bottom-up and top-down architectures in encoder and decoder to build the full and rational hierarchy.
Extensive experiments with six state-of-the-art benchmarks verify the promising performance of FPPformer.
arXiv Detail & Related papers (2023-12-10T06:50:56Z) - Functional Interpolation for Relative Positions Improves Long Context
Transformers [86.12843093589]
We propose a novel functional relative position encoding with progressive interpolation, FIRE, to improve Transformer generalization to longer contexts.
We theoretically prove that this can represent some of the popular relative position encodings, such as T5's RPE, Alibi, and Kerple.
We show that FIRE models have better generalization to longer contexts on both zero-shot language modeling and long text benchmarks.
arXiv Detail & Related papers (2023-10-06T17:59:11Z) - Attention Augmented Convolutional Transformer for Tabular Time-series [0.9137554315375922]
Time-series classification is one of the most frequently performed tasks in industrial data science.
We propose a novel scalable architecture for learning representations from time-series data.
Our proposed model is end-to-end and can handle both categorical and continuous valued inputs.
arXiv Detail & Related papers (2021-10-05T05:20:46Z) - Stable, Fast and Accurate: Kernelized Attention with Relative Positional
Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT); see the sketch after this list.
arXiv Detail & Related papers (2021-06-23T17:51:26Z) - Learnable Fourier Features for Multi-Dimensional Spatial Positional
Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z) - Demystifying the Better Performance of Position Encoding Variants for
Transformer [12.503079503907989]
We show how to encode position and segment into Transformer models.
The proposed method performs on par with SOTA on GLUE, XTREME and WMT benchmarks while saving costs.
arXiv Detail & Related papers (2021-04-18T03:44:57Z)
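The Toeplitz observation cited in the kernelized-attention entry above can be made concrete: a Toeplitz matrix-vector product can be computed in O(n log n) by embedding the matrix in a circulant one and applying the FFT. Below is a minimal NumPy sketch of that trick, with illustrative names and a dense-matrix check; it is not that paper's code.

```python
# Illustrative NumPy sketch: O(n log n) Toeplitz matrix-vector product via a
# circulant embedding and the FFT, the core trick behind FFT-accelerated
# kernelized attention with relative positional encoding.
import numpy as np


def toeplitz_matvec(col, row, x):
    """Multiply the Toeplitz matrix with first column `col` and first row
    `row` (col[0] == row[0]) by vector x, without forming the matrix."""
    n = len(x)
    # The (2n-1)-circulant whose first column is [col, reversed(row[1:])]
    # contains T in its top-left n x n block; a circulant matvec is a
    # circular convolution, i.e. a pointwise product in Fourier space.
    c = np.concatenate([col, row[1:][::-1]])
    x_padded = np.concatenate([x, np.zeros(n - 1)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_padded))
    return y[:n].real


# Quick check against a dense Toeplitz multiply.
rng = np.random.default_rng(0)
n = 6
col, row = rng.normal(size=n), rng.normal(size=n)
row[0] = col[0]
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)] for i in range(n)])
x = rng.normal(size=n)
assert np.allclose(T @ x, toeplitz_matvec(col, row, x))
```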