Improving Position Encoding of Transformers for Multivariate Time Series
Classification
- URL: http://arxiv.org/abs/2305.16642v1
- Date: Fri, 26 May 2023 05:30:04 GMT
- Title: Improving Position Encoding of Transformers for Multivariate Time Series
Classification
- Authors: Navid Mohammadi Foumani, Chang Wei Tan, Geoffrey I. Webb, Mahsa Salehi
- Abstract summary: We propose a new absolute position encoding method dedicated to time series data called time Absolute Position Encoding (tAPE).
We then propose ConvTran, a novel multivariate time series classification (MTSC) model that combines tAPE/eRPE with convolution-based input encoding to improve the position and data embedding of time series data.
- Score: 5.467400475482668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have demonstrated outstanding performance in many applications
of deep learning. When applied to time series data, transformers require
effective position encoding to capture the ordering of the time series data.
The efficacy of position encoding in time series analysis is not well-studied
and remains controversial, e.g., whether it is better to inject absolute
position encoding or relative position encoding, or a combination of them. In
order to clarify this, we first review existing absolute and relative position
encoding methods when applied in time series classification. We then propose a
new absolute position encoding method dedicated to time series data called time
Absolute Position Encoding (tAPE). Our new method incorporates the series
length and input embedding dimension in absolute position encoding.
Additionally, we propose a computationally Efficient implementation of Relative
Position Encoding (eRPE) to improve generalisability for time series. We then
propose a novel multivariate time series classification (MTSC) model combining
tAPE/eRPE and convolution-based input encoding named ConvTran to improve the
position and data embedding of time series data. The proposed absolute and
relative position encoding methods are simple and efficient. They can be easily
integrated into transformer blocks and used for downstream tasks such as
forecasting, extrinsic regression, and anomaly detection. Extensive experiments
on 32 multivariate time-series datasets show that our model is significantly
more accurate than state-of-the-art convolution and transformer-based models.
Code and models are open-sourced at
https://github.com/Navidfoumani/ConvTran.
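As a rough illustration of the two ideas above, the sketch below shows a tAPE-style sinusoidal encoding whose angles are rescaled using the series length and embedding dimension, and an eRPE-style attention layer with a learnable bias indexed by relative offset. It is a minimal sketch, assuming PyTorch; the exact rescaling factor, the placement of the relative bias, and the class and parameter names are illustrative assumptions, not the authors' ConvTran implementation (see the repository above for that).

```python
# Illustrative sketch only (assumed PyTorch, not the authors' ConvTran code):
# a tAPE-style absolute encoding and an eRPE-style relative bias.
import math
import torch
import torch.nn as nn


class TimeAbsolutePE(nn.Module):
    """Sinusoidal encoding whose angles are rescaled by d_model / max_len,
    one plausible reading of 'incorporates the series length and input
    embedding dimension'; the exact formula is in the paper and repository."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        assert d_model % 2 == 0, "even embedding dimension assumed"
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (L, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
        )
        scale = d_model / max_len          # assumed length/dimension-aware rescaling
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term * scale)
        pe[:, 1::2] = torch.cos(position * div_term * scale)
        self.register_buffer("pe", pe.unsqueeze(0))   # (1, L, d_model)

    def forward(self, x):                  # x: (batch, L, d_model)
        return x + self.pe[:, : x.size(1)]


class RelativeBiasAttention(nn.Module):
    """Single-head attention with one learnable scalar bias per relative
    offset, the kind of indexed lookup an efficient RPE can be built on.
    The bias is added after the softmax here; see the paper for the exact
    placement used by eRPE."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.max_len = max_len
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))  # offsets in [-(L-1), L-1]
        self.scale = d_model ** -0.5

    def forward(self, x):                  # x: (batch, L, d_model)
        _, L, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale               # (batch, L, L)
        idx = torch.arange(L, device=x.device)
        offsets = idx.unsqueeze(0) - idx.unsqueeze(1) + self.max_len - 1  # relative offset -> bias index
        attn = attn.softmax(dim=-1) + self.rel_bias[offsets]
        return attn @ v
```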
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - DAPE: Data-Adaptive Positional Encoding for Length Extrapolation [60.18239094672938]
Positional encoding plays a crucial role in transformers, significantly impacting model performance and generalization length.
We propose a Data-Adaptive Positional Encoding (DAPE) method, which enhances model performance in terms of trained length and length generalization.
We successfully train the model on sequence length 128 and achieve better performance at evaluation sequence length 8192, compared with other static positional encoding methods.
arXiv Detail & Related papers (2024-05-23T15:51:24Z) - Improving Transformers using Faithful Positional Encoding [55.30212768657544]
We propose a new positional encoding method for a neural network architecture called the Transformer.
Unlike the standard sinusoidal positional encoding, our approach has a guarantee of not losing information about the positional order of the input sequence.
arXiv Detail & Related papers (2024-05-15T03:17:30Z) - Take an Irregular Route: Enhance the Decoder of Time-Series Forecasting
Transformer [9.281993269355544]
We propose FPPformer to utilize bottom-up and top-down architectures in encoder and decoder to build the full and rational hierarchy.
Extensive experiments with six state-of-the-art benchmarks verify the promising performance of FPPformer.
arXiv Detail & Related papers (2023-12-10T06:50:56Z) - Functional Interpolation for Relative Positions Improves Long Context
Transformers [86.12843093589]
We propose a novel functional relative position encoding with progressive interpolation, FIRE, to improve Transformer generalization to longer contexts.
We theoretically prove that this can represent some of the popular relative position encodings, such as T5's RPE, Alibi, and Kerple.
We show that FIRE models have better generalization to longer contexts on both zero-shot language modeling and long text benchmarks.
arXiv Detail & Related papers (2023-10-06T17:59:11Z) - Attention Augmented Convolutional Transformer for Tabular Time-series [0.9137554315375922]
Time-series classification is one of the most frequently performed tasks in industrial data science.
We propose a novel scalable architecture for learning representations from time-series data.
Our proposed model is end-to-end and can handle both categorical and continuous valued inputs.
arXiv Detail & Related papers (2021-10-05T05:20:46Z) - Stable, Fast and Accurate: Kernelized Attention with Relative Positional
Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT); see the sketch after this list.
arXiv Detail & Related papers (2021-06-23T17:51:26Z) - Learnable Fourier Features for Multi-Dimensional Spatial Positional
Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z) - Demystifying the Better Performance of Position Encoding Variants for
Transformer [12.503079503907989]
We show how to encode position and segment into Transformer models.
The proposed method performs on par with SOTA on GLUE, XTREME and WMT benchmarks while saving costs.
arXiv Detail & Related papers (2021-04-18T03:44:57Z)
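The Toeplitz observation cited in the kernelized-attention entry above can be made concrete: a Toeplitz matrix-vector product can be computed in O(n log n) by embedding the matrix in a circulant one and applying the FFT. Below is a minimal NumPy sketch of that trick, with illustrative names and a dense-matrix check; it is not that paper's code.

```python
# Illustrative NumPy sketch: O(n log n) Toeplitz matrix-vector product via a
# circulant embedding and the FFT, the core trick behind FFT-accelerated
# kernelized attention with relative positional encoding.
import numpy as np


def toeplitz_matvec(col, row, x):
    """Multiply the Toeplitz matrix with first column `col` and first row
    `row` (col[0] == row[0]) by vector x, without forming the matrix."""
    n = len(x)
    # The (2n-1)-circulant whose first column is [col, reversed(row[1:])]
    # contains T in its top-left n x n block; a circulant matvec is a
    # circular convolution, i.e. a pointwise product in Fourier space.
    c = np.concatenate([col, row[1:][::-1]])
    x_padded = np.concatenate([x, np.zeros(n - 1)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_padded))
    return y[:n].real


# Quick check against a dense Toeplitz multiply.
rng = np.random.default_rng(0)
n = 6
col, row = rng.normal(size=n), rng.normal(size=n)
row[0] = col[0]
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)] for i in range(n)])
x = rng.normal(size=n)
assert np.allclose(T @ x, toeplitz_matvec(col, row, x))
```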