Infomaxformer: Maximum Entropy Transformer for Long Time-Series
Forecasting Problem
- URL: http://arxiv.org/abs/2301.01772v1
- Date: Wed, 4 Jan 2023 14:08:21 GMT
- Title: Infomaxformer: Maximum Entropy Transformer for Long Time-Series
Forecasting Problem
- Authors: Peiwang Tang and Xianchao Zhang
- Abstract summary: The Transformer architecture yields state-of-the-art results in many tasks such as natural language processing (NLP) and computer vision (CV).
Despite this capability, the quadratic time complexity and high memory usage prevent the Transformer from handling the long time-series forecasting problem.
We propose a method that combines the encoder-decoder architecture with seasonal-trend decomposition to capture more specific seasonal parts.
- Score: 6.497816402045097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Transformer architecture yields state-of-the-art results in many
tasks such as natural language processing (NLP) and computer vision (CV), owing
to its ability to efficiently capture the precise long-range dependency coupling
between input sequences. Despite this capability, however, the quadratic time
complexity and high memory usage prevent the Transformer from handling the long
time-series forecasting problem (LTFP). To address these difficulties: (i) we
revisit the attention patterns learned by vanilla self-attention and redesign
the self-attention computation based on the Maximum Entropy Principle; (ii) we
propose a new method to sparsify self-attention, which prevents the loss of
important self-attention scores caused by random sampling; (iii) we propose a
Keys/Values Distilling method, motivated by the observation that many features
in the original self-attention map are redundant, which further reduces the
time and space complexity and makes it possible to input longer time series.
Finally, we propose a method that combines the encoder-decoder architecture
with seasonal-trend decomposition, i.e., using the encoder-decoder architecture
to capture more specific seasonal parts. Extensive experiments on several
large-scale datasets show that our Infomaxformer clearly outperforms existing
methods. We expect this work to open up a new solution for the Transformer on
LTFP and to explore the ability of the Transformer architecture to capture much
longer temporal dependencies.
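The abstract gives no formulas, so the following is a minimal, hypothetical sketch of the general idea behind entropy-guided sparse self-attention: score each query by how far its attention distribution departs from the maximum-entropy (uniform) distribution, and spend full computation only on the most informative queries. The function name, the scoring rule, and the mean-of-values fallback for the remaining queries are illustrative assumptions, not the Infomaxformer formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def entropy_sparse_attention(Q, K, V, top_u):
    """Hypothetical sketch of entropy-guided sparse attention.

    Each query is scored by how far its attention distribution departs
    from the maximum-entropy (uniform) distribution; only the top_u most
    informative queries attend to all keys, while the rest fall back to
    the mean of V (a common convention in sparse-attention models).
    """
    L_q, d = Q.shape
    L_k = K.shape[0]
    scores = Q @ K.T / np.sqrt(d)                 # (L_q, L_k) raw scores
    probs = softmax(scores, axis=-1)              # full map, used here only for scoring
    ent = -(probs * np.log(probs + 1e-9)).sum(axis=-1)
    informativeness = np.log(L_k) - ent           # 0 for a uniform (uninformative) query
    keep = np.argsort(-informativeness)[:top_u]   # most "active" queries

    out = np.repeat(V.mean(axis=0, keepdims=True), L_q, axis=0)  # lazy queries
    out[keep] = probs[keep] @ V                   # active queries get full attention
    return out

# Toy usage: a 96-step sequence with 16-dim heads, keeping a quarter of the queries.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((96, 16)) for _ in range(3))
print(entropy_sparse_attention(Q, K, V, top_u=24).shape)  # (96, 16)
```

Note that this sketch computes the full attention map just to score the queries, which defeats the sub-quadratic goal; a practical implementation would estimate the score from a subset of keys, and the abstract's Keys/Values Distilling step would further shrink the key/value length between layers. Both refinements are omitted here.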
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures [46.58170057001437]
We introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences.
We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts.
arXiv Detail & Related papers (2024-05-31T14:00:44Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture temporal correlations among signals.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three aspects of merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, despite its high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting [6.393659160890665]
We propose the concept of the tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures.
Our experiments on real-world datasets show that our TCCT architectures can greatly improve the performance of existing state-of-the-art Transformer models.
arXiv Detail & Related papers (2021-08-29T08:49:31Z)
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT). (A minimal sketch of the underlying Toeplitz-FFT product appears after this list.)
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
- Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting [25.417560221400347]
Long sequence time-series forecasting (LSTF) demands a high prediction capacity.
Recent studies have shown the potential of Transformer to increase the prediction capacity.
We design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics.
arXiv Detail & Related papers (2020-12-14T11:43:09Z)
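As a companion to the kernelized-attention entry above, here is a minimal sketch of the Toeplitz trick it relies on: a Toeplitz matrix-vector product (the form a relative-positional-encoding term takes) can be computed in O(n log n) by embedding the matrix in a circulant one and applying the FFT. The helper name and the sanity check are illustrative assumptions; the attention-specific details of that paper are omitted.

```python
import numpy as np

def toeplitz_matvec_fft(c, r, v):
    """Multiply the Toeplitz matrix T (first column c, first row r, with
    c[0] == r[0]) by the vector v in O(n log n): embed T in a 2n x 2n
    circulant matrix and use the convolution theorem (FFT)."""
    n = len(v)
    circ_col = np.concatenate([c, [0.0], r[1:][::-1]])   # first column of the circulant
    v_pad = np.concatenate([v, np.zeros(n)])
    full = np.fft.ifft(np.fft.fft(circ_col) * np.fft.fft(v_pad)).real
    return full[:n]

# Sanity check against the naive O(n^2) product on a random Toeplitz matrix.
rng = np.random.default_rng(1)
n = 8
c = rng.standard_normal(n)                               # first column of T
r = np.concatenate([c[:1], rng.standard_normal(n - 1)])  # first row of T
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)] for i in range(n)])
v = rng.standard_normal(n)
print(np.allclose(T @ v, toeplitz_matvec_fft(c, r, v)))  # True
```

In the RPE setting, c and r would hold the positional biases for negative and positive offsets, so the same FFT product yields the positional contribution to attention in O(n log n) rather than O(n^2).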