Take an Irregular Route: Enhance the Decoder of Time-Series Forecasting
Transformer
- URL: http://arxiv.org/abs/2312.05792v1
- Date: Sun, 10 Dec 2023 06:50:56 GMT
- Title: Take an Irregular Route: Enhance the Decoder of Time-Series Forecasting
Transformer
- Authors: Li Shen, Yuning Wei, Yangzhu Wang, Hongguang Li
- Abstract summary: We propose FPPformer, which utilizes bottom-up and top-down architectures in the encoder and decoder, respectively, to build a full and rational hierarchy.
Extensive experiments with six state-of-the-art baselines on twelve benchmarks verify the promising performance of FPPformer.
- Score: 9.281993269355544
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the development of Internet of Things (IoT) systems, precise long-term forecasting methods are requisite for decision makers to evaluate current statuses and formulate future policies. Currently, Transformer and MLP are the two paradigms for deep time-series forecasting, and the former is more prevalent by virtue of its attention mechanism and encoder-decoder architecture. However, researchers tend to dive into the study of the encoder while leaving the decoder largely unexamined; some even adopt linear projections in lieu of the decoder to reduce complexity. We argue that both extracting the features of the input sequence and modeling the relations between the input and prediction sequences, which are the respective functions of the encoder and decoder, are of paramount significance. Motivated by the success of FPN in computer vision, we propose FPPformer, which employs bottom-up and top-down architectures in the encoder and decoder, respectively, to build a full and rational hierarchy. The cutting-edge patch-wise attention is exploited and further developed by combining it, in formats that differ between encoder and decoder, with a revamped element-wise attention. Extensive experiments with six state-of-the-art baselines on twelve benchmarks verify the promising performance of FPPformer and the importance of elaborately designing the decoder in time-series forecasting Transformers. The source code is released at https://github.com/OrigamiSL/FPPformer.
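Below is a minimal, illustrative sketch of how patch-wise attention can be combined with element-wise (point-wise) attention inside a single stage. The patch length, dimensions, and the way the two attentions are stacked are assumptions made for demonstration only; this is not the authors' released FPPformer implementation (see the GitHub link above for that).

```python
# Illustrative sketch only -- not the official FPPformer code.
# Patch size, dimensions, and the way the two attentions are combined are
# assumptions; see https://github.com/OrigamiSL/FPPformer for the authors' code.
import torch
import torch.nn as nn


class PatchPlusElementAttention(nn.Module):
    """One stage: attention restricted to non-overlapping patches,
    followed by element-wise (point-wise) attention over the full sequence."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, patch_len: int = 6):
        super().__init__()
        self.patch_len = patch_len
        self.patch_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.point_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); seq_len must be divisible by patch_len here.
        b, l, d = x.shape
        p = self.patch_len
        # Patch-wise attention: fold patches into the batch dimension so that
        # attention only mixes tokens inside the same patch.
        xp = x.reshape(b * l // p, p, d)
        xp, _ = self.patch_attn(xp, xp, xp)
        x = self.norm1(x + xp.reshape(b, l, d))
        # Element-wise attention: ordinary attention across all time steps.
        xe, _ = self.point_attn(x, x, x)
        return self.norm2(x + xe)


if __name__ == "__main__":
    stage = PatchPlusElementAttention()
    out = stage(torch.randn(2, 96, 64))   # 96-step input window
    print(out.shape)                      # torch.Size([2, 96, 64])
```

In the paper's hierarchy, such stages would be stacked bottom-up in the encoder and top-down in the decoder; the patch-merging and patch-splitting logic of that hierarchy is omitted from this sketch.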
Related papers
- Transformer-based Video Saliency Prediction with High Temporal Dimension
Decoding [12.595019348741042]
We propose a transformer-based video saliency prediction approach with high temporal dimension network decoding (THTDNet).
This architecture yields comparable performance to multi-branch and over-complicated models on common benchmarks such as DHF1K, UCF-sports and Hollywood-2.
arXiv Detail & Related papers (2024-01-15T20:09:56Z)
- DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models [22.276574156358084]
We build a multi-exit encoder-decoder transformer model which is trained with deep supervision so that each of its decoder layers is capable of generating plausible predictions.
We show our approach can reduce overall inference latency by 30%-60% with comparable or even higher accuracy compared to baselines.
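A minimal sketch of the decoder early-exit idea described above, assuming a simple confidence-threshold criterion; the exit rule, threshold, and head design are illustrative assumptions, not DEED's exact mechanism.

```python
# Illustrative early-exit decoding loop; the confidence criterion and threshold
# are assumptions, not the exact DEED mechanism.
import torch
import torch.nn as nn


def decode_with_early_exit(decoder_layers: nn.ModuleList,
                           exit_heads: nn.ModuleList,
                           hidden: torch.Tensor,
                           memory: torch.Tensor,
                           threshold: float = 0.9) -> torch.Tensor:
    """Run decoder layers one by one; stop as soon as an intermediate
    prediction head is confident enough (deep supervision is assumed to make
    every head produce plausible outputs)."""
    logits = None
    for layer, head in zip(decoder_layers, exit_heads):
        hidden = layer(hidden, memory)              # one decoder layer
        logits = head(hidden[:, -1])                # predict from the last position
        confidence = logits.softmax(-1).max(-1).values
        if bool((confidence > threshold).all()):    # assumed exit criterion
            return logits                           # early exit: skip remaining layers
    return logits                                   # fall back to the final layer
```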
arXiv Detail & Related papers (2023-11-15T01:01:02Z)
- Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z)
- Improving Position Encoding of Transformers for Multivariate Time Series Classification [5.467400475482668]
We propose a new absolute position encoding method dedicated to time series data, called time Absolute Position Encoding (tAPE).
We then propose a novel multivariate time series classification (MTSC) model, named ConvTran, which combines tAPE/eRPE with convolution-based input encoding to improve the position and data embedding of time series data.
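For context, here is a sketch of the vanilla sinusoidal absolute position encoding that tAPE builds on; the exact tAPE reformulation for time series (and the eRPE relative variant) is not reproduced here.

```python
# Standard sinusoidal absolute position encoding (Vaswani et al., 2017).
# tAPE adapts this idea to time series; its exact reformulation is not shown.
import math
import torch


def sinusoidal_position_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return a (seq_len, d_model) table of absolute position encodings.
    Assumes an even d_model for simplicity."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)      # (L, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )                                                                        # (d/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe   # added to the input embeddings before the first attention layer
```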
arXiv Detail & Related papers (2023-05-26T05:30:04Z)
- Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving [74.28510044056706]
Existing methods usually adopt the decoupled encoder-decoder paradigm.
In this work, we aim to alleviate the problem by two principles.
We first predict a coarse-grained future position and action based on the encoder features.
Then, conditioned on the position and action, the future scene is imagined in order to check the ramifications of driving accordingly.
arXiv Detail & Related papers (2023-05-10T15:22:02Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
Through gradient-based optimization, DecT can be trained within several seconds and requires only one query of the pre-trained language model (PLM) per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
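A rough sketch of output-side decoder tuning under the stated constraints: the large pre-trained model is frozen and queried once per sample, and only a small decoder head is trained on the cached output representations. The head architecture, optimizer, and loss below are assumptions, not DecT's exact design.

```python
# Illustrative output-side tuning: the frozen pretrained model is queried
# exactly once per sample; only a small decoder head is trained.
# The head, optimizer, and loss are assumptions, not DecT's exact design.
import torch
import torch.nn as nn


def tune_output_decoder(frozen_model: nn.Module,
                        inputs: torch.Tensor,
                        labels: torch.Tensor,
                        num_classes: int,
                        hidden_dim: int = 768,
                        epochs: int = 100) -> nn.Module:
    # Assumes frozen_model is in eval mode and returns pooled (N, hidden_dim) outputs.
    with torch.no_grad():                      # one query per sample, then cached
        reps = frozen_model(inputs)
    head = nn.Linear(hidden_dim, num_classes)  # the only trainable parameters
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):                    # seconds of training on cached reps
        opt.zero_grad()
        loss = loss_fn(head(reps), labels)
        loss.backward()
        opt.step()
    return head
```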
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on a standard database demonstrate the superiority of the proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z)
- Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems [38.672160430296536]
Transformer models have achieved state-of-the-art results in a wide range of NLP tasks including summarization.
Previous work has focused on one important bottleneck, the quadratic self-attention mechanism in the encoder.
This work focuses on the transformer's encoder-decoder attention mechanism.
arXiv Detail & Related papers (2021-09-08T19:32:42Z)
- On the Sub-Layer Functionalities of Transformer Decoder [74.83087937309266]
We study how Transformer-based decoders leverage information from the source and target languages.
Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance.
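As an illustration of that finding, here is a generic Transformer decoder layer in which the residual feed-forward sub-layer can be switched off; this is a textbook layer (attention masks omitted), not the paper's exact experimental setup.

```python
# Generic Transformer decoder layer with an optional residual feed-forward
# sub-layer, illustrating the reported finding; masks are omitted for brevity.
import torch
import torch.nn as nn


class DecoderLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int, use_ffn: bool = True):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.use_ffn = use_ffn
        if use_ffn:
            self.ffn = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
            )
            self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        tgt = self.norm1(tgt + self.self_attn(tgt, tgt, tgt)[0])
        tgt = self.norm2(tgt + self.cross_attn(tgt, memory, memory)[0])
        if self.use_ffn:              # the paper reports dropping this costs little
            tgt = self.norm3(tgt + self.ffn(tgt))
        return tgt
```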
arXiv Detail & Related papers (2020-10-06T11:50:54Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
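A sketch of a gate layer placed between encoder and decoder, using the hard-concrete relaxation (Louizos et al., 2018) that is commonly used to make the expected L0 penalty differentiable; the hyper-parameters, the single shared gate parameter, and the exact placement are assumptions rather than the paper's configuration.

```python
# Hard-concrete gates applied to encoder outputs before they reach the decoder.
# Hyper-parameters and the shared gate parameter are assumptions, not the
# paper's exact setup.
import math
import torch
import torch.nn as nn


class L0Gate(nn.Module):
    """Per-position stochastic gate with a differentiable expected-L0 penalty."""

    def __init__(self, beta: float = 2.0 / 3.0, gamma: float = -0.1, zeta: float = 1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(1))   # shared gate parameter
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Sample one gate per position via the hard-concrete distribution.
            u = torch.rand_like(encoder_out[..., :1]).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid(
                (torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta
            )
        else:
            s = torch.sigmoid(self.log_alpha).expand_as(encoder_out[..., :1])
        z = torch.clamp(s * (self.zeta - self.gamma) + self.gamma, 0.0, 1.0)
        return encoder_out * z                           # gated states go to the decoder

    def expected_l0(self) -> torch.Tensor:
        # Probability that a gate is non-zero; add (scaled) to the training loss.
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        )
```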
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.