Transformers predicting the future. Applying attention in next-frame and
time series forecasting
- URL: http://arxiv.org/abs/2108.08224v1
- Date: Wed, 18 Aug 2021 16:17:29 GMT
- Title: Transformers predicting the future. Applying attention in next-frame and
time series forecasting
- Authors: Radostin Cholakov, Todor Kolev
- Abstract summary: Recurrent Neural Networks were, until recently, one of the best ways to capture the temporal dependencies in sequences.
With the introduction of the Transformer, it has been proven that an architecture with only attention mechanisms and no RNN can improve on the results in various sequence processing tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recurrent Neural Networks were, until recently, one of the best ways to
capture the temporal dependencies in sequences. However, with the introduction of
the Transformer, it has been proven that an architecture with only
attention mechanisms and no RNN can improve on the results in various
sequence processing tasks (e.g. NLP). Multiple studies since then have shown
that similar approaches can be applied for images, point clouds, video, audio
or time series forecasting. Furthermore, solutions such as the Perceiver or the
Informer have been introduced to expand on the applicability of the
Transformer. Our main objective is to test and evaluate the effectiveness of
applying Transformer-like models to time series data, tackling susceptibility
to anomalies, context awareness and space complexity by fine-tuning the
hyperparameters, preprocessing the data, applying dimensionality reduction or
convolutional encodings, etc. We are also looking at the problem of next-frame
prediction and exploring ways to modify existing solutions in order to achieve
higher performance and learn generalized knowledge.
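To make the setup concrete, here is a minimal, hedged sketch (not the authors' code) of applying a standard Transformer encoder to sliding windows of a univariate series for next-step forecasting. The Conv1d front-end stands in for one possible "convolutional encoding" of the input; the window length, model width, head count, and layer count are illustrative assumptions.

```python
# Illustrative sketch only (PyTorch), not the paper's implementation.
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2, window=96):
        super().__init__()
        # 1-D convolution lifts the scalar series to d_model channels,
        # adding local context before attention (assumed "convolutional encoding").
        self.embed = nn.Conv1d(1, d_model, kernel_size=3, padding=1)
        self.pos = nn.Parameter(torch.zeros(1, window, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)   # predict the next value

    def forward(self, x):                   # x: (batch, window)
        h = self.embed(x.unsqueeze(1))      # (batch, d_model, window)
        h = h.transpose(1, 2) + self.pos    # (batch, window, d_model)
        h = self.encoder(h)
        return self.head(h[:, -1])          # forecast from the last time step

model = TimeSeriesTransformer()
y_hat = model(torch.randn(8, 96))           # 8 windows of length 96
print(y_hat.shape)                          # torch.Size([8, 1])
```

A smaller model width or a dimensionality-reduction step before the encoder would be the natural place to address the space-complexity concerns mentioned in the abstract.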
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art results on challenging real-world datasets (a hedged sketch of the inverted layout appears after this list).
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
- Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce a class of efficient Transformers named Regularized Transformers (Reguformers).
The focus of our experiments is on oil & gas data, namely well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
- DRAformer: Differentially Reconstructed Attention Transformer for Time-Series Forecasting [7.805077630467324]
Time-series forecasting plays an important role in many real-world scenarios, such as equipment life cycle forecasting, weather forecasting, and traffic flow forecasting.
It can be observed from recent research that a variety of transformer-based models have shown remarkable results in time-series forecasting.
However, there are still some issues that limit the ability of transformer-based models on time-series forecasting tasks.
arXiv Detail & Related papers (2022-06-11T10:34:29Z)
- Universal Transformer Hawkes Process with Adaptive Recursive Iteration [4.624987488467739]
Asynchronous event sequences are widespread in the natural world and in human activities, such as earthquake records and user activities on social media.
How to distill information from such seemingly disorganized data is a long-standing research topic.
One of the most useful tools is the point process model, on whose basis researchers have obtained many notable results.
In recent years, point process models built on neural networks, especially recurrent neural networks (RNNs), have been proposed; compared with traditional models, their performance is greatly improved.
arXiv Detail & Related papers (2021-12-29T09:55:12Z)
- Temporal Attention Augmented Transformer Hawkes Process [4.624987488467739]
We propose a new Transformer-based Hawkes process model, the Temporal Attention Augmented Transformer Hawkes Process (TAA-THP).
We modify the traditional dot-product attention structure and introduce a temporal encoding into the attention computation (a hedged sketch of this idea appears after this list).
We conduct numerous experiments on a wide range of synthetic and real-life datasets to validate the performance of our proposed TAA-THP model.
arXiv Detail & Related papers (2021-12-29T09:45:23Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches (a hedged sketch of this patch-token layout appears after this list).
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
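For the iTransformer entry above, the following hedged sketch (not the official implementation) illustrates the "inverted" layout: each variate's entire lookback window is embedded as a single token, so self-attention mixes information across variates rather than across time steps. The lookback length, horizon, and widths are illustrative assumptions.

```python
# Illustrative sketch only (PyTorch), not the official iTransformer code.
import torch
import torch.nn as nn

class InvertedAttentionForecaster(nn.Module):
    def __init__(self, lookback=96, horizon=24, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(lookback, d_model)    # one token per variate
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.project = nn.Linear(d_model, horizon)   # per-variate forecast

    def forward(self, x):                            # x: (batch, lookback, n_variates)
        tokens = self.embed(x.transpose(1, 2))       # (batch, n_variates, d_model)
        tokens = self.encoder(tokens)                # attention across variates
        return self.project(tokens).transpose(1, 2)  # (batch, horizon, n_variates)

model = InvertedAttentionForecaster()
print(model(torch.randn(4, 96, 7)).shape)            # torch.Size([4, 24, 7])
```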
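For the TAA-THP entry above, the sketch below illustrates one way a temporal encoding of event timestamps could enter the dot-product attention scores. The sinusoidal encoding and its additive use in the scores are assumptions made for illustration; the paper's exact formulation may differ.

```python
# Illustrative sketch only: temporal encoding injected into attention scores.
import math
import torch

def temporal_encoding(timestamps, d_model):
    """Sinusoidal encoding of continuous event times, shape (batch, seq, d_model)."""
    i = torch.arange(d_model // 2, dtype=torch.float32)
    freq = torch.exp(-math.log(10000.0) * 2 * i / d_model)     # (d_model/2,)
    angles = timestamps.unsqueeze(-1) * freq                    # (batch, seq, d_model/2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

def attention_with_temporal_bias(q, k, v, timestamps):
    """Scaled dot-product attention whose scores also see the time encoding."""
    d = q.size(-1)
    te = temporal_encoding(timestamps, d)                       # (batch, seq, d)
    scores = (q @ k.transpose(-2, -1) + q @ te.transpose(-2, -1)) / math.sqrt(d)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 10, 64)           # 2 sequences of 10 events
t = torch.cumsum(torch.rand(2, 10), dim=1)   # increasing event timestamps
print(attention_with_temporal_bias(q, k, v, t).shape)  # torch.Size([2, 10, 64])
```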
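For the Visual Saliency Transformer entry above, this hedged sketch shows the generic patch-token idea only: an image is split into patches, each patch becomes a token, and a Transformer encoder propagates global context among them. The strided-convolution patch embedding, patch size, and widths are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch only: patch tokens plus a Transformer encoder.
import torch
import torch.nn as nn

class PatchTokenEncoder(nn.Module):
    def __init__(self, patch=16, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # A strided convolution is a common way to split and embed patches.
        self.to_patches = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, images):                      # (batch, 3, H, W)
        tokens = self.to_patches(images)            # (batch, d_model, H/16, W/16)
        tokens = tokens.flatten(2).transpose(1, 2)  # (batch, n_patches, d_model)
        return self.encoder(tokens)                 # global context among patches

print(PatchTokenEncoder()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 196, 64])
```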
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.