Temporal Attention Augmented Transformer Hawkes Process
- URL: http://arxiv.org/abs/2112.14472v1
- Date: Wed, 29 Dec 2021 09:45:23 GMT
- Title: Temporal Attention Augmented Transformer Hawkes Process
- Authors: Lu-ning Zhang, Jian-wei Liu, Zhi-yan Song, Xin Zuo
- Abstract summary: We propose a new Transformer-based Hawkes process model, the Temporal Attention Augmented Transformer Hawkes Process (TAA-THP).
We modify the traditional dot-product attention structure and introduce temporal encoding into the attention structure.
We conduct numerous experiments on a wide range of synthetic and real-life datasets to validate the performance of the proposed TAA-THP model.
- Score: 4.624987488467739
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, mining knowledge from asynchronous event sequences with Hawkes
processes has remained a subject worthy of continued attention, and Hawkes processes based on
neural networks, especially recurrent neural networks (RNNs), have gradually become one of the
most actively researched directions. However, these models still inherit shortcomings of RNNs,
such as vanishing and exploding gradients and difficulty with long-term dependencies. Meanwhile,
the Transformer, built on self-attention, has achieved great success in sequence modeling tasks
such as text processing and speech recognition. Although the Transformer Hawkes process (THP)
has brought a large performance improvement, THPs do not effectively utilize the temporal
information in asynchronous events: in these sequences, the event occurrence instants are as
important as the event types, yet conventional THPs simply convert temporal information into
position encodings and add them to the Transformer input. With this in mind, we propose a new
Transformer-based Hawkes process model, the Temporal Attention Augmented Transformer Hawkes
Process (TAA-THP). We modify the traditional dot-product attention structure and introduce the
temporal encoding into the attention structure. We conduct numerous experiments on a wide range
of synthetic and real-life datasets to validate the performance of the proposed TAA-THP model,
and achieve significant improvements over existing baseline models on different measurements,
including log-likelihood on the test dataset and prediction accuracies of event types and
occurrence times. In addition, through ablation studies, we demonstrate the merit of the
additional temporal attention by comparing the performance of the model with and without it.
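The exact attention formulation of TAA-THP is not reproduced in this summary, so the snippet below is only a minimal, single-head sketch of the general idea described above: instead of merely adding a temporal/position encoding to the Transformer input, an extra score term computed from a temporal encoding of the event instants enters the dot-product attention itself. The module name, the projection `t_proj`, the tensor shapes, and the exact form of the augmented score are illustrative assumptions, not the authors' published equations.

```python
# Illustrative sketch only (PyTorch): dot-product attention augmented with a
# temporal term, in the spirit of TAA-THP; not the paper's exact formulation.
import math
import torch
import torch.nn as nn

class TemporalAugmentedAttention(nn.Module):
    def __init__(self, d_model: int, d_time: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Hypothetical projection mapping the temporal encoding of event
        # instants into the key space so it can contribute to the score.
        self.t_proj = nn.Linear(d_time, d_model)
        self.d_model = d_model

    def forward(self, x, temporal_enc, mask=None):
        # x:            (batch, seq, d_model) hidden states of the event sequence
        # temporal_enc: (batch, seq, d_time)  encoding of the occurrence instants
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        t = self.t_proj(temporal_enc)
        # Content term plus an additional temporal attention term in the score.
        scores = (q @ k.transpose(-2, -1) + q @ t.transpose(-2, -1)) / math.sqrt(self.d_model)
        if mask is not None:
            # mask: (batch, seq, seq) boolean, True where attention is forbidden
            scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v
```

In the published model the attention is multi-head and a conditional intensity is computed from the resulting hidden states; the sketch only isolates where the temporal information enters the attention score.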
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - RoTHP: Rotary Position Embedding-based Transformer Hawkes Process [0.0]
Temporal Point Processes (TPPs) are commonly used for modeling asynchronous event sequence data.
We propose a new Rotary Position Embedding-based THP architecture in this paper.
arXiv Detail & Related papers (2024-05-11T10:59:09Z) - Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
The Transformer's key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, yet the attention weights themselves are often relegated to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
arXiv Detail & Related papers (2024-02-08T03:00:50Z) - CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dependencies across channels.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving classification capacity on the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z) - ViTs for SITS: Vision Transformers for Satellite Image Time Series [52.012084080257544]
We introduce a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT).
TSViT splits a SITS record into non-overlapping patches in space and time which are tokenized and subsequently processed by a factorized temporo-spatial encoder.
arXiv Detail & Related papers (2023-01-12T11:33:07Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, which, however, is computationally expensive.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Universal Transformer Hawkes Process with Adaptive Recursive Iteration [4.624987488467739]
Asynchronous event sequences are widespread in the natural world and in human activities, such as earthquake records and user activities on social media.
How to distill information from these seemingly disorganized data is a persistent topic that researchers focus on.
One of the most useful models is the point process model, and on this basis researchers have obtained many notable results.
In recent years, point process models built on neural networks, especially recurrent neural networks (RNNs), have been proposed; compared with traditional models, their performance is greatly improved.
arXiv Detail & Related papers (2021-12-29T09:55:12Z) - Transformers predicting the future. Applying attention in next-frame and time series forecasting [0.0]
Recurrent Neural Networks were, until recently, one of the best ways to capture the temporal dependencies in sequences.
With the introduction of the Transformer, it has been proven that an architecture with only attention-mechanisms without any RNN can improve on the results in various sequence processing tasks.
arXiv Detail & Related papers (2021-08-18T16:17:29Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
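As background for the Hawkes-process entries above (and not taken from any single paper listed here), a classical Hawkes process defines a conditional intensity with a fixed excitation kernel; neural and Transformer Hawkes processes such as THP and TAA-THP keep the notion of a conditional intensity but parameterize it with learned hidden states. A minimal reference sketch, with illustrative parameter values:

```python
# Classical exponential-kernel Hawkes intensity (reference only): THP/TAA-THP
# replace this fixed parametric form with an intensity driven by Transformer
# hidden states, but the modeled quantity is the same conditional intensity.
import numpy as np

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    past = event_times[event_times < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

# Example: recent events raise the intensity above the baseline mu,
# and the excitation decays back toward mu as time passes.
times = np.array([1.0, 1.2, 1.3])
print(hawkes_intensity(1.4, times))   # elevated, shortly after a burst of events
print(hawkes_intensity(10.0, times))  # close to mu, excitation has decayed
```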
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.