From Hawkes Processes to Attention: Time-Modulated Mechanisms for Event Sequences
- URL: http://arxiv.org/abs/2601.09220v1
- Date: Wed, 14 Jan 2026 06:47:37 GMT
- Title: From Hawkes Processes to Attention: Time-Modulated Mechanisms for Event Sequences
- Authors: Xinzi Tan, Kejian Zhang, Junhan Yu, Doudou Zhou
- Abstract summary: Marked Temporal Point Processes (MTPPs) arise naturally in medical, social, commercial, and financial domains. We propose a novel attention operator called Hawkes Attention, using learnable per-type neural kernels to modulate query, key, and value projections. In addition to the general MTPP setting, our attention mechanism can also be easily applied to specific temporal structures, such as time series forecasting.
- Score: 2.909892241405689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Marked Temporal Point Processes (MTPPs) arise naturally in medical, social, commercial, and financial domains. However, existing Transformer-based methods mostly inject temporal information only via positional encodings, relying on shared or parametric decay structures, which limits their ability to capture heterogeneous and type-specific temporal effects. Motivated by this observation, we derive a novel attention operator, called Hawkes Attention, from multivariate Hawkes process theory for MTPPs, using learnable per-type neural kernels to modulate the query, key, and value projections, thereby replacing the corresponding parts of traditional attention. Benefiting from this design, Hawkes Attention unifies event timing and content interaction, learning both time-dependent behavior and type-specific excitation patterns from the data. Experimental results show that our method outperforms the baselines. Beyond general MTPPs, our attention mechanism can also be readily applied to specific temporal structures, such as time series forecasting.
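For context, the multivariate Hawkes process that the abstract builds on models the conditional intensity of type-m events as a base rate plus excitation accumulated from past events. This is the standard textbook form, stated here for orientation rather than quoted from the paper:

```latex
\lambda_m(t) = \mu_m + \sum_{t_i < t} \phi_{m, m_i}(t - t_i)
```

Here \mu_m is the base rate of type m, and \phi_{m, m_i} is the excitation kernel describing how a past event of type m_i raises the intensity of future type-m events; the abstract's learnable per-type neural kernels play the role of these \phi functions inside the attention computation. The PyTorch sketch below illustrates that general idea, with per-type neural kernels of the elapsed time gating the key and value projections under a causal mask. All names, shapes, and the exact parameterization are assumptions for illustration; the paper's precise operator is not specified here.

```python
import torch
import torch.nn as nn

class TimeModulatedAttention(nn.Module):
    """Sketch of attention whose keys/values are gated by per-type
    neural kernels of the elapsed time (names are illustrative)."""

    def __init__(self, d_model: int, num_types: int, kernel_hidden: int = 32):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # One small MLP per event type, mapping a time gap to a gate in (0, 1).
        self.kernels = nn.ModuleList([
            nn.Sequential(
                nn.Linear(1, kernel_hidden), nn.Softplus(),
                nn.Linear(kernel_hidden, d_model), nn.Sigmoid(),
            )
            for _ in range(num_types)
        ])

    def forward(self, x, times, types):
        # x: (B, L, d) event embeddings; times: (B, L) timestamps,
        # increasing along L; types: (B, L) long tensor in [0, num_types).
        B, L, d = x.shape
        dt = (times[:, :, None] - times[:, None, :]).clamp(min=0.0)  # (B, L, L)
        # Evaluate every per-type kernel on the gaps, then pick the kernel
        # matching each key event's type (naive but clear).
        all_gates = torch.stack([kern(dt.unsqueeze(-1)) for kern in self.kernels])
        idx = types[None, :, None, :, None].expand(1, B, L, L, d)
        gate = all_gates.gather(0, idx).squeeze(0)          # (B, L, L, d)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        k_mod = gate * k[:, None, :, :]                     # time-modulated keys
        v_mod = gate * v[:, None, :, :]                     # time-modulated values
        scores = (q[:, :, None, :] * k_mod).sum(-1) / d ** 0.5  # (B, L, L)
        causal = torch.tril(torch.ones(L, L, dtype=torch.bool, device=x.device))
        scores = scores.masked_fill(~causal, float("-inf"))
        return torch.einsum("bij,bijd->bid", scores.softmax(-1), v_mod)
```

A call such as `layer(x, times, types)` returns time-aware representations of the same shape as `x`. Note that evaluating every kernel on the full gap matrix costs O(num_types * L^2 * d) memory; a practical implementation would evaluate only the kernel matching each key's type.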
Related papers
- MEMTS: Internalizing Domain Knowledge via Parameterized Memory for Retrieval-Free Domain Adaptation of Time Series Foundation Models [51.506429027626005]
Memory for Time Series (MEMTS) is a lightweight, plug-and-play method for retrieval-free domain adaptation in time series forecasting. A key component of MEMTS is the Knowledge Persistence Module (KPM), which internalizes domain-specific temporal dynamics. This paradigm shift enables MEMTS to achieve accurate domain adaptation with constant-time inference and near-zero latency.
arXiv Detail & Related papers (2026-02-14T14:00:06Z) - Temporal Graph Pattern Machine [17.352525018007473]
Temporal Graph Pattern Machine (TGPM) conceptualizes each interaction as an interaction patch synthesized via temporally-biased random walks. TGPM consistently achieves state-of-the-art performance in both transductive and inductive link prediction.
arXiv Detail & Related papers (2026-01-30T01:46:13Z) - Hyper Hawkes Processes: Interpretable Models of Marked Temporal Point Processes [12.72697616342555]
We present a new family of MTPP models: the hyper Hawkes process (HHP). HHP aims to be as flexible and performant as neural MTPPs, while retaining interpretable aspects. These extensions define a highly performant MTPP family, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-11-02T22:10:08Z) - TimeFormer: Transformer with Attention Modulation Empowered by Temporal Characteristics for Time Series Forecasting [18.890651211582256]
We develop a novel Transformer architecture designed for time series data, aiming to maximize its representational capacity. We identify two key but often overlooked characteristics of time series: (1) unidirectional influence from the past to the future, and (2) the phenomenon of decaying influence over time. We propose TimeFormer, whose core innovation is a self-attention mechanism with two modulation terms (MoSA), designed to capture these temporal priors of time series (a hedged sketch of this decay-modulation idea appears after this list).
arXiv Detail & Related papers (2025-10-08T06:07:30Z) - Multivariate Long-term Time Series Forecasting with Fourier Neural Filter [42.60778405812048]
We introduce FNF as the backbone and DBD as the architecture to provide excellent learning capabilities and optimal learning pathways for spatial-temporal modeling. We show that FNF unifies local time-domain and global frequency-domain information processing within a single backbone that extends naturally to spatial modeling.
arXiv Detail & Related papers (2025-06-10T18:40:20Z) - DAPE V2: Process Attention Score as Feature Map for Length Extrapolation [63.87956583202729]
We conceptualize attention as a feature map and apply the convolution operator to mimic the processing methods in computer vision.
The novel insight, which can be adapted to various attention-related models, reveals that the current Transformer architecture has the potential for further evolution.
arXiv Detail & Related papers (2024-10-07T07:21:49Z) - RoTHP: Rotary Position Embedding-based Transformer Hawkes Process [0.0]
Temporal Point Processes (TPPs) are commonly used for modeling asynchronous event sequence data.
We propose a new Rotary Position Embedding-based THP architecture in this paper.
arXiv Detail & Related papers (2024-05-11T10:59:09Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - Sequential Attention Source Identification Based on Feature Representation [88.05527934953311]
This paper proposes a sequence-to-sequence localization framework, Temporal-sequence based Graph Attention Source Identification (TGASI), built on an inductive learning idea.
Notably, the inductive learning design enables TGASI to detect sources in new scenarios without requiring additional prior knowledge.
arXiv Detail & Related papers (2023-06-28T03:00:28Z) - ViTs for SITS: Vision Transformers for Satellite Image Time Series [52.012084080257544]
We introduce a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT).
TSViT splits a SITS record into non-overlapping patches in space and time which are tokenized and subsequently processed by a factorized temporo-spatial encoder.
arXiv Detail & Related papers (2023-01-12T11:33:07Z) - Temporal Attention Augmented Transformer Hawkes Process [4.624987488467739]
We propose a new Transformer-based Hawkes process model, the Temporal Attention Augmented Transformer Hawkes Process (TAA-THP).
We modify the traditional dot-product attention structure and introduce temporal encoding into the attention structure.
We conduct numerous experiments on a wide range of synthetic and real-life datasets to validate the performance of our proposed TAA-THP model.
arXiv Detail & Related papers (2021-12-29T09:45:23Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
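As a companion to the TimeFormer entry above, here is a toy sketch of the decay-modulation idea shared by several of the listed papers: a causal mask enforces past-to-future influence, and a learnable rate exponentially down-weights attention to temporally distant steps. The function name and the additive-before-softmax parameterization are assumptions for illustration, not TimeFormer's published MoSA formulation.

```python
import torch

def decay_modulated_attention(q, k, v, times, log_rate):
    # q, k, v: (B, L, d); times: (B, L); log_rate: learnable scalar tensor.
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5             # (B, L, L)
    dt = (times[:, :, None] - times[:, None, :]).clamp(min=0.0)
    # Subtracting rate * dt before the softmax multiplies each attention
    # weight by exp(-rate * dt), i.e. an exponential decay in elapsed time.
    scores = scores - log_rate.exp() * dt
    causal = torch.tril(torch.ones_like(scores, dtype=torch.bool))
    return scores.masked_fill(~causal, float("-inf")).softmax(-1) @ v
```

Parameterizing the rate through its logarithm keeps the decay strictly positive during training; a per-head or per-type rate would be a natural extension in the spirit of the main paper.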