Dynamic Relational Priming Improves Transformer in Multivariate Time Series
- URL: http://arxiv.org/abs/2509.12196v1
- Date: Mon, 15 Sep 2025 17:56:15 GMT
- Title: Dynamic Relational Priming Improves Transformer in Multivariate Time Series
- Authors: Hunjae Lee, Corey Clark
- Abstract summary: We propose attention with dynamic relational priming (prime attention). We show that prime attention consistently outperforms standard attention across benchmarks. We also find that prime attention achieves comparable or superior performance using up to 40% less sequence length compared to standard attention.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard attention mechanisms in transformers employ static token representations that remain unchanged across all pair-wise computations in each layer. This limits their representational alignment with the potentially diverse relational dynamics of each token-pair interaction. While they excel in domains with relatively homogeneous relationships, standard attention's static relational learning struggles to capture the diverse, heterogeneous inter-channel dependencies of multivariate time series (MTS) data, where different channel-pair interactions within a single system may be governed by entirely different physical laws or temporal dynamics. To better align the attention mechanism with such domain phenomena, we propose attention with dynamic relational priming (prime attention). Unlike standard attention, where each token presents an identical representation across all of its pair-wise interactions, prime attention tailors each token dynamically (per interaction) through learnable modulations to best capture the unique relational dynamics of each token pair, optimizing each pair-wise interaction for that specific relationship. This representational plasticity of prime attention enables effective extraction of relationship-specific information in MTS while maintaining the same asymptotic computational complexity as standard attention. Our results demonstrate that prime attention consistently outperforms standard attention across benchmarks, achieving up to 6.5% improvement in forecasting accuracy. In addition, we find that prime attention achieves comparable or superior performance using up to 40% less sequence length than standard attention, further demonstrating its superior relational modeling capabilities.
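The abstract describes the mechanism only at a high level, so the following PyTorch sketch is one plausible reading, not the authors' implementation: each key is gated by a learnable, query-token-specific modulation vector before scoring. The class name and the sigmoid gate are assumptions.

```python
import torch
import torch.nn as nn

class PrimeAttention(nn.Module):
    """Minimal sketch of "prime attention" as the abstract describes it:
    each token's key is re-modulated ("primed") separately for every
    pair-wise interaction before scoring. The modulation below is an
    assumption, not the authors' published formulation."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.dk = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # Per token, a modulation vector that gates how the *other* token
        # in each pair is represented for this interaction.
        self.prime = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, d)
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, N, self.h, self.dk).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)            # (B, h, N, dk)
        g = torch.sigmoid(split(self.prime(x)))           # (B, h, N, dk)
        # Pair-specific key: k'_{ij} = g_i * k_j, i.e. query i sees a
        # version of key j gated by i's learned modulation vector.
        k_primed = g.unsqueeze(3) * k.unsqueeze(2)        # (B, h, N, N, dk)
        scores = (q.unsqueeze(3) * k_primed).sum(-1) / self.dk ** 0.5
        attn = scores.softmax(dim=-1)                     # (B, h, N, N)
        y = attn @ v                                      # (B, h, N, dk)
        return self.out(y.transpose(1, 2).reshape(B, N, -1))


# Smoke test on random "channel tokens" (B=2, N=7 channels, d=32).
if __name__ == "__main__":
    layer = PrimeAttention(d_model=32, n_heads=4)
    print(layer(torch.randn(2, 7, 32)).shape)  # torch.Size([2, 7, 32])
```

Note that this particular elementwise gate factorizes (g_i * k_j can be absorbed into q_i * g_i before the dot product), which illustrates one way per-pair tailoring can keep the standard O(N²·d) cost the abstract mentions.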
Related papers
- Krause Synchronization Transformers [63.8469912831803]
Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. We introduce Krause Attention, a principled attention mechanism inspired by bounded-confidence consensus dynamics.
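Hypothetical sketch of the bounded-confidence idea (the eps-neighborhood rule is borrowed from the Hegselmann-Krause consensus model, not taken from the paper):

```python
import torch

def krause_style_attention(q, k, v, eps=1.0):
    """Bounded-confidence attention sketch: each token averages only over
    tokens whose keys lie within distance eps of its query, instead of a
    globally normalized softmax. eps and the distance metric are
    assumptions, not the paper's exact mechanism. q, k, v: (B, N, d)."""
    dist = torch.cdist(q, k)                  # (B, N, N) pairwise distances
    mask = dist <= eps                        # bounded-confidence neighborhood
    w = mask.float()
    # Local, uniform averaging; clamp avoids division by zero when a
    # token has no neighbors (its output row is then zero).
    w = w / w.sum(dim=-1, keepdim=True).clamp_min(1.0)
    return w @ v
```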
arXiv Detail & Related papers (2026-02-12T03:47:53Z)
- Task-Level Insights from Eigenvalues across Sequence Models [41.79939327722031]
We show that eigenvalues influence essential aspects of memory and long-range dependency modeling. We then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance. This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.
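As a toy illustration of this kind of eigenvalue analysis (the linear state-space model and the rule of thumb below are standard, not specific to the paper):

```python
import numpy as np

# For a linear recurrence h_t = A h_{t-1} + B x_t, the eigenvalues of A
# set how fast past inputs decay, i.e. the model's effective memory.
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(64, 64))   # random transition matrix
eigvals = np.linalg.eigvals(A)
rho = np.abs(eigvals).max()                # spectral radius
# |lambda| near 1 -> long-range memory; |lambda| << 1 -> rapid forgetting.
# Rough memory horizon: -1 / log(rho) steps.
print(f"spectral radius: {rho:.3f}, horizon ~ {-1 / np.log(rho):.1f} steps")
```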
arXiv Detail & Related papers (2025-10-10T13:35:21Z)
- Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions [94.21989689001848]
We propose ΔConvFusion to replace conventional self-attention modules with Pyramid Convolution Blocks (ΔConvBlocks). By distilling attention patterns into localized convolutional operations while keeping other components frozen, ΔConvFusion achieves performance comparable to transformer-based counterparts while reducing computational cost by 6929× and surpassing LinFusion by 5.42× in efficiency, all without compromising generative fidelity.
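A hedged sketch of the distillation recipe as described (module names and the depthwise-conv student are placeholders, not the paper's ΔConvBlock):

```python
import torch
import torch.nn as nn

class ConvStudent(nn.Module):
    """Local convolutional stand-in trained to mimic a frozen attention
    layer's input -> output mapping (hypothetical student design)."""
    def __init__(self, d=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(d, d, kernel_size=5, padding=2, groups=d),  # depthwise, local
            nn.Conv1d(d, d, kernel_size=1), nn.GELU(),
            nn.Conv1d(d, d, kernel_size=1),
        )
    def forward(self, x):                 # x: (B, N, d) token layout
        return self.net(x.transpose(1, 2)).transpose(1, 2)

teacher = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
teacher.requires_grad_(False)             # other components stay frozen
student = ConvStudent(64)
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

x = torch.randn(8, 128, 64)
with torch.no_grad():
    target, _ = teacher(x, x, x)          # attention output to imitate
loss = nn.functional.mse_loss(student(x), target)
loss.backward(); opt.step()
```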
arXiv Detail & Related papers (2025-04-30T03:57:28Z)
- Scaled and Inter-token Relation Enhanced Transformer for Sample-restricted Residential NILM [0.0]
We propose a novel transformer architecture with two key innovations: inter-token relation enhancement and dynamic temperature tuning. We validate our method on the REDD dataset and show that it outperforms the original transformer and state-of-the-art models by 10-15% in F1 score across various appliance types.
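Of the two ideas, dynamic temperature tuning is the easier to sketch; the input-conditioned temperature below is my reading, not the paper's exact design:

```python
import torch
import torch.nn as nn

class TemperatureAttention(nn.Module):
    """Attention whose softmax temperature is predicted from the input,
    per token and per head (hypothetical instantiation)."""
    def __init__(self, d=64, heads=4):
        super().__init__()
        self.h, self.dk = heads, d // heads
        self.qkv = nn.Linear(d, 3 * d)
        self.attn_temp = nn.Linear(d, heads)   # per-token, per-head temperature

    def forward(self, x):                      # x: (B, N, d)
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, -1)
        s = lambda t: t.view(B, N, self.h, self.dk).transpose(1, 2)
        q, k, v = s(q), s(k), s(v)
        # tau > 0.5: softplus keeps the temperature positive and input-dependent;
        # small tau sharpens the distribution, large tau flattens it.
        tau = 0.5 + nn.functional.softplus(self.attn_temp(x))   # (B, N, h)
        tau = tau.transpose(1, 2).unsqueeze(-1)                 # (B, h, N, 1)
        scores = q @ k.transpose(-2, -1) / (self.dk ** 0.5 * tau)
        return (scores.softmax(-1) @ v).transpose(1, 2).reshape(B, N, -1)
```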
arXiv Detail & Related papers (2024-10-12T18:58:45Z)
- TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
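A loose sketch of what a hierarchical temporal graph over agent interactions could look like (the data layout, radius rule, and windowing are assumptions, not TimeGraphs' construction):

```python
import itertools

def build_temporal_graph(positions, radius=1.0, window=4):
    """positions: list over time of {agent_id: (x, y)}. Returns two edge
    lists: fine edges (t, i, j) for agents within `radius` at time t, and
    coarse edges linking each timestep node to a window-level parent node,
    giving the two levels of a simple temporal hierarchy."""
    fine, coarse = [], []
    for t, frame in enumerate(positions):
        for i, j in itertools.combinations(frame, 2):
            (x1, y1), (x2, y2) = frame[i], frame[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radius ** 2:
                fine.append((t, i, j))         # interaction at time t
        coarse.append((t, t // window))        # hierarchy: t -> window id
    return fine, coarse
```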
arXiv Detail & Related papers (2024-01-06T06:26:49Z)
- CSformer: Combining Channel Independence and Mixing for Robust Multivariate Time Series Forecasting [3.6814181034608664]
We propose a strategy of channel independence followed by mixing in time series analysis. We introduce CSformer, a novel framework featuring a two-stage multi-headed self-attention mechanism. Our framework effectively incorporates sequence and channel adapters, significantly improving the model's ability to identify important information.
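A minimal sketch of the two-stage recipe, attention over time within each channel first, then attention across channels (a guess at the structure; CSformer's actual blocks will differ):

```python
import torch
import torch.nn as nn

class TwoStageAttention(nn.Module):
    def __init__(self, d=32, heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(d, heads, batch_first=True)
        self.channel = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):                  # x: (B, C, T, d) channels x time
        B, C, T, d = x.shape
        t_in = x.reshape(B * C, T, d)      # stage 1: each channel independently
        t_out, _ = self.temporal(t_in, t_in, t_in)
        c_in = t_out.reshape(B, C, T, d).transpose(1, 2).reshape(B * T, C, d)
        c_out, _ = self.channel(c_in, c_in, c_in)   # stage 2: mix channels
        return c_out.reshape(B, T, C, d).transpose(1, 2)  # back to (B, C, T, d)
```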
arXiv Detail & Related papers (2023-12-11T09:10:38Z)
- Correlated Attention in Transformers for Multivariate Time Series [22.542109523780333]
We propose a novel correlated attention mechanism, which efficiently captures feature-wise dependencies, and can be seamlessly integrated within the encoder blocks of existing Transformers.
In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregates representations at the sub-series level.
This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation.
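The lagged cross-covariance step might look roughly like this (shapes, lag set, and max-over-lags aggregation are my assumptions):

```python
import torch

def lagged_cross_covariance(x, lags=(0, 1, 2, 4)):
    """x: (B, C, T) multivariate series. Returns (B, C, C): for each
    channel pair, the cross-covariance at its best lag, capturing lagged
    as well as instantaneous dependencies."""
    x = x - x.mean(dim=-1, keepdim=True)        # center each channel
    B, C, T = x.shape
    scores = []
    for lag in lags:
        a, b = x[..., : T - lag], x[..., lag:]  # b lags behind a by `lag`
        cov = torch.einsum("bct,bdt->bcd", a, b) / (T - lag)
        scores.append(cov)
    return torch.stack(scores).max(dim=0).values  # best lag per channel pair
```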
arXiv Detail & Related papers (2023-11-20T17:35:44Z)
- Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation [64.85974098314344]
Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer their relationships for a given video.
Inherently, object pairs and their relationships exhibit spatial co-occurrence correlations within each image and temporal consistency/transition correlations across different images.
We propose a spatial-temporal knowledge-embedded transformer (STKET) that incorporates the prior spatial-temporal knowledge into the multi-head cross-attention mechanism.
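One common way to embed such priors, and possibly what is meant here (unverified), is to add precomputed pair priors to the cross-attention logits:

```python
import torch

def knowledge_biased_attention(q, k, v, prior_logits):
    """q, k, v: (B, N, d); prior_logits: (N, N) log-prior over pairs,
    e.g. from co-occurrence statistics. The prior biases the attention
    toward spatially/temporally likely pairs. Hypothetical sketch, not
    STKET's exact mechanism."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5 + prior_logits
    return scores.softmax(dim=-1) @ v
```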
arXiv Detail & Related papers (2023-09-23T02:40:28Z)
- Learning Interaction Variables and Kernels from Observations of Agent-Based Systems [14.240266845551488]
We propose a learning technique that, given observations of states and velocities along trajectories of agents, yields both the variables upon which the interaction kernel depends and the interaction kernel itself.
This yields an effective dimension reduction which avoids the curse of dimensionality from the high-dimensional observation data.
We demonstrate the learning capability of our method on a variety of first-order interacting systems.
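A toy version of this kernel-learning setup for first-order systems x_i' = sum_j phi(|x_j - x_i|)(x_j - x_i), fitting phi on a radial basis by least squares (the basis and solver choices are mine, not the paper's method):

```python
import numpy as np

def fit_interaction_kernel(X, V, centers, width=0.5):
    """X, V: (T, N, d) observed positions and velocities; centers: (K,)
    radial grid. Returns coefficients c so that
    phi(r) ~ sum_k c_k exp(-(r - centers_k)^2 / width^2)."""
    T, N, d = X.shape
    rows, targets = [], []
    for t in range(T):
        for i in range(N):
            diff = X[t] - X[t, i]                    # (N, d) displacements
            r = np.linalg.norm(diff, axis=1)         # (N,) pair distances
            basis = np.exp(-((r[:, None] - centers) ** 2) / width ** 2)  # (N, K)
            # v_i = sum_k c_k * (basis[:, k]^T @ diff): one (K, d) block.
            rows.append(basis.T @ diff)
            targets.append(V[t, i])
    A = np.stack(rows)                               # (T*N, K, d)
    A2 = A.transpose(0, 2, 1).reshape(-1, len(centers))  # one row per (t, i, dim)
    b = np.stack(targets).reshape(-1)
    c, *_ = np.linalg.lstsq(A2, b, rcond=None)       # joint least squares
    return c
```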
arXiv Detail & Related papers (2022-08-04T16:31:01Z)
- Dynamic Relation Discovery and Utilization in Multi-Entity Time Series Forecasting [92.32415130188046]
In many real-world scenarios, there can exist crucial yet implicit relations between entities.
We propose an attentional multi-graph neural network with automatic graph learning (A2GNN) in this work.
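The "automatic graph learning" ingredient is commonly implemented by learning node embeddings and deriving a soft adjacency from their similarity; the sketch below follows that common recipe rather than A2GNN's exact design:

```python
import torch
import torch.nn as nn

class LearnedGraph(nn.Module):
    """Soft adjacency over entities, learned end-to-end from two
    embedding tables (hypothetical names and sizes)."""
    def __init__(self, n_entities=10, emb=16):
        super().__init__()
        self.e1 = nn.Parameter(torch.randn(n_entities, emb))
        self.e2 = nn.Parameter(torch.randn(n_entities, emb))

    def forward(self):
        # Asymmetric adjacency; relu + softmax keep rows normalized,
        # so each entity distributes attention over discovered relations.
        logits = torch.relu(self.e1 @ self.e2.T)
        return logits.softmax(dim=-1)          # (n, n) learned relations

# One message-passing step with the learned adjacency would then be
# h_next = A @ h @ W for entity features h: (n, f).
```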
arXiv Detail & Related papers (2022-02-18T11:37:04Z)
- Inference of time-ordered multibody interactions [0.8057006406834466]
We introduce time-ordered multibody interactions to describe complex systems manifesting temporal as well as multibody dependencies.
We present an algorithm to extract those interactions from data capturing the system-level dynamics of node states.
We experimentally validate the robustness of our algorithm against statistical errors and its efficiency at inferring parsimonious interaction ensembles.
arXiv Detail & Related papers (2021-11-29T15:41:06Z)
- Relational Self-Attention: What's Missing in Attention for Video Understanding [52.38780998425556]
We introduce a relational feature transform, dubbed relational self-attention (RSA).
Our experiments and ablation studies show that the RSA network substantially outperforms convolution and self-attention counterparts.
arXiv Detail & Related papers (2021-11-02T15:36:11Z)