Correlated Attention in Transformers for Multivariate Time Series
- URL: http://arxiv.org/abs/2311.11959v1
- Date: Mon, 20 Nov 2023 17:35:44 GMT
- Title: Correlated Attention in Transformers for Multivariate Time Series
- Authors: Quang Minh Nguyen, Lam M. Nguyen, Subhro Das
- Abstract summary: We propose a novel correlated attention mechanism, which efficiently captures feature-wise dependencies and can be seamlessly integrated within the encoder blocks of existing Transformers.
In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregates representations at the sub-series level.
This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation.
- Score: 22.542109523780333
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multivariate time series (MTS) analysis prevails in real-world applications
such as finance, climate science and healthcare. The various self-attention
mechanisms, the backbone of the state-of-the-art Transformer-based models,
efficiently discover temporal dependencies, yet cannot adequately capture the
intricate cross-correlation between different features of MTS data, which
inherently stems from complex dynamical systems in practice. To this end, we
propose a novel correlated attention mechanism, which not only efficiently
captures feature-wise dependencies, but can also be seamlessly integrated
within the encoder blocks of existing well-known Transformers to improve
efficiency. In particular, correlated attention operates across
feature channels to compute cross-covariance matrices between queries and keys
with different lag values, and selectively aggregates representations at the
sub-series level. This architecture facilitates automated discovery and
representation learning of not only instantaneous but also lagged
cross-correlations, while inherently capturing time series auto-correlation.
When combined with prevalent Transformer baselines, the correlated attention
mechanism constitutes a better alternative for encoder-only architectures,
which are suitable for a wide range of tasks including imputation, anomaly
detection and classification. Extensive experiments on the aforementioned tasks
consistently underscore the advantages of the correlated attention mechanism in
enhancing base Transformer models, and demonstrate our state-of-the-art results
in imputation, anomaly detection and classification.
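To make the lagged cross-covariance idea concrete, below is a minimal PyTorch sketch of a correlated-attention-style operation, reconstructed from the abstract alone: keys are circularly shifted by each candidate lag, a cross-covariance matrix over feature channels is computed between queries and the shifted keys, and the top-scoring lags are aggregated (the abstract's "selectively aggregate representations at the sub-series level"). The function name, single-head form, lag scoring, and top-k aggregation are illustrative assumptions, not the paper's exact formulation, which may add per-head projections and different normalization.

```python
import torch
import torch.nn.functional as F

def correlated_attention(q, k, v, lags=(0, 1, 2), top_k=2):
    # Hedged sketch of lagged cross-covariance attention over feature
    # channels (an assumption-laden reconstruction from the abstract,
    # not the authors' exact method). q, k, v: (batch, seq_len, d_model).
    b, t, d = q.shape
    lag_scores, mixed = [], []
    for lag in lags:
        k_lag = torch.roll(k, shifts=lag, dims=1)            # keys shifted by `lag` time steps
        cov = torch.einsum("btd,bte->bde", q, k_lag) / t     # (b, d, d) cross-covariance over channels
        attn = F.softmax(cov / d ** 0.5, dim=-1)             # normalize across feature channels
        mixed.append(torch.einsum("bde,bte->btd", attn, v))  # channel-mixed values for this lag
        lag_scores.append(cov.abs().mean(dim=(1, 2)))        # (b,) covariance "energy" per lag
    scores = torch.stack(lag_scores, dim=-1)                 # (b, n_lags)
    weights = F.softmax(scores, dim=-1)
    top_w, top_idx = weights.topk(top_k, dim=-1)             # keep only the top_k lags per sample
    top_w = top_w / top_w.sum(dim=-1, keepdim=True)          # renormalize the selected weights
    mixed = torch.stack(mixed, dim=-1)                       # (b, t, d, n_lags)
    idx = top_idx[:, None, None, :].expand(b, t, d, top_k)
    return (mixed.gather(-1, idx) * top_w[:, None, None, :]).sum(-1)
```

Under these assumptions, a self-attention-style call on a toy batch x of shape (batch, time, features) would be out = correlated_attention(x, x, x, lags=range(4), top_k=2); lag 0 recovers instantaneous cross-correlation, while nonzero lags expose lagged cross-correlations between channels.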
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE (pyramid RNN embeddings) with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme [0.7216399430290167]
Anomaly detection for time series, especially for unlabeled data, has long been a challenging problem.
We address it by applying a suitable data degradation scheme to self-supervised model training.
Inspired by the self-attention mechanism, we design a Transformer-based architecture to recognize the temporal context.
arXiv Detail & Related papers (2023-05-08T05:42:24Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, despite its high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce a class of efficient Transformers named Regularized Transformers (Reguformers).
The focus of our experiments is on oil & gas data, namely well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
- DRAformer: Differentially Reconstructed Attention Transformer for Time-Series Forecasting [7.805077630467324]
Time-series forecasting plays an important role in many real-world scenarios, such as equipment life cycle forecasting, weather forecasting, and traffic flow forecasting.
Recent research shows that a variety of Transformer-based models achieve remarkable results in time-series forecasting.
However, some issues still limit the performance of Transformer-based models on time-series forecasting tasks.
arXiv Detail & Related papers (2022-06-11T10:34:29Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- HIFI: Anomaly Detection for Multivariate Time Series with High-order Feature Interactions [7.016615391171876]
HIFI automatically builds a multivariate feature interaction graph and uses a graph convolutional neural network to achieve high-order feature interactions.
Experiments on three publicly available datasets demonstrate the superiority of our framework compared with state-of-the-art approaches.
arXiv Detail & Related papers (2021-06-11T04:57:03Z)
- Learning Graph Structures with Transformer for Multivariate Time Series Anomaly Detection in IoT [11.480824844205864]
This work proposes a novel framework, named GTA, for multivariate time series anomaly detection that automatically learns a graph structure followed by graph convolution.
We also devise a novel graph convolution, named influence propagation convolution, to model the anomaly information flow between graph nodes.
Experiments on four public anomaly detection benchmarks further demonstrate our approach's superiority over other state-of-the-art methods.
arXiv Detail & Related papers (2021-04-08T01:45:28Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations in multi-modal data for gesture recognition.
Results show that our approach recovers performance with substantial gains, up to 12.91% in ACC and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.