Correlated Attention in Transformers for Multivariate Time Series
- URL: http://arxiv.org/abs/2311.11959v1
- Date: Mon, 20 Nov 2023 17:35:44 GMT
- Title: Correlated Attention in Transformers for Multivariate Time Series
- Authors: Quang Minh Nguyen, Lam M. Nguyen, Subhro Das
- Abstract summary: We propose a novel correlated attention mechanism, which efficiently captures feature-wise dependencies and can be seamlessly integrated within the encoder blocks of existing Transformers.
In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregates representations at the sub-series level.
This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation.
- Score: 22.542109523780333
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multivariate time series (MTS) analysis prevails in real-world applications
such as finance, climate science and healthcare. The various self-attention
mechanisms, the backbone of the state-of-the-art Transformer-based models,
efficiently discover temporal dependencies, yet cannot adequately capture the
intricate cross-correlation between different features of MTS data, which
inherently stems from complex dynamical systems in practice. To this end, we
propose a novel correlated attention mechanism, which not only efficiently
captures feature-wise dependencies, but can also be seamlessly integrated
within the encoder blocks of existing well-known Transformers to improve
efficiency. In particular, correlated attention operates across
feature channels to compute cross-covariance matrices between queries and keys
with different lag values, and selectively aggregates representations at the
sub-series level. This architecture facilitates automated discovery and
representation learning of not only instantaneous but also lagged
cross-correlations, while inherently capturing time series auto-correlation.
When combined with prevalent Transformer baselines, the correlated attention
mechanism constitutes a better alternative for encoder-only architectures,
which are suitable for a wide range of tasks including imputation, anomaly
detection and classification. Extensive experiments on the aforementioned tasks
consistently underscore the advantages of the correlated attention mechanism in
enhancing base Transformer models, and demonstrate our state-of-the-art results
in imputation, anomaly detection and classification.
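To make the lagged cross-covariance idea concrete, below is a minimal PyTorch sketch of a correlated-attention-style operation, reconstructed from the abstract alone: keys are circularly shifted by each candidate lag, a cross-covariance matrix over feature channels is computed between queries and the shifted keys, and the top-scoring lags are aggregated (the abstract's "selectively aggregate representations at the sub-series level"). The function name, single-head form, lag scoring, and top-k aggregation are illustrative assumptions, not the paper's exact formulation, which may add per-head projections and different normalization.

```python
import torch
import torch.nn.functional as F

def correlated_attention(q, k, v, lags=(0, 1, 2), top_k=2):
    # Hedged sketch of lagged cross-covariance attention over feature
    # channels (an assumption-laden reconstruction from the abstract,
    # not the authors' exact method). q, k, v: (batch, seq_len, d_model).
    b, t, d = q.shape
    lag_scores, mixed = [], []
    for lag in lags:
        k_lag = torch.roll(k, shifts=lag, dims=1)            # keys shifted by `lag` time steps
        cov = torch.einsum("btd,bte->bde", q, k_lag) / t     # (b, d, d) cross-covariance over channels
        attn = F.softmax(cov / d ** 0.5, dim=-1)             # normalize across feature channels
        mixed.append(torch.einsum("bde,bte->btd", attn, v))  # channel-mixed values for this lag
        lag_scores.append(cov.abs().mean(dim=(1, 2)))        # (b,) covariance "energy" per lag
    scores = torch.stack(lag_scores, dim=-1)                 # (b, n_lags)
    weights = F.softmax(scores, dim=-1)
    top_w, top_idx = weights.topk(top_k, dim=-1)             # keep only the top_k lags per sample
    top_w = top_w / top_w.sum(dim=-1, keepdim=True)          # renormalize the selected weights
    mixed = torch.stack(mixed, dim=-1)                       # (b, t, d, n_lags)
    idx = top_idx[:, None, None, :].expand(b, t, d, top_k)
    return (mixed.gather(-1, idx) * top_w[:, None, None, :]).sum(-1)
```

Under these assumptions, a self-attention-style call on a toy batch x of shape (batch, time, features) would be out = correlated_attention(x, x, x, lags=range(4), top_k=2); lag 0 recovers instantaneous cross-correlation, while nonzero lags expose lagged cross-correlations between channels.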
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE (pyramid RNN embeddings) with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme [0.7216399430290167]
Anomaly detection for time series, especially for unlabeled data, has long been a challenging problem.
We address it by applying a suitable data degradation scheme to self-supervised model training.
Inspired by the self-attention mechanism, we design a Transformer-based architecture to recognize the temporal context.
arXiv Detail & Related papers (2023-05-08T05:42:24Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, despite its high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce a class of efficient Transformers named Regularized Transformers (Reguformers).
The focus of our experiments is on oil & gas data, namely well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
- DRAformer: Differentially Reconstructed Attention Transformer for Time-Series Forecasting [7.805077630467324]
Time-series forecasting plays an important role in many real-world scenarios, such as equipment life cycle forecasting, weather forecasting, and traffic flow forecasting.
Recent research shows that a variety of Transformer-based models achieve remarkable results in time-series forecasting.
However, some issues still limit the performance of Transformer-based models on time-series forecasting tasks.
arXiv Detail & Related papers (2022-06-11T10:34:29Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- HIFI: Anomaly Detection for Multivariate Time Series with High-order Feature Interactions [7.016615391171876]
HIFI automatically builds a multivariate feature interaction graph and uses a graph convolutional neural network to achieve high-order feature interactions.
Experiments on three publicly available datasets demonstrate the superiority of our framework compared with state-of-the-art approaches.
arXiv Detail & Related papers (2021-06-11T04:57:03Z)
- Learning Graph Structures with Transformer for Multivariate Time Series Anomaly Detection in IoT [11.480824844205864]
This work proposes a novel framework, named GTA, for multivariate time series anomaly detection that automatically learns a graph structure followed by graph convolution.
We also devise a novel graph convolution, named influence propagation convolution, to model the anomaly information flow between graph nodes.
Experiments on four public anomaly detection benchmarks further demonstrate our approach's superiority over other state-of-the-art methods.
arXiv Detail & Related papers (2021-04-08T01:45:28Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations in multi-modal data for gesture recognition.
Results show that our approach recovers performance with substantial gains, up to 12.91% in ACC and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.