Related papers: Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures

URL: http://arxiv.org/abs/2405.20799v1
Date: Fri, 31 May 2024 14:00:44 GMT
Title: Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures
Authors: Fernando Moreno-Pino, Álvaro Arroyo, Harrison Waldon, Xiaowen Dong, Álvaro Cartea
Abstract summary: We introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences. We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts.
Score: 46.58170057001437
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Time-series data in real-world settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In these settings, traditional sequence-based recurrent models struggle. To overcome this, researchers often replace recurrent architectures with Neural ODE-based models to account for irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs for input sequences of even moderate length. To address this challenge, we introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences and incurs significantly lower computational costs. In particular, we propose multi-view signature attention, which uses path signatures to augment vanilla attention and to capture both local and global (multi-scale) dependencies in the input data, while remaining robust to changes in the sequence length and sampling frequency and yielding improved spatial processing. We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the representational benefits of Neural ODE-based models, all at a fraction of the computational time and memory resources.
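
To make the idea of path-signature features concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the series is linearly interpolated between observations, truncates the signature at depth 2, and treats a "multi-view" feature as the concatenation of a global signature (from the start of the series to the end of a window) and a local signature (over that window alone). The function names `signature_depth2` and `multi_view_signature` and the parameter `num_windows` are hypothetical, chosen only for illustration.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path of shape (T, d).

    Returns a vector of length d + d*d: the total increment (level 1)
    followed by the flattened matrix of iterated integrals (level 2).
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)       # (T-1, d) segment increments
    level1 = increments.sum(axis=0)          # (d,) total displacement
    starts = path[:-1] - path[0]             # offset of each segment start
    # Chen's identity for linear segments: cross terms between the path so
    # far and the current segment, plus the segment's own half outer product.
    level2 = starts.T @ increments + 0.5 * (increments.T @ increments)
    return np.concatenate([level1, level2.ravel()])


def multi_view_signature(path, num_windows):
    """Stack [global view, local view] signature features per window.

    Assumption (illustrative only): windows are equal-width in index space.
    """
    path = np.asarray(path, dtype=float)
    bounds = np.linspace(0, len(path) - 1, num_windows + 1).astype(int)
    features = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        global_view = signature_depth2(path[: hi + 1])    # start .. window end
        local_view = signature_depth2(path[lo : hi + 1])  # window only
        features.append(np.concatenate([global_view, local_view]))
    return np.stack(features)                # (num_windows, 2 * (d + d*d))


# The same piecewise-linear path sampled at two different rates.
coarse = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
fine = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [1.5, 0.5], [2.0, 0.0]])

# Extra samples along the same segments leave the signature unchanged.
print(np.allclose(signature_depth2(coarse), signature_depth2(fine)))
print(multi_view_signature(fine, num_windows=2).shape)
```

In this sketch the number of feature rows depends on `num_windows` rather than on the number of raw observations, which is one way to read the abstract's claim that the model stays robust to sequence length and sampling frequency while attending over a shorter input.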

Related papers

Decomposition-based multi-scale transformer framework for time series anomaly detection [0.9438207505148947]
We propose a transformer-based framework built on decomposition (TransDe) for time series anomaly detection. A multi-scale patch-based transformer architecture is proposed to exploit the representative dependencies of each decomposed component of the time series. A novel asynchronous loss function with a stop-gradient strategy is introduced to enhance the performance of TransDe effectively.
arXiv Detail & Related papers (2025-04-19T06:47:38Z)
Sentinel: Multi-Patch Transformer with Temporal and Channel Attention for Time Series Forecasting [48.52101281458809]
Transformer-based time series forecasting has recently gained strong interest due to the ability of transformers to model sequential data. We propose Sentinel, a transformer-based architecture composed of an encoder able to extract contextual information from the channel dimension. We introduce a multi-patch attention mechanism, which leverages the patching process to structure the input sequence in a way that can be naturally integrated into the transformer architecture.
arXiv Detail & Related papers (2025-03-22T06:01:50Z)
MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z)
Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting. Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences. We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
TSLANet: Rethinking Transformers for Time Series Representation Learning [19.795353886621715]
Time series data is characterized by its intrinsic long and short-range dependencies. We introduce a novel Time Series Lightweight Network (TSLANet) as a universal convolutional model for diverse time series tasks. Our experiments demonstrate that TSLANet outperforms state-of-the-art models in various tasks spanning classification, forecasting, and anomaly detection.
arXiv Detail & Related papers (2024-04-12T13:41:29Z)
Rough Transformers for Continuous and Efficient Time-Series Modelling [46.58170057001437]
Time-series data in real-world medical settings typically exhibit long-range dependencies and are observed at non-uniform intervals. We introduce the Rough Transformer, a variation of the Transformer model which operates on continuous-time representations of input sequences. We find that Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the benefits of Neural ODE-based models.
arXiv Detail & Related papers (2024-03-15T13:29:45Z)
EdgeConvFormer: Dynamic Graph CNN and Transformer based Anomaly Detection in Multivariate Time Series [7.514010315664322]
We propose a novel anomaly detection method, named EdgeConvFormer, which integrates stacked Time2vec embedding, dynamic graph CNN, and Transformer to extract global and local spatial-time information. Experiments demonstrate that EdgeConvFormer can learn the spatial-temporal modeling from multivariate time series data and achieve better anomaly detection performance than the state-of-the-art approaches on many real-world datasets of different scales.
arXiv Detail & Related papers (2023-12-04T08:38:54Z)
FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task. It exhibits three aspects of merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning. Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism. We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce the class of efficient Transformers named Regularized Transformers (Reguformers). The focus in our experiments is on oil and gas data, namely, well logs. To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
ChunkFormer: Learning Long Time Series with Multi-stage Chunked Transformer [0.0]
Original Transformer-based models adopt an attention mechanism to discover global information along a sequence. ChunkFormer splits the long sequences into smaller sequence chunks for the attention calculation. In this way, the proposed model gradually learns both local and global information without changing the total length of the input sequences.
arXiv Detail & Related papers (2021-12-30T15:06:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.