S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting
- URL: http://arxiv.org/abs/2502.11340v1
- Date: Mon, 17 Feb 2025 01:40:45 GMT
- Title: S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting
- Authors: Zihao Wu, Juncheng Dong, Haoming Yang, Vahid Tarokh
- Abstract summary: Time series forecasting has recently achieved significant progress with multi-scale models to address the heterogeneity between long and short range patterns. We propose State Space Transformer with cross-attention (S2TX) to address these concerns. S2TX can achieve highly robust SOTA results while maintaining a low memory footprint.
- Score: 31.19126944008011
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series forecasting has recently achieved significant progress with multi-scale models to address the heterogeneity between long and short range patterns. Despite their state-of-the-art performance, we identify two potential areas for improvement. First, the variates of the multivariate time series are processed independently. Moreover, the multi-scale (long and short range) representations are learned separately by two independent models without communication. In light of these concerns, we propose State Space Transformer with cross-attention (S2TX). S2TX employs a cross-attention mechanism to integrate a Mamba model for extracting long-range cross-variate context and a Transformer model with local window attention to capture short-range representations. By cross-attending to the global context, the Transformer model further facilitates variate-level interactions as well as local/global communications. Comprehensive experiments on seven classic long-short range time-series forecasting benchmark datasets demonstrate that S2TX can achieve highly robust SOTA results while maintaining a low memory footprint.
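The abstract describes a two-branch design: a state-space (Mamba) branch supplies long-range, cross-variate context, and a Transformer branch with local window attention cross-attends to that context. Below is a minimal sketch of that interaction; the module names, dimensions, and the GRU stand-in for the state-space layer are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class S2TXBlockSketch(nn.Module):
    """Sketch of a cross-attention multi-scale block: a global (state-space-style)
    branch provides long-range context, and a local-window attention branch
    cross-attends to it. All names and sizes are assumptions for illustration."""

    def __init__(self, d_model=64, n_heads=4, window=16):
        super().__init__()
        self.window = window
        # Stand-in for a Mamba/state-space layer producing global context;
        # a GRU is used here only so the sketch runs without extra dependencies.
        self.global_ssm = nn.GRU(d_model, d_model, batch_first=True)
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 2 * d_model), nn.GELU(),
                                 nn.Linear(2 * d_model, d_model))

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        global_ctx, _ = self.global_ssm(x)      # long-range context per time step
        # Local window self-attention: block attention outside a fixed window.
        L = x.size(1)
        idx = torch.arange(L)
        mask = (idx[None, :] - idx[:, None]).abs() > self.window  # True = blocked
        local, _ = self.local_attn(x, x, x, attn_mask=mask)
        # Cross-attention: short-range (local) queries attend to the global context.
        fused, _ = self.cross_attn(local, global_ctx, global_ctx)
        return x + self.ffn(fused)
```

A forward pass on a dummy batch, e.g. `S2TXBlockSketch()(torch.randn(2, 96, 64))`, returns a tensor of the same shape, so the block can be stacked like an ordinary Transformer layer.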
Related papers
- DC-Mamber: A Dual Channel Prediction Model based on Mamba and Linear Transformer for Multivariate Time Series Forecasting [6.238490256097465]
Current mainstream models are mostly based on the Transformer and the emerging Mamba. DC-Mamber is a dual-channel model based on Mamba and a linear Transformer for multivariate time series forecasting. Experiments on eight public datasets confirm DC-Mamber's superior accuracy over existing models.
arXiv Detail & Related papers (2025-07-06T12:58:52Z) - T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models [51.08566687549047]
We introduce Text-to-Series (T2S), a diffusion-based framework that bridges the gap between natural language and time series. T2S employs a length-adaptive variational autoencoder to encode time series of varying lengths into consistent latent embeddings. We train T2S in an interleaved paradigm across multiple lengths, allowing it to generate sequences of any desired length.
arXiv Detail & Related papers (2025-05-05T07:22:54Z) - Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations [2.2091590689610823]
We re-purpose the Transformer architecture to model both cross-time and cross-variate dependencies.
Our method achieves state-of-the-art performance across 13 real-world datasets, delivering performance improvements up to 20.7% over original models.
arXiv Detail & Related papers (2025-05-01T04:59:05Z) - TimeCNN: Refining Cross-Variable Interaction on Time Point for Time Series Forecasting [44.04862924157323]
Transformer-based models demonstrate significant potential in modeling cross-time and cross-variable interaction.
We propose a TimeCNN model to refine cross-variable interactions to enhance time series forecasting.
Extensive experiments conducted on 12 real-world datasets demonstrate that TimeCNN consistently outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-10-07T09:16:58Z) - Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting. Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - TiVaT: A Transformer with a Single Unified Mechanism for Capturing Asynchronous Dependencies in Multivariate Time Series Forecasting [4.733959271565453]
TiVaT is a novel architecture built around a single unified module, the Joint-Axis (JA) attention module. The JA attention module dynamically selects relevant features, particularly to capture asynchronous interactions. Extensive experiments demonstrate TiVaT's strong performance across diverse datasets.
arXiv Detail & Related papers (2024-10-02T13:24:24Z) - sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting [6.434378359932152]
We review and categorize existing Transformer-based models into two main types: (1) modifications to the model structure and (2) modifications to the input data.
We propose sTransformer, which introduces the Sequence and Temporal Convolutional Network (STCN) to fully capture both sequential and temporal information.
We compare our model with linear models and existing forecasting models on long-term time-series forecasting, achieving new state-of-the-art results.
arXiv Detail & Related papers (2024-08-19T06:23:41Z) - UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose UniTST, a Transformer-based model with a unified attention mechanism over flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
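The UniTST summary above centers on one idea: patch each variate, flatten the (variate, patch) grid into a single token sequence, and let one self-attention layer model inter-series and intra-series dependencies together. A rough sketch of that flattening step, with assumed shapes and layer names, might look like:

```python
import torch
import torch.nn as nn

def unified_patch_attention_sketch(x, patch_len=16, d_model=64, n_heads=4):
    """Hedged sketch of unified attention over flattened per-variate patch tokens.
    Shapes, patch length, and layer choices are assumptions, not the paper's code."""
    B, L, V = x.shape                                   # (batch, time, variates)
    n_patches = L // patch_len
    # Split each variate's series into patches: (B, V, n_patches, patch_len).
    patches = x[:, :n_patches * patch_len].permute(0, 2, 1)
    patches = patches.reshape(B, V, n_patches, patch_len)
    embed = nn.Linear(patch_len, d_model)
    # Flatten the (variate, patch) grid into one token sequence.
    tokens = embed(patches).reshape(B, V * n_patches, d_model)
    attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    out, _ = attn(tokens, tokens, tokens)               # one attention for both axes
    return out                                          # (B, V * n_patches, d_model)

# Example: 7 variates, 96 time steps.
out = unified_patch_attention_sketch(torch.randn(2, 96, 7))
```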
arXiv Detail & Related papers (2024-06-07T14:39:28Z) - TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification [13.110156202816112]
We propose a novel multi-view approach to capture patterns with properties like shift equivariance.
Our method integrates diverse features, including spectral, temporal, local, and global features, to obtain rich, complementary contexts for TSC.
Our approach achieves average accuracy improvements of 4.01-6.45% and 7.93%, respectively, over leading TSC models.
arXiv Detail & Related papers (2024-06-06T18:05:10Z) - Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers [55.475142494272724]
Time series prediction is crucial for understanding and forecasting complex dynamics in various domains.
We introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions.
The model consistently delivers state-of-the-art performance across various real-world datasets.
arXiv Detail & Related papers (2024-05-22T16:41:21Z) - TSLANet: Rethinking Transformers for Time Series Representation Learning [19.795353886621715]
Time series data is characterized by its intrinsic long and short-range dependencies.
We introduce a novel Time Series Lightweight Network (TSLANet) as a universal convolutional model for diverse time series tasks.
Our experiments demonstrate that TSLANet outperforms state-of-the-art models in various tasks spanning classification, forecasting, and anomaly detection.
arXiv Detail & Related papers (2024-04-12T13:41:29Z) - Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
arXiv Detail & Related papers (2024-01-22T13:15:40Z) - EdgeConvFormer: Dynamic Graph CNN and Transformer based Anomaly Detection in Multivariate Time Series [7.514010315664322]
We propose a novel anomaly detection method, named EdgeConvFormer, which integrates stacked Time2vec embedding, dynamic graph CNN, and Transformer to extract global and local spatial-time information.
Experiments demonstrate that EdgeConvFormer can learn the spatial-temporal modeling from multivariate time series data and achieve better anomaly detection performance than the state-of-the-art approaches on many real-world datasets of different scales.
arXiv Detail & Related papers (2023-12-04T08:38:54Z) - Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series Forecasting Approach [71.67506068703314]
Long-term urban mobility predictions play a crucial role in the effective management of urban facilities and services.
Traditionally, urban mobility data has been structured as videos, treating longitude and latitude as fundamental pixels.
In our research, we introduce a fresh perspective on urban mobility prediction.
Instead of oversimplifying urban mobility data as traditional video data, we regard it as a complex time series.
arXiv Detail & Related papers (2023-12-04T07:39:05Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three aspects of merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Long-Short Transformer: Efficient Transformers for Language and Vision [97.2850205384295]
Long-Short Transformer (Transformer-LS) is an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks.
It aggregates a novel long-range attention with dynamic projection to model distant correlations and a short-term attention to capture fine-grained local correlations.
Our method outperforms the state-of-the-art models on multiple tasks in language and vision domains, including the Long Range Arena benchmark, autoregressive language modeling, and ImageNet classification.
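Transformer-LS, as summarized above, combines a long-range branch whose keys and values are compressed through a learned (dynamic) projection with a short-range branch restricted to a sliding window. A simplified sketch under assumed dimensions follows; the actual method applies the projection per attention head and uses segment-wise windows, so this is an illustration of the idea rather than the paper's algorithm.

```python
import torch
import torch.nn as nn

class LongShortAttentionSketch(nn.Module):
    """Rough sketch of a long/short attention split: the long branch attends to a
    sequence compressed to r tokens by a learned projection; the short branch
    attends within a local window. Sizes and masking are illustrative assumptions."""

    def __init__(self, d_model=64, n_heads=4, r=32, window=8):
        super().__init__()
        self.window = window
        self.proj_len = nn.Linear(d_model, r)          # learned low-rank projection
        self.long_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.short_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                              # x: (B, L, d_model)
        B, L, D = x.shape
        # Compress the sequence axis: (B, L, D) -> (B, r, D) for the long branch.
        weights = torch.softmax(self.proj_len(x), dim=1)        # (B, L, r)
        compressed = torch.einsum('blr,bld->brd', weights, x)   # (B, r, D)
        long_out, _ = self.long_attn(x, compressed, compressed)
        # Short branch: self-attention restricted to a sliding window.
        idx = torch.arange(L)
        mask = (idx[None, :] - idx[:, None]).abs() > self.window  # True = blocked
        short_out, _ = self.short_attn(x, x, x, attn_mask=mask)
        return long_out + short_out
```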
arXiv Detail & Related papers (2021-07-05T18:00:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.