Related papers: TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting

TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting

URL: http://arxiv.org/abs/2403.09898v2
Date: Thu, 22 Aug 2024 23:49:16 GMT
Title: TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting
Authors: Md Atik Ahamed, Qiang Cheng,
Abstract summary: TimeMachine exploits the unique properties of time series data to produce salient contextual cues at multi-scales. TimeMachine achieves superior performance in prediction accuracy, scalability, and memory efficiency, as extensively validated using benchmark datasets.
Score: 13.110156202816112
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Long-term time-series forecasting remains challenging due to the difficulty in capturing long-term dependencies, achieving linear scalability, and maintaining computational efficiency. We introduce TimeMachine, an innovative model that leverages Mamba, a state-space model, to capture long-term dependencies in multivariate time series data while maintaining linear scalability and small memory footprints. TimeMachine exploits the unique properties of time series data to produce salient contextual cues at multi-scales and leverage an innovative integrated quadruple-Mamba architecture to unify the handling of channel-mixing and channel-independence situations, thus enabling effective selection of contents for prediction against global and local contexts at different scales. Experimentally, TimeMachine achieves superior performance in prediction accuracy, scalability, and memory efficiency, as extensively validated using benchmark datasets. Code availability: https://github.com/Atik-Ahamed/TimeMachine

Related papers

Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic [72.97800570813175]
We propose Timely Machine, redefining test-time as wall-clock time.<n>We introduce Timely-Eval, a benchmark spanning high-frequency tool calls, low-frequency tool calls, and time-constrained reasoning.<n>We find smaller models excel with fast feedback through more interactions, while larger models dominate high-latency settings via superior interaction quality.
arXiv Detail & Related papers (2026-01-23T06:28:52Z)
ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies [11.40258240052954]
In natural language processing, an approach has been used that combines local window attention for capturing short-term dependencies and Mamba for capturing long-term dependencies.<n>We find that for time-series forecasting tasks, assigning equal weight to long-term and short-term dependencies is not optimal.<n>We propose a dynamic weighting mechanism, ParallelTime Weighter, which calculates interdependent weights for long-term and short-term dependencies.
arXiv Detail & Related papers (2025-07-18T15:08:02Z)
How Much Can Time-related Features Enhance Time Series Forecasting? [27.030553080458716]
We introduce a module designed to encode time-related features, Time Stamp Forecaster (TimeSter) TimeSter significantly improves the performance of a single linear projector, reducing MSE by an average of 23% on benchmark datasets such as Electricity and Traffic.
arXiv Detail & Related papers (2024-12-02T14:45:26Z)
UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba [7.594115034632109]
We propose UmambaTSF, a novel long-term time series forecasting framework. It integrates multi-scale feature extraction capabilities of U-shaped encoder-decoder multilayer perceptrons (MLP) with Mamba's long sequence representation. UmambaTSF achieves state-of-the-art performance and excellent generality on widely used benchmark datasets.
arXiv Detail & Related papers (2024-10-15T04:56:43Z)
Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting. Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with 0.1K Parameters [6.733646592789575]
Long-term Time Series Forecasting (LTSF) involves predicting long-term values by analyzing a large amount of historical time-series data to identify patterns and trends. Transformer-based models offer high forecasting accuracy, but they are often too compute-intensive to be deployed on devices with hardware constraints. We propose MixLinear, an ultra-lightweight time series forecasting model specifically designed for resource-constrained devices.
arXiv Detail & Related papers (2024-10-02T23:04:57Z)
Test Time Learning for Time Series Forecasting [1.4605709124065924]
Test-Time Training (TTT) modules consistently outperform state-of-the-art models, including the Mamba-based TimeMachine. Our results show significant improvements in Mean Squared Error (MSE) and Mean Absolute Error (MAE) This work sets a new benchmark for time-series forecasting and lays the groundwork for future research in scalable, high-performance forecasting models.
arXiv Detail & Related papers (2024-09-21T04:40:08Z)
Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics [7.745945701278489]
Long-short range time series forecasting is essential for predicting future trends and patterns over extended periods. Deep learning models such as Transformers have made significant strides in advancing time series forecasting. This article examines the advantages and disadvantages of both Mamba and Transformer models.
arXiv Detail & Related papers (2024-09-13T04:23:54Z)
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
Self-attention mechanism's computational cost limits its practicality for long sequences. We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook. LongVQ effectively maintains dynamic global and local patterns, which helps to complement the lack of long-range dependency issues.
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked-based Universal Time Series Forecasting Transformer (Moirai) Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains. Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
Grouped self-attention mechanism for a memory-efficient Transformer [64.0125322353281]
Real-world tasks such as forecasting weather, electricity consumption, and stock market involve predicting data that vary over time. Time-series data are generally recorded over a long period of observation with long sequences owing to their periodic characteristics and long-range dependencies over time. We propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA) Our proposed model efficiently exhibited reduced computational complexity and performance comparable to or better than existing methods.
arXiv Detail & Related papers (2022-10-02T06:58:49Z)
Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data. Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step. When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies. THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin. We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.