ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies
- URL: http://arxiv.org/abs/2507.13998v1
- Date: Fri, 18 Jul 2025 15:08:02 GMT
- Title: ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies
- Authors: Itay Katav, Aryeh Kontorovich
- Abstract summary: In natural language processing, an approach has been used that combines local window attention for capturing short-term dependencies and Mamba for capturing long-term dependencies. We find that for time-series forecasting tasks, assigning equal weight to long-term and short-term dependencies is not optimal. We propose a dynamic weighting mechanism, ParallelTime Weighter, which calculates interdependent weights for long-term and short-term dependencies.
- Score: 11.40258240052954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern multivariate time series forecasting primarily relies on two architectures: the Transformer with attention mechanism and Mamba. In natural language processing, an approach has been used that combines local window attention for capturing short-term dependencies and Mamba for capturing long-term dependencies, with their outputs averaged to assign equal weight to both. We find that for time-series forecasting tasks, assigning equal weight to long-term and short-term dependencies is not optimal. To mitigate this, we propose a dynamic weighting mechanism, ParallelTime Weighter, which calculates interdependent weights for long-term and short-term dependencies for each token based on the input and the model's knowledge. Furthermore, we introduce the ParallelTime architecture, which incorporates the ParallelTime Weighter mechanism to deliver state-of-the-art performance across diverse benchmarks. Our architecture demonstrates robustness, achieves lower FLOPs, requires fewer parameters, scales effectively to longer prediction horizons, and significantly outperforms existing methods. These advances highlight a promising path for future developments of parallel Attention-Mamba in time series forecasting. The implementation is readily available at: https://github.com/itay1551/ParallelTime
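Below is a minimal sketch of the per-token dynamic weighting idea described in the abstract, assuming the short-term (local window attention) and long-term (Mamba) branch outputs have already been computed. The gating network layout, the softmax over two jointly produced weights, and all layer sizes are illustrative assumptions, not the authors' exact ParallelTime Weighter.

```python
# Hedged sketch of per-token dynamic weighting between two parallel branches.
# Branch modules, gate design, and dimensions are assumptions for illustration.
import torch
import torch.nn as nn


class ParallelTimeWeighterSketch(nn.Module):
    """Combine a short-term branch output (e.g. local window attention) and a
    long-term branch output (e.g. Mamba) with per-token weights instead of a
    fixed 0.5/0.5 average."""

    def __init__(self, d_model: int):
        super().__init__()
        # The gate sees the token plus both branch outputs, so the two weights
        # are produced together and normalized jointly (i.e. interdependent).
        self.gate = nn.Sequential(
            nn.Linear(3 * d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, 2),
        )

    def forward(self, x, short_out, long_out):
        # x, short_out, long_out: (batch, seq_len, d_model)
        w = torch.softmax(
            self.gate(torch.cat([x, short_out, long_out], dim=-1)), dim=-1
        )
        w_short, w_long = w[..., :1], w[..., 1:]  # (batch, seq_len, 1) each
        return w_short * short_out + w_long * long_out


if __name__ == "__main__":
    # Dummy branch outputs stand in for real window-attention and Mamba branches.
    B, T, D = 2, 96, 64
    x = torch.randn(B, T, D)
    short_out, long_out = torch.randn(B, T, D), torch.randn(B, T, D)
    mixed = ParallelTimeWeighterSketch(D)(x, short_out, long_out)
    print(mixed.shape)  # torch.Size([2, 96, 64])
```

Compared with the fixed averaging the abstract describes, the two weights here are computed from the same gate and normalized together, which is one way to make them input-dependent and interdependent per token.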
Related papers
- Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z) - APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs [81.5049387116454]
We introduce APB, an efficient long-context inference framework. APB uses multi-host approximate attention to enhance prefill speed. APB achieves speedups of up to 9.2x, 4.2x, and 1.6x over FlashAttn, RingAttn, and StarAttn, respectively.
arXiv Detail & Related papers (2025-02-17T17:59:56Z) - Parallelized Autoregressive Visual Generation [65.9579525736345]
We propose a simple yet effective approach for parallelized autoregressive visual generation. Our method achieves a 3.6x speedup with comparable quality and up to 9.5x speedup with minimal quality degradation across both image and video generation tasks.
arXiv Detail & Related papers (2024-12-19T17:59:54Z) - UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba [7.594115034632109]
We propose UmambaTSF, a novel long-term time series forecasting framework.
It integrates the multi-scale feature extraction capabilities of U-shaped encoder-decoder multilayer perceptrons (MLPs) with Mamba's long-sequence representation.
UmambaTSF achieves state-of-the-art performance and excellent generality on widely used benchmark datasets.
arXiv Detail & Related papers (2024-10-15T04:56:43Z) - ParallelSpec: Parallel Drafter for Efficient Speculative Decoding [62.68430939686566]
We present ParallelSpec, an alternative to auto-regressive drafting strategies in state-of-the-art speculative decoding approaches.
In contrast to auto-regressive drafting in the speculative stage, we train a parallel drafter to serve as an efficient speculative model.
arXiv Detail & Related papers (2024-10-08T01:05:08Z) - Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics [7.745945701278489]
Long-short range time series forecasting is essential for predicting future trends and patterns over extended periods.
Deep learning models such as Transformers have made significant strides in advancing time series forecasting.
This article examines the advantages and disadvantages of both Mamba and Transformer models.
arXiv Detail & Related papers (2024-09-13T04:23:54Z) - Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need [28.301119776877822]
Time series forecasting requires balancing short-term and long-term dependencies for accurate predictions.
Transformers are superior in modeling long-term dependencies but are criticized for their quadratic computational cost.
Mamba provides a near-linear alternative but is reported to be less effective for long-term time series forecasting due to potential information loss.
arXiv Detail & Related papers (2024-08-28T17:59:27Z) - SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps address the lack of long-range dependency modeling (a minimal sketch of this codebook-memory idea follows the list below).
arXiv Detail & Related papers (2024-04-17T08:26:34Z) - TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting [13.110156202816112]
TimeMachine exploits the unique properties of time series data to produce salient contextual cues at multiple scales.
TimeMachine achieves superior performance in prediction accuracy, scalability, and memory efficiency, as extensively validated using benchmark datasets.
arXiv Detail & Related papers (2024-03-14T22:19:37Z) - Does Long-Term Series Forecasting Need Complex Attention and Extra Long Inputs? [21.15722677855935]
Transformer-based models have achieved impressive performance on various time series tasks.
Long-Term Series Forecasting (LTSF) tasks have also received extensive attention in recent years.
Due to the inherent computational complexity of Transformer-based methods and their demand for long input sequences, their application to LTSF tasks still faces two major issues.
arXiv Detail & Related papers (2023-06-08T08:37:49Z) - Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, the framework achieves state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
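The LongVQ entry above describes compressing global context into a length-fixed codebook. Below is a rough sketch of one reading of that idea, assuming tokens cross-attend to a small set of learned code vectors so cost grows linearly with sequence length; the module name, sizes, and use of `nn.MultiheadAttention` are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of a fixed-length codebook memory: every token attends to a
# small learned memory instead of to all other tokens (O(T * M) rather than O(T^2)).
import torch
import torch.nn as nn


class CodebookMemoryAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_codes: int = 32, n_heads: int = 4):
        super().__init__()
        # Fixed-length global memory shared across all positions.
        self.codebook = nn.Parameter(torch.randn(n_codes, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        mem = self.codebook.unsqueeze(0).expand(x.size(0), -1, -1)  # (batch, n_codes, d_model)
        out, _ = self.attn(query=x, key=mem, value=mem)
        return out


if __name__ == "__main__":
    x = torch.randn(2, 1024, 64)
    print(CodebookMemoryAttention()(x).shape)  # torch.Size([2, 1024, 64])
```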