Accelerating Time Series Foundation Models with Speculative Decoding
- URL: http://arxiv.org/abs/2511.18191v1
- Date: Sat, 22 Nov 2025 21:04:57 GMT
- Title: Accelerating Time Series Foundation Models with Speculative Decoding
- Authors: Pranav Subbaraman, Fang Sun, Yue Yao, Huacong Tang, Xiao Luo, Yizhou Sun
- Abstract summary: Large-scale Transformer-based models have achieved state-of-the-art performance in time-series forecasting but suffer from high computational costs. We propose a general inference acceleration framework that adapts speculative decoding to autoregressive time-series models. Our approach employs a smaller "draft" model to propose future time-series patches, which are then verified in parallel by a larger "target" model.
- Score: 46.99742287518152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern web applications--from real-time content recommendation and dynamic pricing to CDN optimization--increasingly rely on time-series forecasting to deliver personalized experiences to billions of users. Large-scale Transformer-based models have achieved state-of-the-art performance in time-series forecasting but suffer from high computational costs, limiting their deployment in latency-sensitive web applications. To address this challenge, we propose a general inference acceleration framework that adapts speculative decoding to autoregressive time-series models. Our approach employs a smaller "draft" model to propose future time-series patches, which are then verified in parallel by a larger "target" model, reducing the number of sequential forward passes required. We address key technical challenges in adapting this technique from discrete language tokens to continuous time-series distributions, including the design of acceptance criteria for multivariate Gaussian patches and practical variants that balance efficiency with accuracy. Through experiments on time series forecasting benchmarks relevant to web applications, we demonstrate significant inference speedups while maintaining competitive accuracy. The framework requires no architectural modifications to existing foundation models, making it immediately applicable to accelerate deployed time-series forecasting systems. Our implementation can be found at https://github.com/PranavSubbaraman/STRIDE
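The abstract describes the core loop: a cheap draft model proposes several future patches, the larger target model scores all of them in one parallel forward pass, and each proposal is accepted or rejected via a density-ratio test on the two models' Gaussian predictive distributions. Below is a minimal sketch of that loop in NumPy. The call signatures (`draft_model`, `target_model` returning Gaussian means and standard deviations) and the simplified rejection fallback are illustrative assumptions, not the actual STRIDE implementation.

```python
# Minimal sketch of speculative decoding for continuous (Gaussian) time-series
# patches. draft_model / target_model are hypothetical callables returning
# Gaussian patch parameters; this is NOT the STRIDE API, only an illustration
# of the propose -> verify-in-parallel -> accept/reject loop from the abstract.
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log-density of a diagonal Gaussian, summed over the patch dimensions."""
    return float(np.sum(-0.5 * ((x - mean) / std) ** 2
                        - np.log(std) - 0.5 * np.log(2.0 * np.pi)))


def speculative_forecast(history, draft_model, target_model, horizon, k=4, seed=0):
    """Forecast `horizon` patches autoregressively.

    draft_model(ctx)           -> (mu, sigma) for the next patch (cheap)
    target_model(ctx, patches) -> list of (mu, sigma), one per proposed patch,
                                  computed in a single parallel forward pass
    """
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < horizon:
        ctx = np.concatenate([history, *out]) if out else history

        # 1) Draft model proposes k patches sequentially (cheap forward passes).
        proposals, q_params, d_ctx = [], [], ctx
        for _ in range(k):
            mu_q, sigma_q = draft_model(d_ctx)
            patch = rng.normal(mu_q, sigma_q)
            proposals.append(patch)
            q_params.append((mu_q, sigma_q))
            d_ctx = np.concatenate([d_ctx, patch])

        # 2) Target model verifies all k proposals in one batched call.
        p_params = target_model(ctx, proposals)

        # 3) Accept each patch with probability min(1, p(x) / q(x)).
        for patch, (mu_q, sd_q), (mu_p, sd_p) in zip(proposals, q_params, p_params):
            log_ratio = (gaussian_logpdf(patch, mu_p, sd_p)
                         - gaussian_logpdf(patch, mu_q, sd_q))
            if np.log(rng.uniform()) < min(0.0, log_ratio):
                out.append(patch)                    # accepted draft patch
                if len(out) == horizon:
                    break
            else:
                # Rejected: fall back to a fresh sample from the target's own
                # distribution (a simplification of the exact residual-resampling
                # rule) and discard the remaining drafts.
                out.append(rng.normal(mu_p, sd_p))
                break
    return np.concatenate(out)
```

Because verification of all k drafted patches costs only one target forward pass, every accepted patch saves a sequential target call. Exact speculative sampling resamples rejected proposals from a residual distribution; designing acceptance criteria and practical variants of this step for continuous multivariate Gaussian patches is the adaptation the paper addresses, and the sketch above glosses over it.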
Related papers
- Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios [76.85739138203014]
We present SpecFormer, a novel architecture that accelerates unidirectional and attention mechanisms. We demonstrate that SpecFormer achieves lower training demands and reduced computational costs.
arXiv Detail & Related papers (2025-11-25T14:20:08Z) - A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers. We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series. FIRE consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z) - KAIROS: Unified Training for Universal Non-Autoregressive Time Series Forecasting [6.312575071507716]
We present KAIROS, a non-autoregressive time series forecasting framework. Unlike autoregressive approaches, KAIROS avoids error accumulation and achieves just-in-time inference.
arXiv Detail & Related papers (2025-10-02T14:50:50Z) - A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting [0.07916635054977067]
Pruning is an established approach to reduce neural network parameter count and save compute. We study the effects of pruning strategies on model predictive performance and computational aspects like model size, operations, and inference time. We demonstrate that even with corresponding hardware and software support, structured pruning is unable to provide significant time savings.
arXiv Detail & Related papers (2024-12-17T13:07:31Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present the Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
arXiv Detail & Related papers (2024-01-22T13:15:40Z) - Time Series Continuous Modeling for Imputation and Forecasting with Implicit Neural Representations [15.797295258800638]
We introduce a novel modeling approach for time series imputation and forecasting, tailored to address the challenges often encountered in real-world data.
Our method relies on a continuous-time-dependent model of the series' evolution dynamics.
A modulation mechanism, driven by a meta-learning algorithm, allows adaptation to unseen samples and extrapolation beyond observed time-windows.
arXiv Detail & Related papers (2023-06-09T13:20:04Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity, but their self-attention mechanism is computationally expensive.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Persistence Initialization: A novel adaptation of the Transformer architecture for Time Series Forecasting [0.7734726150561088]
Time series forecasting is an important problem with many real-world applications.
We propose a novel adaptation of the original Transformer architecture focusing on the task of time series forecasting.
We use a decoder Transformer with ReZero normalization and Rotary positional encodings, but the adaptation is applicable to any auto-regressive neural network model.
arXiv Detail & Related papers (2022-08-30T13:04:48Z) - Mitigating Data Redundancy to Revitalize Transformer-based Long-Term Time Series Forecasting System [46.39662315849883]
We introduce CLMFormer, a novel framework that mitigates redundancy through curriculum learning and a memory-driven decoder. CLMFormer consistently improves Transformer-based models by up to 30%, demonstrating its effectiveness in long-horizon forecasting.
arXiv Detail & Related papers (2022-07-16T04:05:15Z)