Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting
- URL: http://arxiv.org/abs/2602.17634v1
- Date: Thu, 19 Feb 2026 18:48:08 GMT
- Title: Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting
- Authors: Xinghong Fu, Yanhong Li, Georgios Papaioannou, Yoon Kim
- Abstract summary: We show that small hybrid models that interleave long convolution and linear RNN layers can match the performance of larger transformer-based models. This recipe results in Reverso, a family of efficient time series foundation models for zero-shot forecasting.
- Score: 38.81240885985943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning time series foundation models has been shown to be a promising approach for zero-shot time series forecasting across diverse time series domains. Insofar as scaling has been a critical driver of performance of foundation models in other modalities such as language and vision, much recent work on time series foundation modeling has focused on scaling. This has resulted in time series foundation models with hundreds of millions of parameters that are, while performant, inefficient and expensive to use in practice. This paper describes a simple recipe for learning efficient foundation models for zero-shot time series forecasting that are orders of magnitude smaller. We show that large-scale transformers are not necessary: small hybrid models that interleave long convolution and linear RNN layers (in particular DeltaNet layers) can match the performance of larger transformer-based models while being more than a hundred times smaller. We also describe several data augmentation and inference strategies that further improve performance. This recipe results in Reverso, a family of efficient time series foundation models for zero-shot forecasting that significantly push the performance-efficiency Pareto frontier.
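The architectural claim above is that interleaving long convolution layers with DeltaNet-style linear RNN layers can replace large-scale attention for zero-shot forecasting. The following is a minimal, single-head PyTorch sketch of what such a hybrid block could look like; it is not the authors' code, and all layer names, shapes, and hyperparameters (e.g., `dim=64`, `max_len=512`) are illustrative assumptions. The DeltaNet layer implements the delta-rule state update S_t = S_{t-1}(I - beta_t k_t k_t^T) + beta_t v_t k_t^T with output o_t = S_t q_t, here as a slow sequential reference loop rather than the chunked/parallel form used in practice.

```python
# Hedged sketch of a hybrid long-convolution + DeltaNet block (not the Reverso implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LongConv(nn.Module):
    """Depthwise causal convolution whose learnable kernel spans the full context length."""

    def __init__(self, dim: int, max_len: int):
        super().__init__()
        self.kernel = nn.Parameter(torch.randn(dim, max_len) * 0.02)  # one kernel per channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, t, d = x.shape
        k = self.kernel[:, :t]                       # truncate kernel to the sequence length
        x_ = F.pad(x.transpose(1, 2), (t - 1, 0))    # left-pad so the convolution is causal
        y = F.conv1d(x_, k.unsqueeze(1), groups=d)   # depthwise convolution
        return y.transpose(1, 2)                     # (batch, seq_len, dim)


class DeltaNetLayer(nn.Module):
    """Linear RNN with the delta-rule update S_t = S_{t-1}(I - b_t k_t k_t^T) + b_t v_t k_t^T."""

    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.beta = nn.Linear(dim, 1)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q = self.q(x)
        k = F.normalize(self.k(x), dim=-1)           # unit-norm keys stabilise the update
        v = self.v(x)
        beta = torch.sigmoid(self.beta(x))           # (batch, seq_len, 1) write strength
        S = x.new_zeros(b, d, d)                     # recurrent state: value-dim x key-dim
        outs = []
        for i in range(t):                           # sequential reference implementation
            k_i, v_i, q_i, b_i = k[:, i], v[:, i], q[:, i], beta[:, i]
            # Erase the old association stored under k_i, then write the new value.
            S = S - b_i.unsqueeze(-1) * (S @ k_i.unsqueeze(-1)) @ k_i.unsqueeze(1)
            S = S + b_i.unsqueeze(-1) * v_i.unsqueeze(-1) @ k_i.unsqueeze(1)
            outs.append((S @ q_i.unsqueeze(-1)).squeeze(-1))
        return self.out(torch.stack(outs, dim=1))


class HybridBlock(nn.Module):
    """Pre-norm residual block that interleaves long convolution and DeltaNet layers."""

    def __init__(self, dim: int, max_len: int):
        super().__init__()
        self.norm1, self.conv = nn.LayerNorm(dim), LongConv(dim, max_len)
        self.norm2, self.rnn = nn.LayerNorm(dim), DeltaNetLayer(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.conv(self.norm1(x))
        x = x + self.rnn(self.norm2(x))
        return x


if __name__ == "__main__":
    block = HybridBlock(dim=64, max_len=512)
    patches = torch.randn(2, 128, 64)                # (batch, patched context, model dim)
    print(block(patches).shape)                      # torch.Size([2, 128, 64])
```

A stack of such blocks stays small because the state and kernels grow with the model width rather than the context length, which is the efficiency argument the abstract makes against hundred-million-parameter transformers.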
Related papers
- How Foundational are Foundation Models for Time Series Forecasting? [2.692427265051276]
We argue that the inherent diversity of time series data makes foundation models less suited to building effective forecasting models.
We show that the zero-shot capabilities of a time series foundation model are strongly influenced by, and tied to, the specific domains it has been pretrained on.
arXiv Detail & Related papers (2025-10-01T10:25:43Z)
- Output Scaling: YingLong-Delayed Chain of Thought in a Large Pretrained Time Series Forecasting Model [55.25659103706409]
This framework achieves state-of-the-art performance for our designed foundation model, YingLong.
YingLong is a non-causal, bidirectional attention encoder-only transformer trained through masked token recovery.
We release four foundation models ranging from 6M to 300M parameters, demonstrating superior results in zero-shot tasks.
arXiv Detail & Related papers (2025-05-20T14:31:06Z)
- Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts [25.503695417712997]
Time-MoE is a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models.
Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction.
For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision.
arXiv Detail & Related papers (2024-09-24T12:42:18Z)
- LETS-C: Leveraging Text Embedding for Time Series Classification [15.520883566827608]
We propose an alternative approach to leveraging the success of language modeling in the time series domain.
We utilize a text embedding model to embed time series and then pair the embeddings with a simple classification head composed of convolutional neural networks (CNN) and multilayer perceptron (MLP).
Our findings suggest that leveraging text embedding models to encode time series data, combined with a simple yet effective classification head, offers a promising direction for achieving high-performance time series classification.
arXiv Detail & Related papers (2024-07-09T04:07:57Z)
- Chronos: Learning the Language of Time Series [79.38691251254173]
Chronos is a framework for pretrained probabilistic time series models.
We show that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks.
arXiv Detail & Related papers (2024-03-12T16:53:54Z)
- Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
- Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
- Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting [54.04430089029033]
We present Lag-Llama, a general-purpose foundation model for time series forecasting based on a decoder-only transformer architecture.
Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities.
When fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-12T12:29:32Z)