Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
- URL: http://arxiv.org/abs/2409.16040v2
- Date: Wed, 2 Oct 2024 09:08:21 GMT
- Title: Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
- Authors: Xiaoming Shi, Shiyu Wang, Yuqi Nie, Dianqi Li, Zhou Ye, Qingsong Wen, Ming Jin
- Abstract summary: Time-MoE is a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models.
Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction.
For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision.
- Score: 25.503695417712997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning for time series forecasting has seen significant advancements over the past decades. However, despite the success of large-scale pre-training in language and vision domains, pre-trained time series models remain limited in scale and operate at a high cost, hindering the development of larger, more capable forecasting models in real-world applications. In response, we introduce Time-MoE, a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models while reducing inference costs. By leveraging a sparse mixture-of-experts (MoE) design, Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction, reducing computational load while maintaining high model capacity. This allows Time-MoE to scale effectively without a corresponding increase in inference costs. Time-MoE comprises a family of decoder-only transformer models that operate in an auto-regressive manner and support flexible forecasting horizons with varying input context lengths. We pre-trained these models on our newly introduced large-scale dataset Time-300B, which spans over 9 domains and encompasses more than 300 billion time points. For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision. Our results validate the applicability of scaling laws for training tokens and model size in the context of time series forecasting. Our models consistently outperform dense models with the same number of activated parameters or equivalent computation budgets by a large margin. These advancements position Time-MoE as a state-of-the-art solution for tackling real-world time series forecasting challenges with superior capability, efficiency, and flexibility.
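The abstract's central efficiency argument is sparse expert routing: a gating network scores all experts for every token, but only the top-k highest-scoring experts are actually evaluated, so total capacity grows with the expert count while per-token compute stays roughly constant. The following is a minimal PyTorch sketch of that routing pattern; the layer sizes, gating scheme, and expert structure are illustrative assumptions, not Time-MoE's released implementation (which also includes pieces omitted here, such as load-balancing losses).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFeedForward(nn.Module):
    """Top-k routed feed-forward block (hypothetical sketch, not Time-MoE's code)."""

    def __init__(self, d_model: int = 256, d_ff: int = 1024,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gating network: one score per expert for each token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Pool of expert FFNs; only top_k of them run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))                # (n_tokens, d_model)
        scores = self.gate(tokens)                        # (n_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over chosen experts
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            hit, slot = (chosen == e).nonzero(as_tuple=True)
            if hit.numel() == 0:
                continue                                  # unchosen experts cost nothing
            out[hit] += weights[hit, slot].unsqueeze(-1) * expert(tokens[hit])
        return out.reshape_as(x)

# With num_experts=8 and top_k=2, each token pays for two expert FFNs
# instead of eight, while all eight contribute to total model capacity.
moe = SparseMoEFeedForward()
y = moe(torch.randn(4, 96, 256))  # e.g. a batch of 4 series with context length 96
```

In a decoder-only model of the kind the abstract describes, a block like this would stand in for the dense FFN in each transformer layer, with forecasts then produced auto-regressively token by token.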
Related papers
- Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling [51.78972657142583]
We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B activated parameters for each token, and a context length of 11.5K.
To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling in three dimensions.
arXiv Detail & Related papers (2026-03-05T04:13:57Z) - Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting [38.81240885985943]
We show that small hybrid models that interleave long convolution and linear RNN layers can match the performance of larger transformer-based models.
This recipe results in Reverso, a family of efficient time series foundation models for zero-shot forecasting.
arXiv Detail & Related papers (2026-02-19T18:48:08Z) - Tiny-TSM: Efficiently Training a Lightweight SOTA Time Series Foundation Model [0.0]
We present Tiny-TSM, a time series foundation model characterized by small scale, economical training, and state-of-the-art performance.
It comprises 23M total parameters, trained on a single A100 GPU in less than a week using a new synthetic data generation and data augmentation pipeline.
arXiv Detail & Related papers (2025-11-24T16:22:05Z) - SEMPO: Lightweight Foundation Models for Time Series Forecasting [45.456949943052116]
SEMPO is a lightweight foundation model that requires pretraining on relatively small-scale data, yet exhibits strong general time series forecasting performance.
SEMPO comprises two key modules, the first being an energy-aware SpEctral decomposition module that substantially improves the utilization of pre-training data.
Experiments on two large-scale benchmarks covering 16 datasets demonstrate the superior performance of SEMPO in both zero-shot and few-shot forecasting scenarios.
arXiv Detail & Related papers (2025-10-22T15:58:44Z) - SVTime: Small Time Series Forecasting Models Informed by "Physics" of Large Vision Model Forecasters [86.38433605933515]
Time series AI is crucial for analyzing dynamic web content.
Given their energy-intensive training, inference, and hardware demands, using large models as a one-size-fits-all solution raises serious concerns about carbon footprint and sustainability.
This paper introduces SVTime, a novel Small model inspired by large Vision model (LVM) forecasters for long-term Time series forecasting (LTSF).
arXiv Detail & Related papers (2025-10-10T18:42:23Z) - Output Scaling: YingLong-Delayed Chain of Thought in a Large Pretrained Time Series Forecasting Model [55.25659103706409]
This framework achieves state-of-the-art performance for our designed foundation model, YingLong.
YingLong is a non-causal, bidirectional attention encoder-only transformer trained through masked token recovery.
We release four foundation models ranging from 6M to 300M parameters, demonstrating superior results in zero-shot tasks.
arXiv Detail & Related papers (2025-05-20T14:31:06Z) - Does Scaling Law Apply in Time Series Forecasting? [2.127584662240465]
We propose Alinear, an ultra-lightweight forecasting model that achieves competitive performance using only k-level parameters.
Experiments on seven benchmark datasets demonstrate that Alinear consistently outperforms large-scale models.
This work challenges the prevailing belief that larger models are inherently better and suggests a paradigm shift toward more efficient time series modeling.
arXiv Detail & Related papers (2025-05-15T11:04:39Z) - Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications [0.0]
Time series forecasting is essential for operational intelligence in the hospitality industry.
This study evaluates the performance of statistical, machine learning (ML), deep learning, and foundation models in forecasting hourly sales over a 14-day horizon.
arXiv Detail & Related papers (2025-02-05T17:30:31Z) - A Mamba Foundation Model for Time Series Forecasting [13.593170999506889]
We introduce TSMamba, a linear-complexity foundation model for time series forecasting built on the Mamba architecture.
The model captures temporal dependencies through both forward and backward Mamba encoders, achieving high prediction accuracy.
It also achieves competitive or superior full-shot performance compared to task-specific prediction models.
arXiv Detail & Related papers (2024-11-05T09:34:05Z) - Test Time Learning for Time Series Forecasting [1.4605709124065924]
Test-Time Training (TTT) modules consistently outperform state-of-the-art models, including the Mamba-based TimeMachine.
Our results show significant improvements in Mean Squared Error (MSE) and Mean Absolute Error (MAE).
This work sets a new benchmark for time-series forecasting and lays the groundwork for future research in scalable, high-performance forecasting models.
arXiv Detail & Related papers (2024-09-21T04:40:08Z) - Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling [55.13352174687475]
This paper proposes a physics-AI hybrid model (i.e., WeatherGFT) which Generalizes weather forecasts to Finer-grained Temporal scales.
Specifically, we employ a carefully designed PDE kernel to simulate physical evolution on a small time scale.
We introduce a lead time-aware training framework to promote the generalization of the model at different lead times.
arXiv Detail & Related papers (2024-05-22T16:21:02Z) - A Scalable and Transferable Time Series Prediction Framework for Demand Forecasting [24.06534393565697]
Time series forecasting is one of the most essential and ubiquitous tasks in many business problems.
We propose Forecasting orchestra (Forchestra), a simple but powerful framework capable of accurately predicting future demand for a diverse range of items.
arXiv Detail & Related papers (2024-02-29T18:01:07Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting [54.04430089029033]
We present Lag-Llama, a general-purpose foundation model for time series forecasting based on a decoder-only transformer architecture.
Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities.
When fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-12T12:29:32Z) - Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z) - Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems.
We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting.
Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models.
arXiv Detail & Related papers (2023-10-03T01:31:25Z) - Neural forecasting at scale [8.245069318446415]
We study the problem of efficiently scaling ensemble-based deep neural networks for time series (TS) forecasting on a large set of time series.
Our model addresses the practical limitations of related models, reducing the training time by half and memory requirement by a factor of 5.
arXiv Detail & Related papers (2021-09-20T17:22:40Z)