Related papers: Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts

Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts

URL: http://arxiv.org/abs/2410.10469v1
Date: Mon, 14 Oct 2024 13:01:11 GMT
Title: Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
Authors: Xu Liu, Juncheng Liu, Gerald Woo, Taha Aksu, Yuxuan Liang, Roger Zimmermann, Chenghao Liu, Silvio Savarese, Caiming Xiong, Doyen Sahoo,
Abstract summary: This paper introduces Moirai-MoE, using a single input/output projection layer while delegating the modeling of diverse time series patterns to the sparse mixture of experts. Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios.
Score: 103.725112190618
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Time series foundation models have demonstrated impressive performance as zero-shot forecasters. However, achieving effectively unified training on time series remains an open challenge. Existing approaches introduce some level of model specialization to account for the highly heterogeneous nature of time series data. For instance, Moirai pursues unified training by employing multiple input/output projection layers, each tailored to handle time series at a specific frequency. Similarly, TimesFM maintains a frequency embedding dictionary for this purpose. We identify two major drawbacks to this human-imposed frequency-level model specialization: (1) Frequency is not a reliable indicator of the underlying patterns in time series. For example, time series with different frequencies can display similar patterns, while those with the same frequency may exhibit varied patterns. (2) Non-stationarity is an inherent property of real-world time series, leading to varied distributions even within a short context window of a single time series. Frequency-level specialization is too coarse-grained to capture this level of diversity. To address these limitations, this paper introduces Moirai-MoE, using a single input/output projection layer while delegating the modeling of diverse time series patterns to the sparse mixture of experts (MoE) within Transformers. With these designs, Moirai-MoE reduces reliance on human-defined heuristics and enables automatic token-level specialization. Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios. Furthermore, this study conducts comprehensive model analyses to explore the inner workings of time series MoE foundation models and provides valuable insights for future research.

Related papers

A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers.<n>We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series.<n>Fire consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling [0.0]
We propose a first step to fill a gap by leveraging implicit neural representations (INRs)<n>MoTM combines a basis of INRs, each trained independently on a distinct family of time series, with a ridge regressor that adapts to the observed context at inference.<n>We demonstrate robust in-domain and out-of-domain generalization across diverse imputation scenarios.
arXiv Detail & Related papers (2025-07-17T15:16:30Z)
Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting [64.45587649141842]
Time-series forecasting plays a critical role in many real-world applications.<n>No single model consistently outperforms others across different test samples, but instead (ii) each model excels in specific cases.<n>We introduce TimeFuse, a framework for collective time-series forecasting with sample-level adaptive fusion of heterogeneous models.
arXiv Detail & Related papers (2025-05-24T00:45:07Z)
Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines [5.543238821368548]
Time series often exhibit significant diversity in their temporal patterns across different time spans and domains.<n>Time Tracker achieves state-of-the-art performance in predicting accuracy, model generalization and adaptability.
arXiv Detail & Related papers (2025-05-21T06:18:41Z)
TimesBERT: A BERT-Style Foundation Model for Time Series Understanding [72.64824086839631]
GPT-style models have been positioned as foundation models for time series forecasting. BERT-style architecture has not been fully unlocked for time series understanding. We design TimesBERT to learn generic representations of time series. Our model is pre-trained on 260 billion time points across diverse domains.
arXiv Detail & Related papers (2025-02-28T17:14:44Z)
General Time-series Model for Universal Knowledge Representation of Multivariate Time-Series data [61.163542597764796]
We show that time series with different time granularities (or corresponding frequency resolutions) exhibit distinct joint distributions in the frequency domain. A novel Fourier knowledge attention mechanism is proposed to enable learning time-aware representations from both the temporal and frequency domains. An autoregressive blank infilling pre-training framework is incorporated to time series analysis for the first time, leading to a generative tasks agnostic pre-training strategy.
arXiv Detail & Related papers (2025-02-05T15:20:04Z)
Generalized Prompt Tuning: Adapting Frozen Univariate Time Series Foundation Models for Multivariate Healthcare Time Series [3.9599054392856483]
Time series foundation models are pre-trained on large datasets and are able to achieve state-of-the-art performance in diverse tasks. We propose a prompt-tuning-inspired fine-tuning technique, Gen-P-Tuning, that enables us to adapt an existing univariate time series foundation model. We demonstrate the effectiveness of our fine-tuning approach against various baselines on two MIMIC classification tasks, and on influenza-like illness forecasting.
arXiv Detail & Related papers (2024-11-19T19:20:58Z)
DAM: Towards A Foundation Model for Time Series Forecasting [0.8231118867997028]
We propose a neural model that takes randomly sampled histories and outputs an adjustable basis composition as a continuous function of time. It involves three key components: (1) a flexible approach for using randomly sampled histories from a long-tail distribution; (2) a transformer backbone that is trained on these actively sampled histories to produce, as representational output; and (3) the basis coefficients of a continuous function of time.
arXiv Detail & Related papers (2024-07-25T08:48:07Z)
Deep Time Series Models: A Comprehensive Survey and Benchmark [74.28364194333447]
Time series data is of great significance in real-world scenarios. Recent years have witnessed remarkable breakthroughs in the time series community. We release Time Series Library (TSLib) as a fair benchmark of deep time series models for diverse analysis tasks.
arXiv Detail & Related papers (2024-07-18T08:31:55Z)
MOMENT: A Family of Open Time-series Foundation Models [19.0845213853369]
We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis. We compile a collection of public time series, called the Time series Pile, and systematically tackle time series-specific challenges. We build on recent work to design a benchmark to evaluate time series foundation models on diverse tasks and datasets in limited supervision settings.
arXiv Detail & Related papers (2024-02-06T10:48:46Z)
Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked-based Universal Time Series Forecasting Transformer (Moirai) Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains. Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM) During pre-training, we curate large-scale datasets with up to 1 billion time points. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow(MANF) Our model avoids the influence of cumulative error and does not increase the time complexity. Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
Model-Attentive Ensemble Learning for Sequence Modeling [86.4785354333566]
We present Model-Attentive Ensemble learning for Sequence modeling (MAES) MAES is a mixture of time-series experts which leverages an attention-based gating mechanism to specialize the experts on different sequence dynamics and adaptively weight their predictions. We demonstrate that MAES significantly out-performs popular sequence models on datasets subject to temporal shift.
arXiv Detail & Related papers (2021-02-23T05:23:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.