Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting
- URL: http://arxiv.org/abs/2509.15105v1
- Date: Thu, 18 Sep 2025 16:11:31 GMT
- Title: Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting
- Authors: Liran Nochumsohn, Raz Marshanski, Hedi Zisling, Omri Azencot
- Abstract summary: We introduce Super-Linear, a lightweight and scalable mixture-of-experts (MoE) model for general forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. It offers superior efficiency, robustness to various sampling rates, and enhanced interpretability.
- Score: 8.668012341094494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series forecasting (TSF) is critical in domains like energy, finance, healthcare, and logistics, requiring models that generalize across diverse datasets. Large pre-trained models such as Chronos and Time-MoE show strong zero-shot (ZS) performance but suffer from high computational costs. In this work, we introduce Super-Linear, a lightweight and scalable mixture-of-experts (MoE) model for general forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting. Despite its simplicity, Super-Linear matches state-of-the-art performance while offering superior efficiency, robustness to various sampling rates, and enhanced interpretability. The implementation of Super-Linear is available at https://github.com/azencot-group/SuperLinear
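As a rough, non-authoritative sketch of the mechanism the abstract describes (frequency-specialized linear experts combined by a spectral gate), the toy code below illustrates the idea; the expert count, band edges, and energy-based gating rule are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the Super-Linear idea (not the authors' code):
# frequency-specialized linear experts weighted by a spectral gate.
import numpy as np

rng = np.random.default_rng(0)
L_in, L_out, n_experts = 96, 24, 4

# Each "expert" is a plain linear map from the lookback window to the horizon.
experts = [rng.normal(0, 0.1, size=(L_out, L_in)) for _ in range(n_experts)]

# Hypothetical frequency bands each expert specializes in (FFT bin ranges).
bands = [(0, 4), (4, 12), (12, 24), (24, L_in // 2 + 1)]

def spectral_gate(x):
    """Weight experts by how much spectral energy falls in their band."""
    power = np.abs(np.fft.rfft(x)) ** 2          # power spectrum of the input
    energy = np.array([power[lo:hi].sum() for lo, hi in bands])
    return energy / (energy.sum() + 1e-12)       # normalize to a simplex

def forecast(x):
    w = spectral_gate(x)
    # Mixture of experts: gate-weighted sum of the linear forecasts.
    return sum(wi * (Wi @ x) for wi, Wi in zip(w, experts))

x = np.sin(2 * np.pi * 3 * np.arange(L_in) / L_in) + 0.1 * rng.normal(size=L_in)
print(forecast(x).shape)  # (24,)
```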
Related papers
- FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts [49.9321870703948]
Existing models predominantly focus on short-horizon predictions and suffer from notorious computational costs and memory consumption. We present FaST, an effective and efficient framework based on Mixture-of-Experts (MoEs) for long-horizon and large-scale STG forecasting. FaST is underpinned by two key innovations: first, an adaptive graph agent attention mechanism that alleviates the computational burden; second, a parallel MoE module that replaces traditional feed-forward networks with Gated Linear Units (GLUs).
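For the second point, a minimal sketch of a GLU used in place of a standard feed-forward block is given below; the dimensions and wiring are assumptions, not the paper's actual architecture.

```python
# Hedged sketch: a Gated Linear Unit (GLU) replacing a feed-forward block.
import torch
import torch.nn as nn

class GLUExpert(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.value = nn.Linear(d_model, d_hidden)  # content path
        self.gate = nn.Linear(d_model, d_hidden)   # gating path
        self.out = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GLU: elementwise product of a linear projection and a sigmoid gate.
        return self.out(self.value(x) * torch.sigmoid(self.gate(x)))

x = torch.randn(8, 64)              # (batch, d_model)
print(GLUExpert(64, 128)(x).shape)  # torch.Size([8, 64])
```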
arXiv Detail & Related papers (2026-01-08T18:00:58Z)
- A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers. We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series. FIRE consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
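As a generic illustration of frequency-domain decomposition (the band edges below are arbitrary assumptions, not FIRE's design), a series can be split into components by masking FFT bins so the parts sum back to the original:

```python
# Illustrative band decomposition: partition the spectrum into components.
import numpy as np

def band_decompose(x, edges=(0, 3, 10, None)):
    spec = np.fft.rfft(x)
    parts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = np.zeros_like(spec)
        mask[lo:hi] = spec[lo:hi]                # keep only this band's bins
        parts.append(np.fft.irfft(mask, n=len(x)))
    return parts                                 # low / mid / high components

x = np.random.default_rng(1).normal(size=128).cumsum()  # random-walk series
parts = band_decompose(x)
print(np.allclose(sum(parts), x))  # True: the bands partition the spectrum
```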
arXiv Detail & Related papers (2025-10-11T09:59:25Z)
- SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL). We propose SPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
- Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting [64.45587649141842]
Time-series forecasting plays a critical role in many real-world applications. No single model consistently outperforms others across different test samples; instead, each model excels in specific cases. We introduce TimeFuse, a framework for collective time-series forecasting with sample-level adaptive fusion of heterogeneous models.
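A minimal sketch of sample-level fusion in the spirit of this summary follows; the base models, gate features, and gate parameters are all illustrative assumptions:

```python
# Per-sample fusion: a small gate weighs the forecasts of several base models.
import numpy as np

rng = np.random.default_rng(2)

def model_mean(x, h):  return np.full(h, x.mean())   # naive mean baseline
def model_last(x, h):  return np.full(h, x[-1])      # persistence baseline
def model_drift(x, h): return x[-1] + (x[-1] - x[0]) / len(x) * np.arange(1, h + 1)

models = [model_mean, model_last, model_drift]
W = rng.normal(0, 0.1, size=(len(models), 3))        # toy learned gate weights

def fuse_forecast(x, h=12):
    feats = np.array([x.std(), x[-1] - x[0], np.abs(np.diff(x)).mean()])
    logits = W @ feats
    w = np.exp(logits - logits.max()); w /= w.sum()  # softmax over models
    preds = np.stack([m(x, h) for m in models])      # (n_models, h)
    return w @ preds                                 # sample-specific blend

x = np.linspace(0, 5, 48) + rng.normal(0, 0.2, 48)
print(fuse_forecast(x).shape)  # (12,)
```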
arXiv Detail & Related papers (2025-05-24T00:45:07Z)
- Bridging Simplicity and Sophistication using GLinear: A Novel Architecture for Enhanced Time Series Prediction [1.52551943336894]
Time Series Forecasting (TSF) is an important application across many fields. Recent research suggests simpler linear models might outperform, or at least be competitive with, complex Transformer-based models for TSF tasks.
arXiv Detail & Related papers (2025-01-02T06:19:53Z)
- Disentangled Interpretable Representation for Efficient Long-term Time Series Forecasting [8.315265596107686]
Industry 5.0 introduces new challenges for Long-term Time Series Forecasting (LTSF)
Existing deep learning and linear models often suffer from excessive complexity and lack intuitive interpretability.
We propose DiPE-Linear, a Disentangled interpretable Linear network.
arXiv Detail & Related papers (2024-11-26T09:33:09Z)
- LeMoLE: LLM-Enhanced Mixture of Linear Experts for Time Series Forecasting [9.132953776171808]
This paper introduces an LLM-enhanced mixture of linear experts for precise and efficient time series forecasting. The use of a mixture of linear experts is efficient due to its simplicity, while the multimodal fusion mechanism adaptively combines multiple linear experts. Our experimental results show that the proposed LeMoLE model presents lower prediction errors and higher computational efficiency than existing LLM models.
arXiv Detail & Related papers (2024-11-24T12:40:50Z)
- Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts [103.725112190618]
This paper introduces Moirai-MoE, which uses a single input/output projection layer while delegating the modeling of diverse time series patterns to a sparse mixture of experts.
Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios.
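A sketch of the standard sparse top-k routing used in mixture-of-experts layers like the one this summary describes follows; the expert count, k, and sizes are illustrative assumptions, not the paper's configuration.

```python
# Sparse MoE: route each input to its top-k experts and blend their outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=32, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                        # x: (batch, d_model)
        scores = self.router(x)                  # (batch, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        gate = F.softmax(topv, dim=-1)           # renormalize over the top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):               # only k experts run per input
            for e in range(len(self.experts)):
                sel = topi[:, slot] == e
                if sel.any():
                    out[sel] += gate[sel, slot, None] * self.experts[e](x[sel])
        return out

print(SparseMoE()(torch.randn(4, 32)).shape)  # torch.Size([4, 32])
```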
arXiv Detail & Related papers (2024-10-14T13:01:11Z)
- SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters [16.966008476215258]
This paper introduces SparseTSF, a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF).
At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity and trend in time series data.
SparseTSF showcases remarkable generalization capabilities, making it well-suited for scenarios with limited computational resources, small samples, or low-quality data.
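A hedged sketch of cross-period forecasting in the spirit of this summary: fold the series by its period so one tiny linear map, shared across phases, forecasts each phase-aligned sub-series. The period, horizon, and map below are assumptions.

```python
# Cross-period sketch: one small shared linear map forecasts every phase.
import numpy as np

rng = np.random.default_rng(3)
period, n_cycles, h_cycles = 24, 8, 2          # lookback: 8 periods; horizon: 2

W = rng.normal(0, 0.1, size=(h_cycles, n_cycles))  # only 16 parameters

def cross_period_forecast(x):
    sub = x.reshape(n_cycles, period)          # rows: cycles, cols: phases
    # Each column (one phase's sub-series) is forecast by the same map W.
    future = W @ sub                           # (h_cycles, period)
    return future.reshape(-1)                  # re-interleave into time order

x = np.tile(np.sin(2 * np.pi * np.arange(period) / period), n_cycles)
print(cross_period_forecast(x).shape)  # (48,) = h_cycles * period
```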
arXiv Detail & Related papers (2024-05-02T02:15:23Z)
- Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
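As a generic illustration of the decomposition idea (moving-average trend plus residual seasonal part, as popularized by decomposition-based LTSF models; not this paper's specific design):

```python
# Trend/seasonal decomposition: forecast each component separately, then sum.
import numpy as np

def decompose(x, kernel=25):
    pad = kernel // 2
    xp = np.pad(x, pad, mode="edge")           # avoid shrinking at the ends
    trend = np.convolve(xp, np.ones(kernel) / kernel, mode="valid")
    seasonal = x - trend                       # remainder after the trend
    return trend, seasonal

t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 20)      # linear trend + seasonality
trend, seasonal = decompose(x)
print(trend.shape, seasonal.shape)             # (200,) (200,)
```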
arXiv Detail & Related papers (2024-01-22T13:15:40Z)
- Mixture-of-Linear-Experts for Long-term Time Series Forecasting [13.818468255379969]
We propose MoLE, a Mixture-of-Experts-style augmentation for linear-centric models.
Instead of training a single model, MoLE trains multiple linear-centric models and a router model that weighs and mixes their outputs.
arXiv Detail & Related papers (2023-12-11T19:05:02Z)
- Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs [50.25683648762602]
We introduce Koopman VAE, a new generative framework that is based on a novel design for the model prior.
Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map.
KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks.
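An illustrative sketch of the linear latent prior the summary describes: the conditional prior advances the latent state with a single linear map, Koopman-style. The dimensions, noise scale, and stabilization trick are assumptions for illustration only.

```python
# Toy linear latent dynamics: next latent = A @ current + Gaussian noise.
import numpy as np

rng = np.random.default_rng(4)
d = 4
A = rng.normal(size=(d, d))
A /= np.abs(np.linalg.eigvals(A)).max() * 1.05   # scale for stable dynamics

def rollout_prior(z0, steps, noise=0.05):
    zs = [z0]
    for _ in range(steps):
        zs.append(A @ zs[-1] + noise * rng.normal(size=d))
    return np.stack(zs)

z = rollout_prior(rng.normal(size=d), steps=50)
print(z.shape)  # (51, 4): a latent trajectory governed by a linear map
```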
arXiv Detail & Related papers (2023-10-04T07:14:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.