EMTSF: Extraordinary Mixture of SOTA Models for Time Series Forecasting
- URL: http://arxiv.org/abs/2510.23396v1
- Date: Mon, 27 Oct 2025 14:55:30 GMT
- Title: EMTSF: Extraordinary Mixture of SOTA Models for Time Series Forecasting
- Authors: Musleh Alharthi, Kaleel Mahmood, Sarosh Patel, Ausif Mahmood
- Abstract summary: We propose a strong Mixture of Experts (MoE) framework for Time Series Forecasting. Our method combines state-of-the-art (SOTA) models including xLSTM, enhanced Linear, PatchTST, and minGRU. Our proposed model outperforms all existing TSF models on standard benchmarks, surpassing even the latest approaches based on MoE frameworks.
- Score: 0.750638869146118
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The immense success of the Transformer architecture in Natural Language Processing has led to its adoption in Time Series Forecasting (TSF), where superior performance has been shown. However, a recent important paper questioned their effectiveness by demonstrating that a simple single-layer linear model outperforms Transformer-based models. This claim was soon countered by a stronger Transformer-based model, PatchTST. More recently, TimeLLM demonstrated even better results by repurposing a Large Language Model (LLM) for the TSF domain. Again, a follow-up paper challenged this by demonstrating that removing the LLM component, or replacing it with a basic attention layer, in fact yields better performance. One of the challenges in forecasting is that TSF data favors the more recent past and is sometimes subject to unpredictable events. Based upon these recent insights in TSF, we propose a strong Mixture of Experts (MoE) framework. Our method combines state-of-the-art (SOTA) models including xLSTM, enhanced Linear, PatchTST, and minGRU, among others. This set of complementary and diverse TSF models is integrated in a Transformer-based MoE gating network. Our proposed model outperforms all existing TSF models on standard benchmarks, surpassing even the latest approaches based on MoE frameworks.
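The abstract describes a pool of diverse forecasting experts whose predictions are blended by a Transformer-based MoE gating network. The sketch below is a minimal, illustrative PyTorch version of that general idea, not the authors' implementation: the two expert modules are simple stand-ins for xLSTM, enhanced Linear, PatchTST, and minGRU, and all class names, layer sizes, and hyperparameters are assumptions.

```python
# Minimal Mixture-of-Experts forecasting sketch (illustrative, not EMTSF itself).
import torch
import torch.nn as nn


class LinearExpert(nn.Module):
    """Single-layer linear expert (stand-in for an enhanced Linear / DLinear-style model)."""
    def __init__(self, lookback: int, horizon: int):
        super().__init__()
        self.proj = nn.Linear(lookback, horizon)

    def forward(self, x):                  # x: (batch, lookback)
        return self.proj(x)                # (batch, horizon)


class GRUExpert(nn.Module):
    """Recurrent expert (stand-in for minGRU / xLSTM, using a vanilla GRU)."""
    def __init__(self, lookback: int, horizon: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                  # x: (batch, lookback)
        _, h = self.rnn(x.unsqueeze(-1))   # h: (1, batch, hidden)
        return self.head(h.squeeze(0))     # (batch, horizon)


class MoEForecaster(nn.Module):
    """Transformer-encoder gate produces softmax weights over the expert forecasts."""
    def __init__(self, lookback: int, horizon: int, n_heads: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([
            LinearExpert(lookback, horizon),
            GRUExpert(lookback, horizon),
        ])
        enc_layer = nn.TransformerEncoderLayer(
            d_model=lookback, nhead=n_heads, batch_first=True)
        self.gate_encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.gate_head = nn.Linear(lookback, len(self.experts))

    def forward(self, x):                                          # x: (batch, lookback)
        preds = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, experts, horizon)
        g = self.gate_encoder(x.unsqueeze(1)).squeeze(1)           # (batch, lookback)
        weights = torch.softmax(self.gate_head(g), dim=-1)         # (batch, experts)
        return (weights.unsqueeze(-1) * preds).sum(dim=1)          # (batch, horizon)


# Usage: a univariate series with a 96-step lookback and a 24-step horizon.
model = MoEForecaster(lookback=96, horizon=24)
y_hat = model(torch.randn(8, 96))   # -> shape (8, 24)
```

The gate here weights whole expert forecasts per sample; a production system could instead gate per horizon step or per channel, but the per-sample softmax keeps the sketch short.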
Related papers
- Benchmarking Few-shot Transferability of Pre-trained Models with Improved Evaluation Protocols [123.73663884421272]
Few-shot transfer has been revolutionized by stronger pre-trained models and improved adaptation algorithms. We establish FEWTRANS, a comprehensive benchmark containing 10 diverse datasets. By releasing FEWTRANS, we aim to provide a rigorous "ruler" to streamline reproducible advances in few-shot transfer learning research.
arXiv Detail & Related papers (2026-02-28T05:41:57Z) - TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics [56.073642366268764]
TokaMind is an open-source foundation model framework for fusion plasma modeling. It is trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. We evaluate TokaMind on the recently introduced MAST benchmark TokaMark.
arXiv Detail & Related papers (2026-02-16T12:26:07Z) - TSGym: Design Choices for Deep Multivariate Time-Series Forecasting [38.12202305030755]
This work bridges gaps by decomposing deep MTSF methods into their core, fine-grained components. We propose a novel automated solution called TSGym for MTSF tasks. Extensive experiments indicate that TSGym significantly outperforms existing state-of-the-art MTSF and AutoML methods.
arXiv Detail & Related papers (2025-09-21T12:49:31Z) - Fusing Large Language Models with Temporal Transformers for Time Series Forecasting [17.549938378193282]
Large language models (LLMs) have demonstrated powerful capabilities in performing various tasks. LLMs are proficient at reasoning over discrete tokens and semantic patterns. Vanilla Transformers often struggle to learn high-level semantic patterns.
arXiv Detail & Related papers (2025-07-14T09:33:40Z) - Multi-Scale Finetuning for Encoder-based Time Series Foundation Models [67.95907033226585]
Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. While naive finetuning can yield performance gains, we argue that it falls short of fully leveraging TSFMs' capabilities. We propose Multi-Scale Finetuning (MSFT), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process.
arXiv Detail & Related papers (2025-06-17T01:06:01Z) - QuLTSF: Long-Term Time Series Forecasting with Quantum Machine Learning [4.2117721107606005]
Long-term time series forecasting (LTSF) involves predicting a large number of future values of a time series based on the past values. Quantum machine learning (QML) is evolving as a domain to enhance the capabilities of classical machine learning models. We show the advantages of QuLTSF over state-of-the-art classical linear models in terms of reduced mean squared error and mean absolute error.
arXiv Detail & Related papers (2024-12-18T12:06:52Z) - sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting [6.434378359932152]
We review and categorize existing Transformer-based models into two main types: (1) modifications to the model structure and (2) modifications to the input data.
We propose $\textbf{sTransformer}$, which introduces the Sequence and Temporal Convolutional Network (STCN) to fully capture both sequential and temporal information.
We compare our model with linear models and existing forecasting models on long-term time-series forecasting, achieving new state-of-the-art results.
arXiv Detail & Related papers (2024-08-19T06:23:41Z) - LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series Forecasting [69.33802286580786]
We introduce LTSM-Bundle, a comprehensive toolbox and benchmark for training LTSMs. It modularizes and benchmarks LTSMs along multiple dimensions, encompassing prompting strategies, tokenization approaches, base model selection, data quantity, and dataset diversity. Empirical results demonstrate that this combination achieves superior zero-shot and few-shot performances compared to state-of-the-art LTSMs and traditional TSF methods.
arXiv Detail & Related papers (2024-06-20T07:09:19Z) - UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z) - Unlocking the Power of Patch: Patch-Based MLP for Long-Term Time Series Forecasting [0.0]
Recent studies have attempted to refine the Transformer architecture to demonstrate its effectiveness in Long-Term Time Series Forecasting tasks. We attribute the effectiveness of these models largely to the adopted Patch mechanism (a minimal illustration of patching appears after this list). We propose a novel and simple Patch-based component (PatchMLP) for LTSF tasks.
arXiv Detail & Related papers (2024-05-22T12:12:20Z) - Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
arXiv Detail & Related papers (2024-01-22T13:15:40Z) - Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC [51.34222224728979]
This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models.
We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively.
Our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
arXiv Detail & Related papers (2023-06-10T05:24:29Z)
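Several entries above (PatchTST in the main abstract, UniTST, PatchMLP) credit much of their performance to a patch mechanism that slices the lookback window into token-like segments. The snippet below is a minimal, generic illustration of that idea; the patch length, stride, and embedding size are assumptions and the code is not taken from any of the listed papers.

```python
# Generic patch tokenization for a univariate lookback window (illustrative only).
import torch
import torch.nn as nn


def patchify(x: torch.Tensor, patch_len: int = 16, stride: int = 8) -> torch.Tensor:
    """Split a (batch, lookback) series into (batch, n_patches, patch_len) patches."""
    return x.unfold(dimension=-1, size=patch_len, step=stride)


# Example: a 96-step lookback yields (96 - 16) / 8 + 1 = 11 patches.
x = torch.randn(8, 96)
patches = patchify(x)          # (8, 11, 16)
embed = nn.Linear(16, 64)      # each patch becomes a 64-d token
tokens = embed(patches)        # (8, 11, 64) -> fed to a Transformer or MLP backbone
```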