MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts
- URL: http://arxiv.org/abs/2601.21866v1
- Date: Thu, 29 Jan 2026 15:35:26 GMT
- Title: MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts
- Authors: Evandro S. Ortigossa, Guy Lutsker, Eran Segal
- Abstract summary: Real-world time series can exhibit intricate multi-scale structures, including global trends, local periodicities, and non-stationary regimes. MoHETS integrates sparse Mixture-of-Heterogeneous-Experts layers. We replace parameter-heavy linear projection heads with a lightweight convolutional patch decoder.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world multivariate time series can exhibit intricate multi-scale structures, including global trends, local periodicities, and non-stationary regimes, which makes long-horizon forecasting challenging. Although sparse Mixture-of-Experts (MoE) approaches improve scalability and specialization, they typically rely on homogeneous MLP experts that poorly capture the diverse temporal dynamics of time series data. We address these limitations with MoHETS, an encoder-only Transformer that integrates sparse Mixture-of-Heterogeneous-Experts (MoHE) layers. MoHE routes temporal patches to a small subset of expert networks, combining a shared depthwise-convolution expert for sequence-level continuity with routed Fourier-based experts for patch-level periodic structures. MoHETS further improves robustness to non-stationary dynamics by incorporating exogenous information via cross-attention over covariate patch embeddings. Finally, we replace parameter-heavy linear projection heads with a lightweight convolutional patch decoder, improving parameter efficiency, reducing training instability, and allowing a single model to generalize across arbitrary forecast horizons. We validate MoHETS across seven multivariate benchmarks and multiple horizons; it consistently achieves state-of-the-art performance, reducing the average MSE by $12\%$ compared to strong recent baselines and demonstrating effective heterogeneous specialization for long-term forecasting.
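The MoHE layer described in the abstract (a shared depthwise-convolution expert applied to every patch, plus top-k routed Fourier-based experts) might be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear gating, expert parameterization, sizes, and names (`FourierExpert`, `MoHELayer`) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FourierExpert:
    """Routed expert: per-frequency rescaling of a patch (illustrative)."""
    def __init__(self, patch_len, rng):
        # One learnable scale per rFFT bin of a patch.
        self.scale = rng.normal(1.0, 0.1, size=patch_len // 2 + 1)

    def __call__(self, patch):                   # (patch_len,)
        spec = np.fft.rfft(patch)
        return np.fft.irfft(spec * self.scale, n=patch.shape[-1])

class MoHELayer:
    """Sketch of a Mixture-of-Heterogeneous-Experts layer:
    shared depthwise conv on every patch + top-k routed Fourier experts."""
    def __init__(self, patch_len, n_experts=4, top_k=2, kernel=3, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = [FourierExpert(patch_len, rng) for _ in range(n_experts)]
        self.router = rng.normal(size=(patch_len, n_experts))  # linear gate
        self.dw_kernel = rng.normal(size=kernel) / kernel      # shared conv
        self.top_k = top_k

    def __call__(self, patches):                 # (n_patches, patch_len)
        # Shared expert: "same"-padded convolution along time, per patch.
        shared = np.stack([np.convolve(p, self.dw_kernel, mode="same")
                           for p in patches])
        # Router: softmax gate over experts, keep only the top-k per patch.
        gates = softmax(patches @ self.router)                 # (n, E)
        topk = np.argsort(gates, axis=-1)[:, -self.top_k:]
        out = shared.copy()
        for i, idx in enumerate(topk):
            w = gates[i, idx] / gates[i, idx].sum()            # renormalize
            for j, g in zip(idx, w):
                out[i] += g * self.experts[j](patches[i])
        return out

# Route a batch of 5 patches of length 16 through the layer.
layer = MoHELayer(patch_len=16)
x = np.random.default_rng(1).normal(size=(5, 16))
y = layer(x)
```

Sparsity here comes from evaluating only the `top_k` routed experts per patch, while the shared convolutional expert runs on all patches to preserve sequence-level continuity.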
Related papers
- HPMixer: Hierarchical Patching for Multivariate Time Series Forecasting [10.068780251829606]
We propose the Hierarchical Patching Mixer (HPMixer), which models periodicity and residuals in a decoupled yet complementary manner. By integrating decoupled periodicity modeling with structured, multi-scale residual learning, HPMixer provides an effective framework.
arXiv Detail & Related papers (2026-02-18T13:59:04Z) - MEMTS: Internalizing Domain Knowledge via Parameterized Memory for Retrieval-Free Domain Adaptation of Time Series Foundation Models [51.506429027626005]
Memory for Time Series (MEMTS) is a lightweight and plug-and-play method for retrieval-free domain adaptation in time series forecasting. The key component of MEMTS is a Knowledge Persistence Module (KPM), which internalizes domain-specific temporal dynamics. This paradigm shift enables MEMTS to achieve accurate domain adaptation with constant-time inference and near-zero latency.
arXiv Detail & Related papers (2026-02-14T14:00:06Z) - MoDEx: Mixture of Depth-specific Experts for Multivariate Long-term Time Series Forecasting [13.403948071904628]
We introduce layer sensitivity, a gradient-based metric inspired by GradCAM and effective receptive field theory. Applying this metric to a three-layer backbone reveals depth-specific expertise in modeling temporal dynamics. MoDEx achieves leading accuracy on seven real-world benchmarks, ranking first in 78 percent of cases.
arXiv Detail & Related papers (2026-01-31T09:37:03Z) - Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers [0.9058414988965365]
We introduce Seg-MoE, a sparse MoE design that processes contiguous time-step segments rather than making independent expert decisions. Seg-MoE consistently achieves state-of-the-art forecasting accuracy across almost all prediction horizons. Our results show that aligning the MoE routing granularity with the inherent structure of time series provides a powerful, yet previously underexplored, inductive bias.
arXiv Detail & Related papers (2026-01-29T12:43:35Z) - FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis [92.23551599659186]
Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology. FusAD is a unified analysis framework designed for diverse time series tasks.
arXiv Detail & Related papers (2025-12-16T04:34:27Z) - UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting [90.47915032778366]
We propose UniDiff, a unified diffusion framework for multimodal time series forecasting. At its core lies a unified and parallel fusion module, where a single cross-attention mechanism integrates structural information from timestamps and semantic context from texts. Experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-08T05:36:14Z) - A Multi-scale Representation Learning Framework for Long-Term Time Series Forecasting [6.344911113059126]
Long-term time series forecasting (LTSF) offers broad utility in practical settings like energy consumption and weather prediction. This work confronts key issues in LTSF, including the suboptimal use of multi-granularity information. Our method adeptly disentangles complex temporal dynamics using clear, concurrent predictions across various scales.
arXiv Detail & Related papers (2025-05-13T03:26:44Z) - MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z) - MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series [54.91026286579748]
We propose a Multi-Grained Correlations-based Prediction Network. It simultaneously considers correlations at three levels to enhance prediction performance. It employs adversarial training with an attention mechanism-based predictor and conditional discriminator to optimize prediction results at the coarse-grained level.
arXiv Detail & Related papers (2024-05-30T03:32:44Z) - TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting [19.88184356154215]
Time series forecasting is widely used in applications, such as traffic planning and weather forecasting.
TimeMixer is able to achieve consistent state-of-the-art performances in both long-term and short-term forecasting tasks.
arXiv Detail & Related papers (2024-05-23T14:27:07Z) - Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective [63.60312929416228]
Attraos incorporates chaos theory into long-term time series forecasting.
We show that Attraos outperforms various LTSF methods on mainstream datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.
arXiv Detail & Related papers (2024-02-18T05:35:01Z) - A Multi-Scale Decomposition MLP-Mixer for Time Series Analysis [14.40202378972828]
We propose MSD-Mixer, a Multi-Scale Decomposition-Mixer, which learns to explicitly decompose and represent the input time series in its different layers.
We demonstrate that MSD-Mixer consistently and significantly outperforms other state-of-the-art algorithms with better efficiency.
arXiv Detail & Related papers (2023-10-18T13:39:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.