TimeExpert: Boosting Long Time Series Forecasting with Temporal Mix of Experts
- URL: http://arxiv.org/abs/2509.23145v1
- Date: Sat, 27 Sep 2025 06:22:09 GMT
- Title: TimeExpert: Boosting Long Time Series Forecasting with Temporal Mix of Experts
- Authors: Xiaowen Ma, Shuning Ge, Fan Yang, Xiangyu Li, Yun Chen, Mengting Ma, Wei Zhang, Zhipeng Liu
- Abstract summary: We propose the Temporal Mix of Experts (TMOE), a novel attention-level mechanism that reimagines key-value (K-V) pairs as local experts. TMOE performs adaptive expert selection for each query via localized filtering of irrelevant timestamps. We then replace the vanilla attention mechanism in popular time-series Transformer frameworks (i.e., PatchTST and Timer) with TMOE, without extra structural modifications.
- Score: 11.53964887034519
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based architectures dominate time series modeling by enabling global attention over all timestamps, yet their rigid 'one-size-fits-all' context aggregation fails to address two critical challenges in real-world data: (1) inherent lag effects, where the relevance of historical timestamps to a query varies dynamically; (2) anomalous segments, which introduce noisy signals that degrade forecasting accuracy. To resolve these problems, we propose the Temporal Mix of Experts (TMOE), a novel attention-level mechanism that reimagines key-value (K-V) pairs as local experts (each specialized in a distinct temporal context) and performs adaptive expert selection for each query via localized filtering of irrelevant timestamps. Complementing this local adaptation, a shared global expert preserves the Transformer's strength in capturing long-range dependencies. We then replace the vanilla attention mechanism in popular time-series Transformer frameworks (i.e., PatchTST and Timer) with TMOE, without extra structural modifications, yielding our specific version TimeExpert and general version TimeExpert-G. Extensive experiments on seven real-world long-term forecasting benchmarks demonstrate that TimeExpert and TimeExpert-G outperform state-of-the-art methods. Code is available at https://github.com/xwmaxwma/TimeExpert.
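The abstract describes TMOE concretely enough to illustrate the idea in code: treat each key-value pair as a local expert, let every query keep only its most relevant experts (filtering out irrelevant timestamps), and blend this with a shared global attention path that preserves long-range dependencies. The sketch below is an illustrative reading of that description, not the authors' implementation (see the linked repository for that); the top-k routing rule, the `top_k` value, and the equal local/global mixing weight are assumptions made here.

```python
import torch
import torch.nn.functional as F

def tmoe_attention(q, k, v, top_k=8):
    """Illustrative TMOE-style attention. q, k, v: (batch, len, dim)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (B, Lq, Lk)

    # Local experts: each query keeps only its top-k key-value pairs and
    # masks out the rest (the "localized filtering" of timestamps).
    k_eff = min(top_k, scores.size(-1))
    kth = scores.topk(k_eff, dim=-1).values[..., -1:]  # k-th largest score
    local = scores.masked_fill(scores < kth, float("-inf"))
    local_out = F.softmax(local, dim=-1) @ v

    # Shared global expert: plain attention over all timestamps,
    # keeping the Transformer's long-range dependency modeling.
    global_out = F.softmax(scores, dim=-1) @ v

    # Equal mixing weight is an assumption for illustration only.
    return 0.5 * (local_out + global_out)

q = k = v = torch.randn(2, 96, 64)                     # toy shapes
print(tmoe_attention(q, k, v).shape)                   # torch.Size([2, 96, 64])
```

Because this operates purely at the attention level, swapping it in for scaled dot-product attention inside PatchTST or Timer would require no other structural change, which is the design point the abstract emphasizes.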
Related papers
- MEMTS: Internalizing Domain Knowledge via Parameterized Memory for Retrieval-Free Domain Adaptation of Time Series Foundation Models [51.506429027626005]
Memory for Time Series (MEMTS) is a lightweight and plug-and-play method for retrieval-free domain adaptation in time series forecasting. A key component of MEMTS is the Knowledge Persistence Module (KPM), which internalizes domain-specific temporal dynamics. This paradigm shift enables MEMTS to achieve accurate domain adaptation with constant-time inference and near-zero latency.
arXiv Detail & Related papers (2026-02-14T14:00:06Z) - Kairos: Towards Adaptive and Generalizable Time Series Foundation Models [27.076542021368056]
Time series foundation models (TSFMs) have emerged as a powerful paradigm for time series analysis. We propose Kairos, a flexible TSFM framework that integrates a dynamic patching tokenizer and an instance-adaptive positional embedding. Kairos achieves superior performance with far fewer parameters on two common zero-shot benchmarks.
arXiv Detail & Related papers (2025-09-30T06:02:26Z) - TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state [12.940694192516059]
In long-term time series forecasting, different variables often influence the target variable over distinct time intervals. Traditional models typically process all variables or time points uniformly, which limits their ability to capture complex variable relationships. We propose TimePro, an innovative Mamba-based model that constructs variate- and time-aware hyper-states.
arXiv Detail & Related papers (2025-05-27T06:24:21Z) - Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting [26.59526791215]
We identify two key challenges in cross-domain time series forecasting: the complexity of temporal patterns and semantic misalignment. We propose the "Unify and Anchor" transfer paradigm, which disentangles frequency components for a unified perspective. We introduce ContexTST, a Transformer-based model that employs a time series coordinator for structured representation.
arXiv Detail & Related papers (2025-03-03T04:11:14Z) - TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting [87.71846357354384]
Time series forecasting methods generally fall into two main categories: Channel Independent (CI) and Channel Dependent (CD). Recent advances in Channel Clustering (CC) aim to refine dependency modeling by grouping channels with similar characteristics. We propose TimeFilter, a GNN-based framework for adaptive and fine-grained dependency modeling.
arXiv Detail & Related papers (2025-01-22T17:40:17Z) - Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift [30.581736814767606]
Time series forecasting aims to predict future values based on historical data.
Real-world time series often exhibit complex non-uniform distributions with varying patterns across segments, such as season, operating condition, or semantic meaning.
We propose TFPS, a novel architecture that leverages pattern-specific experts for more accurate and adaptable time series forecasting.
arXiv Detail & Related papers (2024-10-13T13:35:29Z) - Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a causal Transformer for unified time series forecasting. Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables [75.83318701911274]
TimeXer ingests external information to enhance the forecasting of endogenous variables.
TimeXer achieves consistent state-of-the-art performance on twelve real-world forecasting benchmarks.
arXiv Detail & Related papers (2024-02-29T11:54:35Z) - Sparse Transformer with Local and Seasonal Adaptation for Multivariate Time Series Forecasting [8.000134983886742]
We propose a Dozer Attention mechanism consisting of three sparse components.
These components are designed to capture essential attributes of MTS data, including locality, seasonality, and global temporal dependencies.
We present the Dozerformer Framework, incorporating the Dozer Attention mechanism for the MTS forecasting task (a generic sketch of such sparse masks appears after this list).
arXiv Detail & Related papers (2023-12-11T22:49:02Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
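As noted in the Dozerformer entry above, its summary is specific enough to illustrate the general shape of such sparse attention: one mask for locality, one for seasonality, and one for global temporal dependencies, combined into a single allowed-attention pattern. The sketch below is a generic illustration only, not the Dozer Attention implementation; the window size, seasonal period, and number of global tokens are hypothetical parameters chosen for the example.

```python
import torch

def sparse_time_mask(seq_len, window=3, period=24, n_global=2):
    """Boolean (seq_len, seq_len) mask; True means attention is allowed."""
    idx = torch.arange(seq_len)
    rel = idx[None, :] - idx[:, None]          # offset j - i between steps

    local = rel.abs() <= window                # locality: nearby timestamps
    seasonal = rel % period == 0               # seasonality: same phase
    glob = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    glob[:, :n_global] = True                  # global: a few tokens attend
    glob[:n_global, :] = True                  # everywhere and are seen by all

    return local | seasonal | glob

mask = sparse_time_mask(96)
# Typical use: scores.masked_fill(~mask, float("-inf")) before softmax.
```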