Related papers: Unlocking the Power of Mixture-of-Experts for Task-Aware Time Series Analytics

URL: http://arxiv.org/abs/2509.22279v2
Date: Mon, 20 Oct 2025 06:08:41 GMT
Title: Unlocking the Power of Mixture-of-Experts for Task-Aware Time Series Analytics
Authors: Xingjian Wu, Zhengyu Li, Hanyin Cheng, Xiangfei Qiu, Jilin Hu, Chenjuan Guo, Bin Yang
Abstract summary: Time Series Analysis is widely used in various real-world applications such as weather forecasting, financial fraud detection, imputation for missing data in IoT systems, and classification for action recognition. MoE, as a powerful architecture, still falls short in adapting to versatile tasks in time series analytics due to its task-agnostic router and the lack of capability in modeling channel correlations. We propose a novel, general MoE-based time series framework called PatchMoE to support the intricate "knowledge" utilization for distinct tasks, thus task-aware.
Score: 18.97715342585514
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Time Series Analysis is widely used in various real-world applications such as weather forecasting, financial fraud detection, imputation for missing data in IoT systems, and classification for action recognition. Mixture-of-Experts (MoE), as a powerful architecture, though demonstrating effectiveness in NLP, still falls short in adapting to versatile tasks in time series analytics due to its task-agnostic router and the lack of capability in modeling channel correlations. In this study, we propose a novel, general MoE-based time series framework called PatchMoE to support the intricate "knowledge" utilization for distinct tasks, thus task-aware. Based on the observation that hierarchical representations often vary across tasks, e.g., forecasting vs. classification, we propose a Recurrent Noisy Gating to utilize the hierarchical information in routing, thus obtaining task-specific capability. The routing strategy operates on time series tokens in both temporal and channel dimensions, and is encouraged by a meticulously designed Temporal & Channel Load Balancing Loss to model the intricate temporal and channel correlations. Comprehensive experiments on five downstream tasks demonstrate the state-of-the-art performance of PatchMoE.
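As background for the routing terms used in the abstract, the following is a minimal sketch of generic noisy top-k gating with a simple load-balancing penalty, as commonly described for sparsely gated MoE layers. It is not PatchMoE's Recurrent Noisy Gating or its Temporal & Channel Load Balancing Loss, whose details are not given on this page. The function names, the fixed noise scales, and the squared coefficient-of-variation penalty are all assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_top_k_gate(logits, noise_scales, k, rng):
    """Route one token (hypothetical helper, not from the paper).

    Adds Gaussian noise to each expert logit, keeps the k largest noisy
    logits, and applies a softmax over those k; all other experts get 0.
    """
    noisy = [l + rng.gauss(0.0, 1.0) * s for l, s in zip(logits, noise_scales)]
    top = sorted(range(len(noisy)), key=lambda i: noisy[i], reverse=True)[:k]
    weights = softmax([noisy[i] for i in top])
    gates = [0.0] * len(logits)
    for i, w in zip(top, weights):
        gates[i] = w
    return gates


def load_balance_penalty(gates_per_token):
    """Squared coefficient of variation of per-expert total gate weight.

    Assumed generic penalty: it is 0 when every expert receives the same
    total weight and grows as routing concentrates on a few experts.
    """
    n_experts = len(gates_per_token[0])
    importance = [sum(g[e] for g in gates_per_token) for e in range(n_experts)]
    mean = sum(importance) / n_experts
    var = sum((x - mean) ** 2 for x in importance) / n_experts
    return var / (mean ** 2 + 1e-10)


# Usage: route 8 random tokens across 4 experts, 2 active experts per token.
rng = random.Random(0)
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(8)]
gates = [noisy_top_k_gate(t, [0.1] * 4, k=2, rng=rng) for t in tokens]
penalty = load_balance_penalty(gates)
```

In a trained model the logits and noise scales would come from learned projections of each token, and the penalty would be added to the task loss with a small weight. The sketch fixes them by hand to keep the routing step visible.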

Related papers

MEMTS: Internalizing Domain Knowledge via Parameterized Memory for Retrieval-Free Domain Adaptation of Time Series Foundation Models [51.506429027626005]
Memory for Time Series (MEMTS) is a lightweight and plug-and-play method for retrieval-free domain adaptation in time series forecasting. The key component of MEMTS is a Knowledge Persistence Module (KPM), which internalizes domain-specific temporal dynamics. This paradigm shift enables MEMTS to achieve accurate domain adaptation with constant-time inference and near-zero latency.
arXiv Detail & Related papers (2026-02-14T14:00:06Z)
TSAQA: Time Series Analysis Question And Answering Benchmark [85.35545785252309]
Time series data are integral to critical applications across domains such as finance, healthcare, transportation, and environmental science. We introduce TSAQA, a novel unified benchmark designed to broaden task coverage and evaluate diverse temporal analysis capabilities.
arXiv Detail & Related papers (2026-01-30T17:28:56Z)
Time-TK: A Multi-Offset Temporal Interaction Framework Combining Transformer and Kolmogorov-Arnold Networks for Time Series Forecasting [6.1337977581640075]
Existing methods typically employ a strategy of embedding each time step as an independent token. Time-TK significantly outperforms all baseline models, achieving state-of-the-art forecasting accuracy.
arXiv Detail & Related papers (2026-01-30T10:11:51Z)
FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis [92.23551599659186]
Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology. FusAD is a unified analysis framework designed for diverse time series tasks.
arXiv Detail & Related papers (2025-12-16T04:34:27Z)
FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition. We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model. We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z)
T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs [6.199165061105655]
We introduce the Temporal Graph Reasoning Benchmark (T-GRAB) to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns.
arXiv Detail & Related papers (2025-07-14T11:47:43Z)
Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition [43.84348967231349]
Few-shot action recognition aims to recognize novel action categories with few exemplars. Existing methods typically learn frame-level representations for each video by designing inter-frame temporal modeling strategies. We propose HR2G-shot, a Hierarchical Relation-augmented Representation Generalization framework for FSAR.
arXiv Detail & Related papers (2025-04-14T10:23:22Z)
MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z)
TFAD: A Decomposition Time Series Anomaly Detection Architecture with Time-Frequency Analysis [12.867257563413972]
Time series anomaly detection is a challenging problem due to the complex temporal dependencies and the limited label data. We propose a Time-Frequency analysis based time series Anomaly Detection model, or TFAD, to exploit both time and frequency domains for performance improvement.
arXiv Detail & Related papers (2022-10-18T09:08:57Z)
Task-aware Similarity Learning for Event-triggered Time Series [25.101509208153804]
The overarching goal of this paper is to develop an unsupervised learning framework that is capable of learning task-aware similarities among unlabeled event-triggered time series. The proposed framework aspires to offer a stepping stone that gives rise to a systematic approach to model and learn similarities among a multitude of event-triggered time series.
arXiv Detail & Related papers (2022-07-17T12:54:10Z)
Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow (MANF). Our model avoids the influence of cumulative error and does not increase the time complexity. Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA). Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks. Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
Multi-task Over-the-Air Federated Learning: A Non-Orthogonal Transmission Approach [52.85647632037537]
We propose a multi-task over-the-air federated learning (MOAFL) framework, where multiple learning tasks share edge devices for data collection and learning models under the coordination of an edge server (ES). Both the convergence analysis and numerical results demonstrate that the MOAFL framework can significantly reduce the uplink bandwidth consumption of multiple tasks without causing substantial learning performance degradation.
arXiv Detail & Related papers (2021-06-27T13:09:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.