FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts
- URL: http://arxiv.org/abs/2601.05174v1
- Date: Thu, 08 Jan 2026 18:00:58 GMT
- Authors: Yiji Zhao, Zihao Zhong, Ao Wang, Haomin Wen, Ming Jin, Yuxuan Liang, Huaiyu Wan, Hao Wu,
- Abstract summary: Existing models predominantly focus on short-horizon predictions and suffer from notorious computational costs and memory consumption. We present FaST, an effective and efficient framework based on Mixture-of-Experts (MoEs) for long-horizon and large-scale STG forecasting. FaST is underpinned by two key innovations. First, an adaptive graph agent attention mechanism is proposed to alleviate the computational burden. Second, we propose a new parallel MoE module that replaces traditional feed-forward networks with Gated Linear Units (GLUs).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatial-Temporal Graph (STG) forecasting on large-scale networks has garnered significant attention. However, existing models predominantly focus on short-horizon predictions and suffer from notorious computational costs and memory consumption when scaling to long-horizon predictions and large graphs. Targeting the above challenges, we present FaST, an effective and efficient framework based on heterogeneity-aware Mixture-of-Experts (MoEs) for long-horizon and large-scale STG forecasting, which unlocks one-week-ahead (672 steps at a 15-minute granularity) prediction with thousands of nodes. FaST is underpinned by two key innovations. First, an adaptive graph agent attention mechanism is proposed to alleviate the computational burden inherent in conventional graph convolution and self-attention modules when applied to large-scale graphs. Second, we propose a new parallel MoE module that replaces traditional feed-forward networks with Gated Linear Units (GLUs), enabling an efficient and scalable parallel structure. Extensive experiments on real-world datasets demonstrate that FaST not only delivers superior long-horizon predictive accuracy but also achieves remarkable computational efficiency compared to state-of-the-art baselines. Our source code is available at: https://github.com/yijizhao/FaST.
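The abstract names FaST's two components but gives no equations. As a rough, non-authoritative illustration, this NumPy sketch shows one spatial step over a single time slice. It has two parts. The first is a generic agent-attention layer, where m agent tokens sit between the N nodes so that no N×N score matrix is ever formed. The second is a mixture of GLU experts whose outputs are all computed together in batched `einsum` calls. Several assumptions are not taken from the paper. The agent tokens are plain learnable parameters, so FaST's adaptive graph agents are not modeled. Gating is dense softmax over all experts, which is only one reading of "parallel"; FaST's heterogeneity-aware routing may differ. Every function name and shape is hypothetical. For the authors' actual implementation, see the repository linked in the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def agent_attention(X, Wq, Wk, Wv, agents):
    """Generic agent attention over N graph nodes using m << N agent tokens.

    X: (N, d) node features; agents: (m, d) hypothetical learnable agent tokens.
    The score matrices are (m, N) and (N, m), so the cost is O(N*m*d)
    rather than the O(N^2*d) of full node-to-node self-attention.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scale = 1.0 / np.sqrt(Q.shape[-1])
    # Step 1: each agent summarizes all nodes: (m, N) @ (N, d) -> (m, d).
    agent_values = softmax(agents @ K.T * scale) @ V
    # Step 2: each node reads from the agents: (N, m) @ (m, d) -> (N, d).
    return softmax(Q @ agents.T * scale) @ agent_values


def glu_moe(H, Wg, Wa, Wb, Wo):
    """Dense mixture of E GLU experts, all evaluated in one batched pass.

    H: (N, d); Wg: (d, E) gate; Wa, Wb: (E, d, h); Wo: (E, h, d).
    Expert e computes ((H @ Wa[e]) * sigmoid(H @ Wb[e])) @ Wo[e].
    """
    gates = softmax(H @ Wg)                                  # (N, E), rows sum to 1
    a = np.einsum("nd,edh->neh", H, Wa)                      # value path, every expert
    b = np.einsum("nd,edh->neh", H, Wb)                      # gate path, every expert
    experts = np.einsum("neh,ehd->ned", a * sigmoid(b), Wo)  # (N, E, d)
    return np.einsum("ne,ned->nd", gates, experts)           # gate-weighted sum


rng = np.random.default_rng(0)
N, d, m, E, h = 1000, 32, 16, 4, 64   # nodes, width, agents, experts, hidden
X = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
agents = rng.standard_normal((m, d)) * 0.1
Wg = rng.standard_normal((d, E)) * 0.1
Wa, Wb = (rng.standard_normal((E, d, h)) * 0.1 for _ in range(2))
Wo = rng.standard_normal((E, h, d)) * 0.1

H = X + agent_attention(X, Wq, Wk, Wv, agents)   # residual spatial mixing
out = H + glu_moe(H, Wg, Wa, Wb, Wo)             # residual expert mixing
print(out.shape)  # (1000, 32)
```

With N = 1000 nodes and m = 16 agents, each attention score matrix has 16,000 entries instead of the 1,000,000 of full self-attention. A reduction proportional to m/N is the kind of saving agent-style attention aims for on graphs with thousands of nodes.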
Related papers
- Scalable Graph Generative Modeling via Substructure Sequences
We introduce Generative Graph Pattern Machine (G²PM), a generative Transformer pre-training framework for graphs. G²PM represents graph instances (nodes, edges, or entire graphs) as sequences of substructures. It employs generative pre-training over the sequences to learn generalizable and transferable representations.
Date: 2025-05-22T02:16:34Z
- Does Scaling Law Apply in Time Series Forecasting?
We propose Alinear, an ultra-lightweight forecasting model that achieves competitive performance using only k-level parameters. Experiments on seven benchmark datasets demonstrate that Alinear consistently outperforms large-scale models. This work challenges the prevailing belief that larger models are inherently better and suggests a paradigm shift toward more efficient time series modeling.
Date: 2025-05-15T11:04:39Z
- ScaleGNN: Towards Scalable Graph Neural Networks via Adaptive High-order Neighboring Feature Fusion
We propose ScaleGNN, a novel framework that adaptively fuses multi-hop node features for scalable and effective graph learning. We show that ScaleGNN consistently outperforms state-of-the-art GNNs in both predictive accuracy and computational efficiency.
Date: 2025-04-22T14:05:11Z
- Towards Scalable and Deep Graph Neural Networks via Noise Masking
Graph Neural Networks (GNNs) have achieved remarkable success in many graph mining tasks. However, scaling them to large graphs is challenging due to the high computational and storage costs. We present random walk with noise masking (RMask), a plug-and-play module compatible with the existing model-simplification works.
Date: 2024-12-19T07:48:14Z
- FourierGNN: Rethinking Multivariate Time Series Forecasting from a Pure Graph Perspective
Current state-of-the-art graph neural network (GNN)-based forecasting methods usually require both graph networks (e.g., GCN) and temporal networks (e.g., LSTM) to capture inter-series (spatial) dynamics and intra-series (temporal) dependencies, respectively.
We propose a novel Fourier Graph Neural Network (FourierGNN) by stacking our proposed Fourier Graph Operator (FGO) to perform matrix multiplications in Fourier space.
Our experiments on seven datasets have demonstrated superior performance with higher efficiency and fewer parameters compared with state-of-the-art methods.
Date: 2023-11-10T17:13:26Z
- Efficient Heterogeneous Graph Learning via Random Projection
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
Date: 2023-10-23T01:25:44Z
- HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting
We make the first attempt to explore long-term traffic forecasting, e.g., 1-day forecasting.
We propose a novel Hierarchical U-net TransFormer to address the issues of long-term traffic forecasting.
The proposed HUTFormer significantly outperforms state-of-the-art traffic forecasting and long time series forecasting baselines.
Date: 2023-07-27T02:43:21Z
- GEANN: Scalable Graph Augmentations for Multi-Horizon Time Series Forecasting
A rapidly growing topic of interest is forecasting time series which lack sufficient historical data.
We introduce a novel yet simple method to address this problem by leveraging graph neural networks (GNNs) as a data augmentation.
We show that our architecture can use either data-driven or domain knowledge-defined graphs, scaling to incorporate information from multiple very large graphs with millions of nodes.
Date: 2023-07-07T13:38:16Z