Related papers: Benchmarking State Space Models, Transformers, and Recurrent Networks for US Grid Forecasting


URL: http://arxiv.org/abs/2602.21415v1
Date: Tue, 24 Feb 2026 22:42:39 GMT
Title: Benchmarking State Space Models, Transformers, and Recurrent Networks for US Grid Forecasting
Authors: Sunki Hong, Jisoo Lee, Yuanyuan Shi
Abstract summary: This paper presents a benchmark of five modern neural architectures for power grid forecasting. We evaluate these models on hourly electricity demand across six diverse US power grids for forecast windows between 24 and 168 hours. Our results reveal that there is no single best model for all situations.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Selecting the right deep learning model for power grid forecasting is challenging, as performance heavily depends on the data available to the operator. This paper presents a comprehensive benchmark of five modern neural architectures: two state space models (PowerMamba, S-Mamba), two Transformers (iTransformer, PatchTST), and a traditional LSTM. We evaluate these models on hourly electricity demand across six diverse US power grids for forecast windows between 24 and 168 hours. To ensure a fair comparison, we adapt each model with specialized temporal processing and a modular layer that cleanly integrates weather covariates. Our results reveal that there is no single best model for all situations. When forecasting using only historical load, PatchTST and the state space models provide the highest accuracy. However, when explicit weather data is added to the inputs, the rankings reverse: iTransformer improves its accuracy three times more efficiently than PatchTST. By controlling for model size, we confirm that this advantage stems from the architecture's inherent ability to mix information across different variables. Extending our evaluation to solar generation, wind power, and wholesale prices further demonstrates that model rankings depend on the forecast task: PatchTST excels on highly rhythmic signals like solar, while state space models are better suited for the chaotic fluctuations of wind and price. Ultimately, this benchmark provides grid operators with actionable guidelines for selecting the optimal forecasting architecture based on their specific data environments.
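The abstract attributes iTransformer's larger gain from weather inputs to its ability to mix information across variables. One way to see the difference is in how each architecture turns a multivariate input into tokens. The following is a simplified illustration, not the authors' code. It assumes a 168-hour lookback, three input series (load plus two hypothetical weather covariates), and a patch length and stride of 24. It also omits the learned embeddings and projections, as well as the paper's covariate layer. PatchTST-style patching attends over time patches within each variable separately (channel independence). iTransformer-style inverted tokens instead make each variable's whole lookback window a single token, so attention runs across variables.

```python
import numpy as np

# Illustrative assumptions (not from the paper): a one-week hourly lookback,
# three input series (load plus two hypothetical weather covariates),
# and non-overlapping daily patches.
LOOKBACK, N_VARS = 168, 3
PATCH_LEN, STRIDE = 24, 24

def patch_tokens(x: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """PatchTST-style tokenization: split each variable's series into temporal
    patches independently. Input (n_vars, length) -> (n_vars, n_patches, patch_len)."""
    n_patches = (x.shape[1] - patch_len) // stride + 1
    return np.stack(
        [x[:, i * stride : i * stride + patch_len] for i in range(n_patches)], axis=1
    )

def self_attention_weights(tokens: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention weights among the tokens on the second-to-last
    axis. Learned query/key projections are omitted to keep the sketch minimal."""
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.standard_normal((N_VARS, LOOKBACK))  # rows: load, covariate 1, covariate 2

patches = patch_tokens(x, PATCH_LEN, STRIDE)
# Channel-independent: attention runs over time patches within each variable.
print(patches.shape, self_attention_weights(patches).shape)  # (3, 7, 24) (3, 7, 7)

# Inverted (iTransformer-style): each variable's full window is one token,
# so attention mixes information across variables.
print(x.shape, self_attention_weights(x).shape)  # (3, 168) (3, 3)
```

With these settings, each variable yields 7 patches. Patch attention therefore produces a separate 7-by-7 weight matrix per variable, with no direct path between load and weather. Variate attention produces a single 3-by-3 matrix that links them.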

Related papers

Scaling Laws of Global Weather Models
We investigate the relationship between model performance (validation loss) and three key factors: model size, dataset size, and compute budget. Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size.
arXiv (2026-02-26)
LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G
Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Real-Time (Near-RT) latency and computational constraints. This paper investigates a post-Transformer paradigm for efficient radio telemetry forecasting. We propose a quantum-inspired state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels.
arXiv (2026-01-18)
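The LiQSS summary describes replacing self-attention with stable state-space dynamics. The benchmark's PowerMamba and S-Mamba are also state space models. The following is a generic illustration only. It is not LiQSS's tensor-network construction, and it does not reproduce Mamba's input-dependent (selective) parameters. A time-invariant linear SSM can be run either as a recurrence or as a causal convolution with kernel K_k = C A^k B. Keeping the eigenvalues of A inside the unit circle keeps the recurrence stable. The matrices, state size, and sequence length below are arbitrary assumptions.

```python
import numpy as np

def ssm_scan(A: np.ndarray, B: np.ndarray, C: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Recurrent mode: h_t = A h_{t-1} + B u_t, y_t = C h_t, starting from h = 0.
    Single-input, single-output for simplicity."""
    h = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        h = A @ h + B * u_t
        ys.append(C @ h)
    return np.array(ys)

def ssm_kernel(A: np.ndarray, B: np.ndarray, C: np.ndarray, length: int) -> np.ndarray:
    """Convolution kernel K_k = C A^k B for k = 0, ..., length - 1."""
    kernel = []
    ak_b = B.copy()  # holds A^k B
    for _ in range(length):
        kernel.append(C @ ak_b)
        ak_b = A @ ak_b
    return np.array(kernel)

def ssm_conv(K: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Convolutional mode: y_t = sum_{k=0}^{t} K_k u_{t-k} (causal)."""
    return np.array([np.dot(K[: t + 1], u[t::-1]) for t in range(len(u))])

rng = np.random.default_rng(0)
A = np.diag([0.9, 0.7, 0.5, 0.3])  # eigenvalues inside the unit circle: stable
B = rng.standard_normal(4)
C = rng.standard_normal(4)
u = rng.standard_normal(48)  # e.g. 48 hourly input values

y_rec = ssm_scan(A, B, C, u)
y_conv = ssm_conv(ssm_kernel(A, B, C, len(u)), u)
print(np.allclose(y_rec, y_conv))  # True: both modes compute the same outputs
```

The recurrent form runs in constant memory per step at inference time. The convolutional form exposes the whole sequence at once, which is one reason structured SSMs can be trained in parallel.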
Estimating Time Series Foundation Model Transferability via In-Context Learning
Time series foundation models (TSFMs) offer strong zero-shot forecasting via large-scale pre-training. Fine-tuning remains critical for boosting performance in domains with limited public data. We introduce TimeTic, a transferability estimation framework that recasts model selection as an in-context-learning problem.
arXiv (2025-09-28)
Output Scaling: YingLong-Delayed Chain of Thought in a Large Pretrained Time Series Forecasting Model
This framework achieves state-of-the-art performance for our designed foundation model, YingLong. YingLong is a non-causal, bidirectional attention encoder-only Transformer trained through masked token recovery. We release four foundation models ranging from 6M to 300M parameters, demonstrating superior results in zero-shot tasks.
arXiv (2025-05-20)
Towards Accurate Forecasting of Renewable Energy: Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France
This study presents a comprehensive methodology for predicting solar and wind power production at country scale in France. A dataset spanning 2012 to 2023 is built from daily power production data from RTE. Three modeling approaches are explored to handle spatially resolved weather data.
arXiv (2025-04-14)
Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay. Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks. Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
arXiv (2025-02-10)
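The Powerformer summary describes causal attention weights that are reweighted by a smooth heavy-tailed decay. The following minimal sketch of that idea assumes a power-law factor (1 + dt)^(-alpha) on the unnormalized weights, where dt is the lag between query and key. The exact decay form and the parameter name alpha are assumptions here, not taken from the paper. Multiplying exp(score) by that factor is the same as adding -alpha * log(1 + dt) to the logit. Future keys are masked out.

```python
import numpy as np

def power_law_causal_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, alpha: float = 1.0):
    """Causal scaled dot-product attention whose unnormalized weights are scaled
    by (1 + dt) ** (-alpha), where dt = i - j is the lag from key j to query i.
    The decay form is an illustrative assumption. Returns (output, weights)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    lag = np.arange(T)[:, None] - np.arange(T)[None, :]
    # exp(s) * (1 + dt)^(-alpha) == exp(s - alpha * log(1 + dt)); future keys get -inf
    bias = np.where(lag >= 0, -alpha * np.log1p(np.maximum(lag, 0)), -np.inf)
    logits = scores + bias
    logits -= logits.max(axis=1, keepdims=True)  # each row keeps a finite entry (j = i)
    w = np.exp(logits)  # exp(-inf) = 0 for masked positions
    w /= w.sum(axis=1, keepdims=True)
    return w @ v, w

# Zero queries and keys make every score equal, so only the lag decay matters.
T, d = 4, 8
zeros = np.zeros((T, d))
out, w = power_law_causal_attention(zeros, zeros, np.eye(T), alpha=1.0)
print(w[1])  # approximately [1/3, 2/3, 0, 0]: lag 1 gets weight 1/2 vs 1 for lag 0
```

Larger alpha concentrates attention on recent steps, which gives the kind of locality bias the summary mentions.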
GC-GRU-N for Traffic Prediction using Loop Detector Data
We use Seattle loop detector data aggregated over 15 minutes and reframe the problem through space and time. The model ranked second, with the fastest inference time and performance very close to the first-place model (Transformers).
arXiv (2022-11-13)
Physics Informed Shallow Machine Learning for Wind Speed Prediction
We analyze a massive dataset of wind measured from anemometers located at 10 m height in 32 locations in Italy. We train supervised learning algorithms using the past history of wind to predict its value at a future time. We find that the optimal design as well as its performance vary with the location.
arXiv (2022-04-01)
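The wind-speed summary frames forecasting as supervised learning on the past history of the series. The following minimal sketch of that framing assumes a fixed number of lagged values as features and an ordinary least-squares linear model as the shallow learner. The lag count, horizon, model, and synthetic series are illustrative assumptions, not the paper's design or data.

```python
import numpy as np

def make_supervised(series, n_lags: int, horizon: int):
    """Build (X, y): row i of X is series[i : i + n_lags], and y[i] is the value
    `horizon` steps after the last lag, i.e. series[i + n_lags - 1 + horizon]."""
    series = np.asarray(series, dtype=float)
    n = len(series) - n_lags - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested lags and horizon")
    X = np.stack([series[i : i + n_lags] for i in range(n)])
    y = series[n_lags + horizon - 1 : n_lags + horizon - 1 + n]
    return X, y

def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares linear model with an intercept column."""
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic hourly series (illustration only): a daily cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(24 * 30)
wind = 5 + 2 * np.sin(2 * np.pi * t / 24) + 0.3 * rng.standard_normal(t.size)

X, y = make_supervised(wind, n_lags=24, horizon=6)
split = int(0.8 * len(X))  # chronological split: no shuffling
coef = fit_linear(X[:split], y[:split])
mae = np.mean(np.abs(predict_linear(coef, X[split:]) - y[split:]))
print(f"test MAE: {mae:.3f}")
```

The split is kept chronological so that the model is always evaluated on windows that come after the ones it was trained on.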
Real-Time Forecasting of Dockless Scooter-Sharing Demand: A Spatio-Temporal Multi-Graph Transformer Approach
This paper proposes a novel deep learning architecture named Spatio-Temporal Multi-Graph Transformer (STMGT) to forecast real-time dockless scooter-sharing demand. The proposed model can help micromobility operators develop optimal vehicle rebalancing schemes and guide cities to better manage dockless scooter-sharing operations.
arXiv (2021-11-02)
Multistream Graph Attention Networks for Wind Speed Forecasting
This paper presents a new model for wind speed prediction based on Graph Attention Networks (GAT). In particular, the proposed model extends the GAT architecture by equipping it with a learnable adjacency matrix. We show that, compared with previous architectures used for wind speed prediction, the proposed model better learns the complex input-output relationships of the weather data.
arXiv (2021-08-16)
TENT: Tensorized Encoder Transformer for Temperature Forecasting
We introduce a new model based on the Transformer architecture for weather forecasting. We show that, compared to the original Transformer and 3D convolutional neural networks, the proposed TENT model better captures the underlying complex patterns of weather data. Experiments are performed on two real-life weather datasets.
arXiv (2021-06-28)
