Related papers: NSW-EPNews: A News-Augmented Benchmark for Electricity Price Forecasting with LLMs

URL: http://arxiv.org/abs/2506.11050v1
Date: Thu, 22 May 2025 02:13:30 GMT
Title: NSW-EPNews: A News-Augmented Benchmark for Electricity Price Forecasting with LLMs
Authors: Zhaoge Bi, Linghan Huang, Haolin Jin, Qingwen Zeng, Huaming Chen
Abstract summary: We introduce NSW-EPNews, the first benchmark that jointly evaluates time-series models and large language models (LLMs) on real-world electricity-price prediction. The dataset includes over 175,000 half-hourly spot prices from New South Wales, Australia (2015-2024), daily temperature readings, and curated market-news summaries from WattClarity.
Score: 0.5172964916120903
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Electricity price forecasting is a critical component of modern energy-management systems, yet existing approaches rely heavily on numerical histories and ignore contemporaneous textual signals. We introduce NSW-EPNews, the first benchmark that jointly evaluates time-series models and large language models (LLMs) on real-world electricity-price prediction. The dataset includes over 175,000 half-hourly spot prices from New South Wales, Australia (2015-2024), daily temperature readings, and curated market-news summaries from WattClarity. We frame the task as 48-step-ahead forecasting with multimodal input: lagged prices, vectorized news, and weather features for classical models, and prompt-engineered structured contexts for LLMs. Our dataset yields 3.6k multimodal prompt-output pairs for LLM evaluation using specific templates. Through comprehensive benchmark design, we find that for traditional statistical and machine learning models, the gain from news features is marginal. For state-of-the-art LLMs, such as GPT-4o and Gemini 1.5 Pro, we observe a modest performance increase, but these models also produce frequent hallucinations such as fabricated and malformed price sequences. NSW-EPNews provides a rigorous testbed for evaluating grounded numerical reasoning in multimodal settings, and highlights a critical gap between current LLM capabilities and the demands of high-stakes energy forecasting.
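The 48-step-ahead framing in the abstract can be illustrated with a minimal sketch that slices a half-hourly price series into (lagged prices, next 48 values) pairs. This is not the authors' pipeline: the function name `make_windows`, the `lookback` length, the `stride`, and the synthetic series are all assumptions made for illustration, and news and weather features are omitted.

```python
from typing import List, Tuple

HORIZON = 48  # 48 half-hourly steps, i.e. one day ahead (from the abstract)


def make_windows(
    prices: List[float],
    lookback: int,
    horizon: int = HORIZON,
    stride: int = HORIZON,
) -> List[Tuple[List[float], List[float]]]:
    """Split a price series into (lagged inputs, future targets) pairs.

    `lookback` and `stride` are illustrative assumptions, not values
    taken from the paper.
    """
    pairs = []
    # Last start index that still leaves room for a full history and target.
    last_start = len(prices) - lookback - horizon
    for start in range(0, last_start + 1, stride):
        history = prices[start : start + lookback]
        target = prices[start + lookback : start + lookback + horizon]
        pairs.append((history, target))
    return pairs


# Synthetic stand-in for spot prices: 4 days of half-hourly values.
series = [float(i) for i in range(4 * 48)]
# Hypothetical choice: use the previous 2 days (96 steps) as lagged input.
pairs = make_windows(series, lookback=96)
```

With a stride equal to the horizon, each pair corresponds to one forecast day, so consecutive targets do not overlap.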

Related papers

A Few-Shot LLM Framework for Extreme Day Classification in Electricity Markets [1.0730888578919362]
This paper proposes a few-shot classification framework based on Large Language Models (LLMs) to predict whether the next day will have spikes in real-time electricity prices. Using historical data from the Texas electricity market, we demonstrate that this few-shot approach achieves performance comparable to supervised machine learning models.
arXiv Detail & Related papers (2026-02-17T20:54:44Z)
LightGTS-Cov: Covariate-Enhanced Time Series Forecasting [5.893050294112672]
Time series foundation models are typically pre-trained on large, multi-source datasets. We introduce LightGTS-Cov, a covariate-enhanced extension of LightGTS that preserves its lightweight, period-aware backbone. We demonstrate its practical value in two real-world energy case applications.
arXiv Detail & Related papers (2026-02-11T01:51:25Z)
Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking [51.56484100374058]
We evaluate whether a fully automatic, purely feedback-driven ESN can serve as a competitive alternative to widely used statistical forecasting methods. Forecast accuracy is measured using MASE and sMAPE and compared against simple baselines such as drift and seasonal naive, as well as statistical models.
arXiv Detail & Related papers (2026-02-03T16:01:22Z)
PriceSeer: Evaluating Large Language Models in Real-Time Stock Prediction [47.70107097572211]
We introduce PriceSeer, a benchmark specifically designed for large language models performing stock prediction tasks. PriceSeer includes 110 U.S. stocks from 11 industrial sectors, with each containing 249 historical data points. We evaluate six cutting-edge LLMs under different prediction horizons, demonstrating their potential in generating investment strategies.
arXiv Detail & Related papers (2025-12-31T08:35:46Z)
The Forecast Critic: Leveraging Large Language Models for Poor Forecast Identification [74.64864354503204]
We propose The Forecast Critic, a system that leverages Large Language Models (LLMs) for automated forecast monitoring. We evaluate the ability of LLMs to assess time series forecast quality. We present three experiments, including on both synthetic and real-world forecasting data.
arXiv Detail & Related papers (2025-12-12T21:59:53Z)
Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting [1.1557852082644071]
We benchmark several state-of-the-art pretrained models against established statistical and machine learning (ML) methods for electricity price forecasting. Using 2024 day-ahead auction (DAA) electricity prices from Germany, France, the Netherlands, Austria, and Belgium, we generate daily forecasts with a one-day horizon. Chronos-Bolt and Time-MoE emerge as the strongest among the TSFMs, performing on par with traditional models.
arXiv Detail & Related papers (2025-06-09T18:10:00Z)
Efficient Model Selection for Time Series Forecasting via LLMs [52.31535714387368]
We propose to leverage Large Language Models (LLMs) as a lightweight alternative for model selection. Our method eliminates the need for explicit performance matrices by utilizing the inherent knowledge and reasoning capabilities of LLMs.
arXiv Detail & Related papers (2025-04-02T20:33:27Z)
LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data [63.777637042161544]
This paper introduces a novel forecast post-processor that fine-tunes large language models to incorporate unstructured semantic and contextual information and historical data. In an industry-scale retail application, we demonstrate that our technique yields statistically significant forecast improvements across several sets of products subject to holiday-driven demand surges.
arXiv Detail & Related papers (2024-12-03T16:18:42Z)
BreakGPT: Leveraging Large Language Models for Predicting Asset Price Surges [55.2480439325792]
This paper introduces BreakGPT, a novel large language model (LLM) architecture adapted specifically for time series forecasting and the prediction of sharp upward movements in asset prices. We showcase BreakGPT as a promising solution for financial forecasting with minimal training and as a strong competitor for capturing both local and global temporal dependencies.
arXiv Detail & Related papers (2024-11-09T05:40:32Z)
CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning [59.88924847995279]
We propose a novel Cross-Modal LLM Fine-Tuning (CALF) framework for MTSF. To reduce the distribution discrepancy, we develop the cross-modal match module. CALF establishes state-of-the-art performance for both long-term and short-term forecasting tasks.
arXiv Detail & Related papers (2024-03-12T04:04:38Z)
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts [54.07541591018305]
We present MAD-Bench, a benchmark that contains 1000 test samples divided into 5 categories, such as non-existent objects, count of objects, and spatial relationship. We provide a comprehensive analysis of popular MLLMs, ranging from GPT-4v, Reka, Gemini-Pro, to open-sourced models, such as LLaVA-NeXT and MiniCPM-Llama3. While GPT-4o achieves 82.82% accuracy on MAD-Bench, the accuracy of any other model in our experiments ranges from 9% to 50%.
arXiv Detail & Related papers (2024-02-20T18:31:27Z)
A probabilistic forecast methodology for volatile electricity prices in the Australian National Electricity Market [0.36832029288386137]
The South Australia region of the Australian National Electricity Market displays some of the highest levels of price volatility observed in modern electricity markets. This paper outlines an approach to probabilistic forecasting under these extreme conditions, including spike filtration and several post-processing steps.
arXiv Detail & Related papers (2023-11-13T12:33:33Z)
AI Driven Near Real-time Locational Marginal Pricing Method: A Feasibility and Robustness Study [0.6144680854063939]
The Locational Marginal Pricing (LMP) mechanism is used in many modern power markets. For large electricity grids this process becomes prohibitively time-consuming and computationally intensive. This study evaluates the performance of popular machine learning and deep learning models in predicting LMP on multiple electricity grids.
arXiv Detail & Related papers (2023-06-16T06:41:04Z)
A comparative assessment of deep learning models for day-ahead load forecasting: Investigating key accuracy drivers [2.572906392867547]
Short-term load forecasting (STLF) is vital for the effective and economic operation of power grids and energy markets. Several deep learning models have been proposed in the literature for STLF, reporting promising results.
arXiv Detail & Related papers (2023-02-23T17:11:04Z)
You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores. One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model. We empirically measure the effectiveness of our approach on two English language modeling datasets.
arXiv Detail & Related papers (2022-10-28T02:57:40Z)
A Hybrid Model for Forecasting Short-Term Electricity Demand [59.372588316558826]
Currently the UK Electric market is guided by load (demand) forecasts published every thirty minutes by the regulator. We present HYENA: a hybrid predictive model that combines feature engineering (selection of the candidate predictor features), mobile-window predictors and LSTM encoder-decoders.
arXiv Detail & Related papers (2022-05-20T22:13:25Z)
Deep Learning Approaches for Forecasting Strawberry Yields and Prices Using Satellite Images and Station-Based Soil Parameters [2.3513645401551333]
We propose here an alternate approach based on deep learning algorithms for forecasting strawberry yields and prices in Santa Barbara county, California. Building the proposed forecasting model comprises three stages. First, the station-based ensemble model (ATT-CNN-LSTM-SeriesNet_Ens) with its compound deep learning components. Second, the remote sensing ensemble model (SIM_CNN-LSTM_Ens) is trained and tested using satellite images of the same county as input mapped to the same yields and prices as output. Third, the forecasts of these two models are ensembled to have a final forecasted value.
arXiv Detail & Related papers (2021-02-17T20:54:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.