Related papers: Probabilistic measures afford fair comparisons of AIWP and NWP model output

URL: http://arxiv.org/abs/2506.03744v1
Date: Wed, 04 Jun 2025 09:14:45 GMT
Title: Probabilistic measures afford fair comparisons of AIWP and NWP model output
Authors: Tilmann Gneiting, Tobias Biegert, Kristof Kraus, Eva-Maria Walz, Alexander I. Jordan, Sebastian Lerch
Abstract summary: We introduce a new measure for fair and meaningful comparisons of single-valued output from AIWP and NWP models. We find PC as the mean continuous ranked probability score (CRPS) of the postprocessed probabilistic forecasts. Our approach affords comparisons of single-valued forecasts in settings where the pre-specification of a loss function places competitors on unequal footings.
Score: 37.69303106863453
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce a new measure for fair and meaningful comparisons of single-valued output from artificial intelligence based weather prediction (AIWP) and numerical weather prediction (NWP) models, called potential continuous ranked probability score (PC). In a nutshell, we subject the deterministic backbone of physics-based and data-driven models post hoc to the same statistical postprocessing technique, namely, isotonic distributional regression (IDR). Then we find PC as the mean continuous ranked probability score (CRPS) of the postprocessed probabilistic forecasts. The nonnegative PC measure quantifies potential predictive performance and is invariant under strictly increasing transformations of the model output. PC attains its most desirable value of zero if, and only if, the weather outcome Y is a fixed, non-decreasing function of the model output X. The PC measure is recorded in the unit of the outcome, has an upper bound of one half times the mean absolute difference between outcomes, and serves as a proxy for the mean CRPS of real-time, operational probabilistic products. When applied to WeatherBench 2 data, our approach demonstrates that the data-driven GraphCast model outperforms the leading, physics-based European Centre for Medium-Range Weather Forecasts (ECMWF) high-resolution (HRES) model. Furthermore, the PC measure for the HRES model aligns exceptionally well with the mean CRPS of the operational ECMWF ensemble. Across application domains, our approach affords comparisons of single-valued forecasts in settings where the pre-specification of a loss function -- which is the usual, and principally superior, procedure in forecast contests, administrative, and benchmark settings -- places competitors on unequal footings.
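
Illustrative sketch (not the authors' code): the two steps described in the abstract, an IDR-style fit followed by the mean CRPS, can be approximated in a few lines of plain Python. The function names pava_nonincreasing and potential_crps are hypothetical. The sketch assumes an in-sample fit with a single real-valued model output, and it uses a basic pool-adjacent-violators routine rather than any particular IDR package. For each threshold z, the conditional probability P(Y <= z | X = x) is fitted as a non-increasing function of x, and the CRPS integral reduces to a finite sum over the gaps between sorted outcome values.

```python
def pava_nonincreasing(values, weights):
    """Weighted least-squares fit of a non-increasing sequence
    via the pool-adjacent-violators algorithm."""
    blocks = []  # each block: [mean, total weight, number of pooled entries]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Pool while an earlier block lies below a later one (order violation).
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def potential_crps(x, y):
    """In-sample mean CRPS of an IDR-style fit of outcomes y on model
    output x (both plain lists of floats). Assumption of this sketch:
    the fit is evaluated on the same data it was estimated from."""
    n = len(x)
    # Group observation indices by tied model output, in increasing order of x.
    groups = {}
    for i, xi in enumerate(x):
        groups.setdefault(xi, []).append(i)
    ordered = [groups[key] for key in sorted(groups)]
    weights = [len(g) for g in ordered]

    thresholds = sorted(set(y))
    total = 0.0
    # Fitted CDFs and outcome indicators are constant between consecutive
    # thresholds, so the CRPS integral becomes a sum over those gaps.
    for z, z_next in zip(thresholds, thresholds[1:]):
        means = [sum(float(y[i] <= z) for i in g) / len(g) for g in ordered]
        cdf = pava_nonincreasing(means, weights)
        for g, f in zip(ordered, cdf):
            total += sum((f - float(y[i] <= z)) ** 2 for i in g) * (z_next - z)
    return total / n


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(potential_crps(x, [1.0, 4.0, 9.0, 16.0]))  # monotone relationship
    print(potential_crps([5.0, 5.0, 5.0], [0.0, 1.0, 3.0]))  # uninformative x
```

Under these assumptions the sketch reproduces the properties stated in the abstract on toy data: the value is zero when y is a non-decreasing function of x, it depends on x only through its ordering, and a constant (uninformative) x yields one half times the mean absolute difference between outcomes.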

Related papers

Post-processing improves accuracy of Artificial Intelligence weather forecasts [0.14043931310479374]
We test the application of the Bureau of Meteorology's statistical post-processing system, IMPROVER, to ECMWF's deterministic AIFS. We show that blending AIFS with NWP models improves overall forecast skill, even when AIFS alone is not the most accurate component.
arXiv Detail & Related papers (2025-04-17T06:05:10Z)
On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations. We propose an autoregressive sampling approach that significantly improves performance in forecasting. We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z)
Improving probabilistic forecasts of extreme wind speeds by training statistical post-processing models with weighted scoring rules [0.0]
Training using the threshold-weighted continuous ranked probability score (twCRPS) leads to improved extreme event performance of post-processing models. We find a distribution body-tail trade-off where improved performance for probabilistic predictions of extreme events comes with worse performance for predictions of the distribution body.
arXiv Detail & Related papers (2024-07-22T11:07:52Z)
ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast [57.6987191099507]
We introduce Exloss, a novel loss function that performs asymmetric optimization and highlights extreme values to obtain accurate extreme weather forecasts. We also introduce ExBooster, which captures the uncertainty in prediction outcomes by employing multiple random samples. Our solution can achieve state-of-the-art performance in extreme weather prediction, while maintaining overall forecast accuracy comparable to the top medium-range forecast models.
arXiv Detail & Related papers (2024-02-02T10:34:13Z)
A Practical Probabilistic Benchmark for AI Weather Models [0.7978324349017066]
We show that two leading AI weather models, i.e. GraphCast and Pangu, are tied on the probabilistic CRPS metric. We also reveal how multiple time-step loss functions, which many data-driven weather models have employed, are counter-productive.
arXiv Detail & Related papers (2024-01-27T05:53:16Z)
When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting [69.30930115236228]
Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting. Most methods focus on point predictions and do not provide well-calibrated probabilistic forecast distributions. We propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models the forecast distribution of the entire hierarchy.
arXiv Detail & Related papers (2023-10-17T20:30:16Z)
Generative ensemble deep learning severe weather prediction from a deterministic convection-allowing model [0.0]
The method combines conditional generative adversarial networks (CGANs) with a convolutional neural network (CNN) to post-process convection-allowing model (CAM) forecasts. The CGANs are designed to create synthetic ensemble members from deterministic CAM forecasts. The method produced skillful predictions with up to 20% Brier Skill Score (BSS) increases compared to other neural-network-based reference methods.
arXiv Detail & Related papers (2023-10-09T18:02:11Z)
SEEDS: Emulation of Weather Forecast Ensembles with Diffusion Models [13.331224394143117]
Uncertainty quantification is crucial to decision-making. The dominant approach to representing uncertainty in weather forecasting is to generate an ensemble of forecasts. We propose to amortize the computational cost by emulating these forecasts with deep generative diffusion models learned from historical data.
arXiv Detail & Related papers (2023-06-24T22:00:06Z)
When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions. Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations. We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z)
Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees. We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z)
A framework for probabilistic weather forecast post-processing across models and lead times using machine learning [3.1542695050861544]
We show how to bridge the gap between sets of separate forecasts from NWP models and the 'ideal' forecast for decision support. We use Quantile Regression Forests to learn the error profile of each numerical model, and use these to apply empirically-derived probability distributions to forecasts. Second, we combine these probabilistic forecasts using quantile averaging. Third, we interpolate between the aggregate quantiles in order to generate a full predictive distribution.
arXiv Detail & Related papers (2020-05-06T16:46:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.