Related papers: Probabilistic Transformers for Joint Modeling of Global Weather Dynamics and Decision-Centric Variables

URL: http://arxiv.org/abs/2601.03753v1
Date: Wed, 07 Jan 2026 09:43:36 GMT
Title: Probabilistic Transformers for Joint Modeling of Global Weather Dynamics and Decision-Centric Variables
Authors: Paulius Rauba, Viktor Cikojevic, Fran Bartolic, Sam Levang, Ty Dickinson, Chase Dwelle,
Abstract summary: Weather forecasts sit upstream of high-stakes decisions in domains such as grid operations, aviation, agriculture, and emergency response. Many decision-relevant targets are functionals of the atmospheric state variables, such as extrema, accumulations, and threshold exceedances, rather than state variables themselves. We introduce GEM-2, a probabilistic transformer that jointly learns global atmospheric dynamics alongside a suite of variables that users directly act upon.
Score: 1.7632505349721652
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Weather forecasts sit upstream of high-stakes decisions in domains such as grid operations, aviation, agriculture, and emergency response. Yet forecast users often face a difficult trade-off. Many decision-relevant targets are functionals of the atmospheric state variables, such as extrema, accumulations, and threshold exceedances, rather than state variables themselves. As a result, users must estimate these targets via post-processing, which can be suboptimal and can introduce structural bias. The core issue is that decisions depend on distributions over these functionals that the model is not trained to learn directly. In this work, we introduce GEM-2, a probabilistic transformer that jointly learns global atmospheric dynamics alongside a suite of variables that users directly act upon. Using this training recipe, we show that a lightweight (~275M params) and computationally efficient (~20-100x training speedup relative to state-of-the-art) transformer trained on the CRPS objective can directly outperform operational numerical weather prediction (NWP) models and be competitive with ML models that rely on expensive multi-step diffusion processes or require bespoke multi-stage fine-tuning strategies. We further demonstrate state-of-the-art economic value metrics under decision-theoretic evaluation, stable convergence to climatology at S2S and seasonal timescales, and a surprising insensitivity to many commonly assumed architectural and training design choices.
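The abstract says GEM-2 is trained on the CRPS objective but does not say which ensemble estimator is used. The sketch below assumes the standard kernel form, CRPS(F, y) = E|X - y| - ½ E|X - X'|, estimated from M ensemble members, and offers both the plain (1/M²) and the "fair" (1/(M(M-1))) normalisation of the spread term. The function name and the NumPy setting are illustrative only; an actual training loop would average this score over grid points, variables, and lead times inside a differentiable framework.

```python
import numpy as np

def crps_ensemble(members, obs, fair=True):
    """Ensemble estimate of the CRPS for a scalar observation (lower is better).

    CRPS ~ mean_i |x_i - y| - sum_ij |x_i - x_j| / (2 * D), where
    D = M*(M-1) for the "fair" estimator and D = M*M for the plain one.
    """
    x = np.asarray(members, dtype=float).ravel()
    m = x.size
    if fair and m < 2:
        raise ValueError("the fair estimator needs at least two members")
    skill = np.mean(np.abs(x - obs))                 # accuracy term E|X - y|
    spread = np.abs(x[:, None] - x[None, :]).sum()   # all pairwise |x_i - x_j| (diagonal is 0)
    denom = 2 * m * (m - 1) if fair else 2 * m * m
    return float(skill - spread / denom)

# Toy check: three members centred on the observation.
print(crps_ensemble([0.0, 1.0, 2.0], 1.0))              # fair estimate
print(crps_ensemble([0.0, 1.0, 2.0], 1.0, fair=False))  # plain estimate
```

With a single member and `fair=False`, the spread term vanishes and the score reduces to the absolute error, which is why CRPS training rewards both accuracy and a well-calibrated ensemble spread.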

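The abstract's central point, that decisions depend on distributions over functionals (extrema, accumulations, threshold exceedances) rather than over the state itself, can be illustrated on a toy ensemble. This is not GEM-2's method, which learns such targets directly; it is a minimal post-hoc sketch assuming a hypothetical `(members, timesteps)` array for one grid point, a hypothetical threshold, and the classic static cost-loss decision rule (act when p·L > C) as one common decision-theoretic setup.

```python
import numpy as np

def ensemble_functionals(ens, threshold):
    """Per-member functionals for one grid point.

    ens: array of shape (members, timesteps), e.g. hourly values (hypothetical layout).
    Returns per-member maxima, per-member accumulations, and the ensemble
    probability that a member's maximum exceeds `threshold`.
    """
    ens = np.asarray(ens, dtype=float)
    maxima = ens.max(axis=1)                 # extremum of each trajectory
    totals = ens.sum(axis=1)                 # accumulation over the window
    p_exceed = float(np.mean(maxima > threshold))
    return maxima, totals, p_exceed

def protect(p_exceed, cost, loss):
    """Cost-loss rule: act when the expected avoided loss p*L exceeds the cost C."""
    return p_exceed * loss > cost

ens = [[1.0, 3.0, 2.0],
       [4.0, 0.0, 1.0],
       [2.0, 2.0, 2.0]]
maxima, totals, p = ensemble_functionals(ens, threshold=2.5)
# maxima = 3, 4, 2; totals = 6, 5, 6; p = 2/3
decision = protect(p, cost=1.0, loss=2.0)

# Applying the functional to the ensemble mean instead: the mean trajectory
# peaks at 7/3 (about 2.33), so the 2.5 exceedance disappears entirely.
mean_max = np.mean(ens, axis=0).max()
```

The contrast between `p` and `mean_max` is a small instance of the structural bias the abstract attributes to post-processing: taking the maximum of an averaged trajectory is not the same as averaging over the maxima of individual trajectories.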
Related papers

Toward generative machine learning for boosting ensembles of climate simulations [0.0]
We develop a conditional Variational Autoencoder (cVAE) trained on a limited sample of climate simulations to generate arbitrarily large ensembles. We show that the cVAE model learns the underlying distribution of the data and generates physically consistent samples that reproduce realistic low and high moment statistics.
arXiv Detail & Related papers (2026-02-06T00:54:19Z)
Demystifying Data-Driven Probabilistic Medium-Range Weather Forecasting [63.8116386935854]
We demonstrate that state-of-the-art probabilistic skill requires neither intricate architectural constraints nor specialized training. We introduce a scalable framework for learning multi-scale atmospheric dynamics by combining a directly downsampled latent space with a history-conditioned local projector. We find that our framework design is robust to the choice of probabilistic estimators, seamlessly supporting interpolants, diffusion models, and CRPS-based ensemble training.
arXiv Detail & Related papers (2026-01-26T03:52:16Z)
China Regional 3km Downscaling Based on Residual Corrective Diffusion Model [39.12803910865843]
This work focuses on statistical downscaling, which establishes statistical relationships between low-resolution and high-resolution historical data. In contrast to the original CorrDiff work, the region considered here is nearly 40 times larger. Deep learning has emerged as a powerful tool for this task, giving rise to various high-performance super-resolution models.
arXiv Detail & Related papers (2025-12-05T02:27:08Z)
Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving [54.46325690390831]
We propose Model-based Policy Adaptation (MPA), a general framework that enhances the robustness and safety of pretrained E2E driving agents during deployment. MPA first generates diverse counterfactual trajectories using a geometry-consistent simulation engine. MPA trains a diffusion-based policy adapter to refine the base policy's predictions and a multi-step Q value model to evaluate long-term outcomes.
arXiv Detail & Related papers (2025-11-26T17:01:41Z)
ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models [13.740208247043258]
We propose ClimateLLM, a foundation model for weather forecasting. It captures temporal dependencies via a cross-temporal and cross-spatial collaborative framework. It integrates frequency decomposition with Large Language Models to strengthen spatial and temporal modeling.
arXiv Detail & Related papers (2025-02-16T09:57:50Z)
Prithvi WxC: Foundation Model for Weather and Climate [2.9230020115516253]
Prithvi WxC is a 2.3 billion parameter foundation model developed using 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). The model has been designed to accommodate large token counts to model weather phenomena in different topologies at fine resolutions. We test the model on a set of challenging downstream tasks namely: Autoregressive rollout forecasting, Downscaling, Gravity wave flux parameterization, and Extreme events estimation.
arXiv Detail & Related papers (2024-09-20T15:53:17Z)
Efficient Localized Adaptation of Neural Weather Forecasting: A Case Study in the MENA Region [62.09891513612252]
We focus on limited-area modeling and train our model specifically for localized region-level downstream tasks. We consider the MENA region because of its unique climatic challenges, where accurate localized weather forecasting is crucial for managing water resources and agriculture and for mitigating the impacts of extreme weather events. Our study aims to validate the effectiveness of integrating parameter-efficient fine-tuning (PEFT) methods, specifically Low-Rank Adaptation (LoRA) and its variants, to improve forecast accuracy as well as training speed, computational resource utilization, and memory efficiency in weather and climate modeling for specific regions.
arXiv Detail & Related papers (2024-09-11T19:31:56Z)
Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction [1.3194391758295114]
We show that it is possible to attain high forecast skill even with relatively off-the-shelf architectures, simple training procedures, and moderate compute budgets. Specifically, we train a minimally modified SwinV2 transformer on ERA5 data, and find that it attains superior forecast skill when compared against IFS.
arXiv Detail & Related papers (2024-04-30T15:30:14Z)
ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast [57.6987191099507]
We introduce Exloss, a novel loss function that performs asymmetric optimization and highlights extreme values to obtain accurate extreme weather forecast. We also introduce ExBooster, which captures the uncertainty in prediction outcomes by employing multiple random samples. Our solution can achieve state-of-the-art performance in extreme weather prediction, while maintaining the overall forecast accuracy comparable to the top medium-range forecast models.
arXiv Detail & Related papers (2024-02-02T10:34:13Z)
Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling [58.456404022536425]
The state of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser-resolution global inputs. Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative. The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
arXiv Detail & Related papers (2023-09-24T19:57:22Z)
ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science. It can be pre-trained with a self-supervised learning objective on climate datasets. It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z)
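The MENA entry above relies on Low-Rank Adaptation (LoRA) but does not give its configuration. As a rough illustration only, the sketch below follows the original LoRA formulation: a frozen pretrained weight W plus a trainable update (alpha/r)·B·A, with B initialised to zero so the adapted layer starts out identical to the pretrained one. The class name, rank, alpha, and initialisation scale are assumptions, and only the forward pass is shown (no training loop).

```python
import numpy as np

class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update.

    y = x W^T + (alpha / r) * x A^T B^T, with A of shape (r, in) and
    B of shape (out, r). B starts at zero, so the adapted layer initially
    reproduces the pretrained one exactly.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = np.asarray(weight, dtype=float)          # frozen, shape (out, in)
        out_dim, in_dim = self.weight.shape
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))    # trainable
        self.B = np.zeros((out_dim, rank))                      # trainable, zero-initialised
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T                 # pretrained path
        update = (x @ self.A.T) @ self.B.T       # low-rank path
        return base + self.scale * update

    def n_trainable(self):
        return self.A.size + self.B.size

W = np.random.default_rng(1).normal(size=(16, 32))  # stand-in "pretrained" weight
layer = LoRALinear(W, rank=4)
x = np.ones((2, 32))
print(np.allclose(layer(x), x @ W.T))   # True: B is zero at initialisation
print(layer.n_trainable(), W.size)      # 192 512
```

Even in this toy layer, the adapter trains 192 parameters instead of the 512 in W; in large models the savings come from r being much smaller than the layer dimensions.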

This list is automatically generated from the titles and abstracts of the papers on this site.