Scaling Laws of Global Weather Models
- URL: http://arxiv.org/abs/2602.22962v1
- Date: Thu, 26 Feb 2026 12:57:38 GMT
- Title: Scaling Laws of Global Weather Models
- Authors: Yuejiang Yu, Langwen Huang, Alexandru Calotoiu, Torsten Hoefler
- Abstract summary: We investigate the relationship between model performance (validation loss) and three key factors: model size, dataset size, and compute budget. Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size.
- Score: 57.27583619011988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size ($N$), dataset size ($D$), and compute budget ($C$). Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from those observed in language models: weather forecasting models consistently favor increased width over depth. These findings suggest that future weather models should prioritize wider architectures and larger effective training datasets to maximize predictive performance.
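The abstract's headline data-scaling claim (10x more training data reduces Aurora's validation loss by up to 3.2x) implies a specific power-law exponent. The sketch below works that arithmetic out under the assumption of a simple power law $L(D) \propto D^{-\alpha}$; the exponent is *implied* by the quoted figures, not a fitted value from the paper, and the function names are illustrative.

```python
import math

# Hedged sketch: if validation loss follows a power law L(D) = a * D**(-alpha),
# then scaling the dataset by a factor f shrinks the loss by f**alpha.
# The abstract's "10x data -> up to 3.2x lower loss" for Aurora implies
# alpha = log(3.2) / log(10) ~= 0.505. This is a back-of-envelope reading,
# not a coefficient reported by the paper.

def implied_exponent(data_factor: float, loss_factor: float) -> float:
    """Solve loss_factor = data_factor**alpha for alpha."""
    return math.log(loss_factor) / math.log(data_factor)

def projected_loss(loss_now: float, data_scale: float, alpha: float) -> float:
    """Project validation loss after scaling the dataset by `data_scale`."""
    return loss_now / data_scale**alpha

alpha = implied_exponent(10.0, 3.2)   # ~0.505
halved_data_gain = projected_loss(1.0, 2.0, alpha)  # loss after 2x more data
```

Under this reading, even a modest 2x increase in training data would cut validation loss by roughly 30%, which is consistent with the paper's recommendation to prioritize larger effective training datasets.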
Related papers
- Compute-Optimal Scaling for Value-Based Deep RL [99.680827753493]
We investigate compute scaling for online, value-based deep RL. Our analysis reveals a nuanced interplay between model size, batch size, and UTD. We provide a mental model for understanding this phenomenon and build guidelines for choosing batch size and UTD.
arXiv Detail & Related papers (2025-08-20T17:54:21Z)
- Scaling Laws of Motion Forecasting and Planning - Technical Report [21.486301157587132]
We study the empirical scaling laws of a family of encoder-decoder autoregressive transformer models. We observe a strong correlation between model training loss and model evaluation metrics. We briefly study the utility of training on general logged driving data of other agents to improve the performance of the ego-agent.
arXiv Detail & Related papers (2025-06-09T20:54:23Z)
- Small-to-Large Generalization: Data Influences Models Consistently Across Scale [76.87199303408161]
We find that small- and large-scale language model predictions (generally) correlate highly across choices of training data. We also characterize how proxy scale affects effectiveness in two downstream proxy model applications: data attribution and dataset selection.
arXiv Detail & Related papers (2025-05-22T05:50:19Z)
- Scaling Laws for Emulation of Stellar Spectra [0.0]
We provide training guidelines for scaling Transformer-based spectral emulators to achieve optimal performance. Our results suggest that optimal computational resource allocation requires balanced scaling. This study establishes a foundation for developing spectral foundation models with enhanced domain transfer capabilities.
arXiv Detail & Related papers (2025-03-24T12:20:24Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)
- Scaling Laws for Neural Language Models [14.472857826717613]
We study scaling laws for language model performance on the cross-entropy loss.
The loss scales as a power-law with model size, dataset size, and the amount of compute used for training.
arXiv Detail & Related papers (2020-01-23T03:59:20Z)
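The entry above describes loss scaling "as a power-law with model size, dataset size, and the amount of compute". A minimal sketch of the single-variable form from that language-model paper is shown below; the constants are the approximate language-model values reported by Kaplan et al. (2020), used only to illustrate the functional shape, and do not transfer to weather models.

```python
# Hedged sketch of the single-variable power-law form from "Scaling Laws for
# Neural Language Models" (Kaplan et al., 2020): L(N) = (N_c / N)**alpha_N and
# L(D) = (D_c / D)**alpha_D. Constants below are approximate values from that
# paper for language models; they are illustrative, not from the weather paper.

ALPHA_N, N_C = 0.076, 8.8e13  # model-size exponent and scale (parameters)
ALPHA_D, D_C = 0.095, 5.4e13  # dataset-size exponent and scale (tokens)

def loss_vs_model_size(n_params: float) -> float:
    """Loss floor at a given model size, data and compute unconstrained."""
    return (N_C / n_params) ** ALPHA_N

def loss_vs_data_size(n_tokens: float) -> float:
    """Loss floor at a given dataset size, model size unconstrained."""
    return (D_C / n_tokens) ** ALPHA_D

# A power law is a straight line in log-log space: every 10x increase in
# parameters shrinks the loss by the same constant factor, 10**ALPHA_N.
decade_factor = loss_vs_model_size(1e9) / loss_vs_model_size(1e10)
```

The small exponents here explain why the weather paper's reported data-scaling (up to 3.2x loss reduction per 10x data) is notable: it is far steeper than the language-model regime.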
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.