Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction
- URL: http://arxiv.org/abs/2404.19630v1
- Date: Tue, 30 Apr 2024 15:30:14 GMT
- Title: Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction
- Authors: Jared D. Willard, Peter Harrington, Shashank Subramanian, Ankur Mahesh, Travis A. O'Brien, William D. Collins
- Abstract summary: We show that it is possible to attain high forecast skill even with relatively off-the-shelf architectures, simple training procedures, and moderate compute budgets.
Specifically, we train a minimally modified SwinV2 transformer on ERA5 data, and find that it attains superior forecast skill when compared against IFS.
- Score: 1.3194391758295114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid rise of deep learning (DL) in numerical weather prediction (NWP) has led to a proliferation of models which forecast atmospheric variables with skill comparable to or better than traditional physics-based NWP. However, among these leading DL models, there is wide variance in both the training settings and the architectures used. Further, the lack of thorough ablation studies makes it hard to discern which components are most critical to success. In this work, we show that it is possible to attain high forecast skill even with relatively off-the-shelf architectures, simple training procedures, and moderate compute budgets. Specifically, we train a minimally modified SwinV2 transformer on ERA5 data, and find that it attains superior forecast skill when compared against IFS. We present some ablations on key aspects of the training pipeline, exploring different loss functions, model sizes and depths, and multi-step fine-tuning to investigate their effect. We also examine the model performance with metrics beyond the typical ACC and RMSE, and investigate how the performance scales with model size.
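The ACC and RMSE mentioned above are conventionally computed with latitude weighting on the regular ERA5 grid. As a rough, hedged illustration of those metrics (standard definitions from the DL-NWP literature, not the paper's exact evaluation code):

```python
import numpy as np

def lat_weights(lats_deg: np.ndarray) -> np.ndarray:
    """Cosine-latitude weights, normalized to mean 1 (a common convention)."""
    w = np.cos(np.deg2rad(lats_deg))
    return w / w.mean()

def weighted_rmse(forecast: np.ndarray, truth: np.ndarray, lats_deg: np.ndarray) -> float:
    """Latitude-weighted RMSE for fields shaped (lat, lon)."""
    w = lat_weights(lats_deg)[:, None]
    return float(np.sqrt(np.mean(w * (forecast - truth) ** 2)))

def weighted_acc(forecast: np.ndarray, truth: np.ndarray,
                 climatology: np.ndarray, lats_deg: np.ndarray) -> float:
    """Latitude-weighted anomaly correlation coefficient (ACC)."""
    w = lat_weights(lats_deg)[:, None]
    fa, ta = forecast - climatology, truth - climatology  # anomalies
    num = np.sum(w * fa * ta)
    den = np.sqrt(np.sum(w * fa ** 2) * np.sum(w * ta ** 2))
    return float(num / den)

# Toy usage on an ERA5-like 0.25-degree grid
lats = np.linspace(-90, 90, 721)
rng = np.random.default_rng(0)
truth = rng.normal(size=(721, 1440))
forecast = truth + 0.1 * rng.normal(size=truth.shape)
clim = np.zeros_like(truth)
print(weighted_rmse(forecast, truth, lats), weighted_acc(forecast, truth, clim, lats))
```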
Related papers
- Prithvi WxC: Foundation Model for Weather and Climate [2.9230020115516253]
Prithvi WxC is a 2.3 billion parameter foundation model developed using 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2).
The model has been designed to accommodate large token counts to model weather phenomena in different topologies at fine resolutions.
We test the model on a set of challenging downstream tasks namely: Autoregressive rollout forecasting, Downscaling, Gravity wave flux parameterization, and Extreme events estimation.
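Autoregressive rollout forecasting, listed among the downstream tasks, generally means feeding a single-step model its own output as the next input. A minimal sketch under that assumption (the stand-in `model` and tensor layout are placeholders, not Prithvi WxC's actual interface):

```python
import torch

def autoregressive_rollout(model: torch.nn.Module, state: torch.Tensor, n_steps: int) -> torch.Tensor:
    """Roll a single-step forecast model forward n_steps times.

    state: (batch, channels, lat, lon) initial condition.
    Returns predictions stacked along a new time dimension.
    """
    preds = []
    with torch.no_grad():
        for _ in range(n_steps):
            state = model(state)      # one forecast step, e.g. +6 h
            preds.append(state)
    return torch.stack(preds, dim=1)  # (batch, n_steps, channels, lat, lon)

# Toy usage with a stand-in "model"
toy_model = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)
x0 = torch.randn(1, 4, 32, 64)
print(autoregressive_rollout(toy_model, x0, n_steps=5).shape)
```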
arXiv Detail & Related papers (2024-09-20T15:53:17Z)
- Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, evidencing up to a 9.6% enhancement over conventional baseline methods.
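The mechanic described above, freezing most of a pre-trained network and training only small added modules, can be sketched generically in PyTorch; the bottleneck adapter below is a common pattern and not Forecast-PEFT's exact design:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck adapter trained alongside a frozen backbone."""
    def __init__(self, dim: int, hidden: int = 16):
        super().__init__()
        self.down, self.up = nn.Linear(dim, hidden), nn.Linear(hidden, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual update

# Pretend this is a pre-trained backbone
backbone = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
for p in backbone.parameters():
    p.requires_grad = False          # freeze the majority of parameters

adapter = Adapter(dim=64)            # only these weights are trained
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-3)

x, target = torch.randn(8, 64), torch.randn(8, 64)
loss = nn.functional.mse_loss(adapter(backbone(x)), target)
loss.backward()
optimizer.step()
print(sum(p.numel() for p in adapter.parameters()), "trainable parameters")
```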
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
- Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling [55.13352174687475]
This paper proposes a physics-AI hybrid model (i.e., WeatherGFT) which Generalizes weather forecasts to Finer-grained Temporal scales.
Specifically, we employ a carefully designed PDE kernel to simulate physical evolution on a small time scale.
We introduce a lead time-aware training framework to promote the generalization of the model at different lead times.
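As a hedged illustration of stepping a field forward on a small time scale with a discretized physical operator (a generic advection kernel, not the paper's actual PDE kernel or equations):

```python
import numpy as np

def advection_step(field: np.ndarray, u: float, v: float, dt: float, dx: float) -> np.ndarray:
    """One explicit finite-difference step of 2D advection: df/dt = -(u df/dx + v df/dy).

    Periodic boundaries via np.roll; a stand-in for a physical evolution kernel.
    """
    dfdx = (np.roll(field, -1, axis=1) - np.roll(field, 1, axis=1)) / (2 * dx)
    dfdy = (np.roll(field, -1, axis=0) - np.roll(field, 1, axis=0)) / (2 * dx)
    return field - dt * (u * dfdx + v * dfdy)

# Many small physics steps bridge one larger forecast interval
field = np.exp(-((np.arange(64)[:, None] - 32) ** 2 + (np.arange(64)[None, :] - 32) ** 2) / 50.0)
for _ in range(10):
    field = advection_step(field, u=1.0, v=0.5, dt=0.1, dx=1.0)
print(field.shape, field.max())
```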
arXiv Detail & Related papers (2024-05-22T16:21:02Z)
- EWMoE: An effective model for global weather forecasting with mixture-of-experts [6.695845790670147]
We propose EWMoE, an effective model for accurate global weather forecasting, which requires significantly less training data and computational resources.
Our model incorporates three key components to enhance prediction accuracy: 3D absolute position embedding, a core Mixture-of-Experts layer, and two specific loss functions.
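A Mixture-of-Experts layer routes each token to a small subset of expert MLPs; a minimal top-1 routing sketch is shown below (illustrative only, since EWMoE's expert count, routing, and auxiliary losses may differ):

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal top-1 mixture-of-experts layer over the last dimension."""
    def __init__(self, dim: int, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, dim)
        scores = torch.softmax(self.gate(x), dim=-1)
        top_w, top_idx = scores.max(dim=-1)      # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 32)
print(TinyMoE(dim=32)(tokens).shape)
```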
arXiv Detail & Related papers (2024-05-09T16:42:13Z)
- ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast [57.6987191099507]
We introduce Exloss, a novel loss function that performs asymmetric optimization and highlights extreme values to obtain accurate extreme weather forecast.
We also introduce ExBooster, which captures the uncertainty in prediction outcomes by employing multiple random samples.
Our solution can achieve state-of-the-art performance in extreme weather prediction, while maintaining overall forecast accuracy comparable to that of the top medium-range forecast models.
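As a rough illustration of asymmetric optimization for extremes (not the actual Exloss definition), a loss can weight under-prediction of large values more heavily than over-prediction:

```python
import torch

def asymmetric_extreme_loss(pred: torch.Tensor, target: torch.Tensor,
                            under_weight: float = 4.0) -> torch.Tensor:
    """Squared error with extra weight where the model under-predicts the target.

    Illustrative only: a heavier penalty when pred < target discourages the model
    from smoothing away large (extreme) values.
    """
    err = pred - target
    weights = torch.where(err < 0, torch.full_like(err, under_weight), torch.ones_like(err))
    return (weights * err ** 2).mean()

pred, target = torch.zeros(5), torch.tensor([0.0, 1.0, 2.0, -1.0, 5.0])
print(asymmetric_extreme_loss(pred, target).item())
```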
arXiv Detail & Related papers (2024-02-02T10:34:13Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
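Emulated fine-tuning operates on per-token log-probabilities, roughly adding the behavioral delta of a small fine-tuned model to a large base model. A hedged sketch of that logit arithmetic (tensor shapes and the combination rule are simplified here):

```python
import torch

def eft_logits(large_base: torch.Tensor, small_ft: torch.Tensor, small_base: torch.Tensor,
               alpha: float = 1.0) -> torch.Tensor:
    """Combine per-token logits: large pre-trained knowledge + small-scale fine-tuning delta.

    Roughly log p ~ log p_large_base + alpha * (log p_small_ft - log p_small_base),
    i.e. "up-scaling" a small fine-tuned model with a large base model.
    """
    return large_base + alpha * (small_ft - small_base)

vocab = 32
lb, sf, sb = (torch.randn(1, vocab) for _ in range(3))
probs = torch.softmax(eft_logits(lb, sf, sb), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)  # sample from the emulated distribution
print(next_token.item())
```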
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Inductive biases in deep learning models for weather prediction [17.061163980363492]
We review and analyse the inductive biases of state-of-the-art deep learning-based weather prediction models.
We identify the most important inductive biases and highlight potential avenues towards more efficient and probabilistic DLWP models.
arXiv Detail & Related papers (2023-04-06T14:15:46Z)
- Transfer Learning in Deep Learning Models for Building Load Forecasting: Case of Limited Data [0.0]
This paper proposes a Building-to-Building Transfer Learning framework to overcome the limited-data problem and enhance the performance of Deep Learning models.
The proposed approach improved forecasting accuracy by 56.8% compared with conventional deep learning models trained from scratch.
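Building-to-building transfer typically means initializing from a model trained on a data-rich source building and fine-tuning, possibly with early layers frozen, on the limited target data. A generic PyTorch sketch, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

def make_forecaster() -> nn.Sequential:
    """Tiny MLP load forecaster: last 24 hourly loads -> next-hour load."""
    return nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 1))

source_model = make_forecaster()
# ... assume source_model was trained on a data-rich source building ...

target_model = make_forecaster()
target_model.load_state_dict(source_model.state_dict())  # transfer weights
for p in target_model[0].parameters():
    p.requires_grad = False          # optionally freeze the early feature layer

opt = torch.optim.Adam(filter(lambda p: p.requires_grad, target_model.parameters()), lr=1e-3)
x, y = torch.randn(16, 24), torch.randn(16, 1)   # limited target-building data
loss = nn.functional.mse_loss(target_model(x), y)
loss.backward()
opt.step()
print(loss.item())
```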
arXiv Detail & Related papers (2023-01-25T16:05:47Z)
- Deep learning for improved global precipitation in numerical weather prediction systems [1.721029532201972]
We use the UNET architecture of a deep convolutional neural network with residual learning as a proof of concept to learn global data-driven models of precipitation.
The results are compared with the operational dynamical model used by the India Meteorological Department.
This study is a proof-of-concept showing that residual learning-based UNET can unravel physical relationships to target precipitation.
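A residual-learning UNET stacks convolutional blocks with identity shortcuts inside an encoder-decoder with skip connections. A miniature sketch of such a residual block (not the study's actual network):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv block with an identity shortcut, the basic unit of a residual UNet."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # residual connection

# Encoder/decoder stages would downsample and upsample around such blocks,
# with skip connections concatenating encoder features into the decoder.
x = torch.randn(1, 8, 64, 128)   # e.g. (batch, channels, lat, lon)
print(ResidualBlock(8)(x).shape)
```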
arXiv Detail & Related papers (2021-06-20T05:10:42Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
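Using the base model as a distillation teacher amounts to mixing the hard-label loss with a KL term toward the old model's predictions; a minimal sketch with an illustrative mixing weight:

```python
import torch
import torch.nn.functional as F

def churn_distill_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
                       labels: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """Cross-entropy on labels plus KL toward the previous (base) model's predictions."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1), reduction="batchmean")
    return (1 - lam) * ce + lam * kl

student = torch.randn(8, 10, requires_grad=True)   # new model's logits
teacher = torch.randn(8, 10)                       # frozen base model's logits
labels = torch.randint(0, 10, (8,))
print(churn_distill_loss(student, teacher, labels).item())
```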
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
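Extrapolation schemes in this family evaluate the gradient at a point shifted along the update direction before taking the actual step. A generic extragradient-style sketch (illustrative only; not necessarily the exact scheme or hyperparameters analyzed in the paper):

```python
import torch

def extrapolated_sgd_step(params, loss_fn, lr=0.1, extrap=0.1):
    """One extragradient-style step: peek ahead, then update from the original point."""
    # Gradient at the current point
    (g,) = torch.autograd.grad(loss_fn(params), params)
    # Gradient at an extrapolated (look-ahead) point
    lookahead = (params - extrap * g).detach().requires_grad_(True)
    (g_ahead,) = torch.autograd.grad(loss_fn(lookahead), lookahead)
    # Apply the look-ahead gradient to the original parameters
    return (params - lr * g_ahead).detach().requires_grad_(True)

w = torch.tensor([3.0, -2.0], requires_grad=True)
quadratic = lambda p: ((p - 1.0) ** 2).sum()
for _ in range(20):
    w = extrapolated_sgd_step(w, quadratic)
print(w)
```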
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.