VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
- URL: http://arxiv.org/abs/2412.02503v2
- Date: Fri, 18 Jul 2025 12:06:20 GMT
- Title: VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
- Authors: Hao Chen, Han Tao, Guo Song, Jie Zhang, Yunlong Yu, Yonghan Dong, Lei Bai
- Abstract summary: VAMoE is a framework for weather forecasting that dynamically adapts to evolving real-time data. The proposed method employs a variable-adaptive gating mechanism to dynamically select and combine relevant experts. Experiments on the real-world ERA5 dataset demonstrate that VAMoE performs comparably to SoTA models in both short-term (1 day) and long-term (5 day) forecasting tasks.
- Score: 18.37961811608821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents Variables-Adaptive Mixture of Experts (VAMoE), a novel framework for incremental weather forecasting that dynamically adapts to evolving spatiotemporal patterns in real-time data. Traditional weather prediction models often struggle with exorbitant computational expenditure and the need to continuously update forecasts as new observations arrive. VAMoE addresses these challenges by leveraging a hybrid architecture of experts, where each expert specializes in capturing distinct sub-patterns of atmospheric variables (temperature, humidity, wind speed). Moreover, the proposed method employs a variable-adaptive gating mechanism to dynamically select and combine relevant experts based on the input context, enabling efficient knowledge distillation and parameter sharing. This design significantly reduces computational overhead while maintaining high forecast accuracy. Experiments on the real-world ERA5 dataset demonstrate that VAMoE performs comparably to SoTA models in both short-term (1 day) and long-term (5 day) forecasting tasks, with only about 25% of trainable parameters and 50% of the initial training data.
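The abstract describes a gating mechanism that routes each atmospheric variable to a subset of experts. The following is a minimal numpy sketch of that idea, assuming toy linear experts and a gate conditioned on a one-hot variable identity; the class name, sizes, and top-k routing are illustrative, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class VariableAdaptiveMoE:
    """Toy mixture-of-experts with a variable-adaptive gate.

    Each expert is a linear map; the gate scores experts from the
    input concatenated with a one-hot variable embedding, then
    combines the top-k experts' outputs (illustrative only).
    """
    def __init__(self, n_vars, d_in, d_out, n_experts=4, top_k=2):
        self.top_k = top_k
        self.n_vars = n_vars
        self.experts = [rng.normal(0, 0.1, (d_in, d_out)) for _ in range(n_experts)]
        # gate conditions on both the input and the variable identity
        self.gate_w = rng.normal(0, 0.1, (d_in + n_vars, n_experts))

    def forward(self, x, var_id):
        one_hot = np.zeros(self.n_vars)
        one_hot[var_id] = 1.0
        scores = softmax(np.concatenate([x, one_hot]) @ self.gate_w)
        top = np.argsort(scores)[-self.top_k:]   # indices of top-k experts
        w = scores[top] / scores[top].sum()      # renormalise gate weights
        return sum(wi * (x @ self.experts[i]) for wi, i in zip(w, top))

moe = VariableAdaptiveMoE(n_vars=3, d_in=8, d_out=8)
x = rng.normal(size=8)
y_temp = moe.forward(x, var_id=0)   # e.g. temperature channel
y_wind = moe.forward(x, var_id=2)   # e.g. wind channel routes differently
```

Because only the selected experts are evaluated per variable, a sparse gate of this kind is one plausible route to the parameter savings the abstract reports.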
Related papers
- Diffusion models for probabilistic precipitation generation from atmospheric variables [1.6099193327384094]
In Earth system models (ESMs), precipitation is not resolved explicitly, but represented by parameterizations.
We present a novel approach, based on generative machine learning, which integrates a conditional diffusion model with a UNet architecture.
Unlike traditional parameterizations, our framework efficiently produces ensemble predictions, capturing uncertainties in precipitation, and does not require fine-tuning by hand.
arXiv Detail & Related papers (2025-04-01T00:21:31Z) - Masked Autoregressive Model for Weather Forecasting [7.960598061739508]
We propose the Masked Autoregressive Model for Weather Forecasting (MAM4WF).
This model leverages masked modeling, where portions of input data are masked during training.
We evaluate MAM4WF across weather, climate forecasting, and video frame prediction datasets, demonstrating superior performance on five test datasets.
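The masked-modeling idea summarized above (hide parts of the input, learn from the hidden positions) can be illustrated in a few lines of numpy. This is a generic sketch of masked reconstruction, not the MAM4WF model itself; the "model" here is a trivial placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal masked-modeling step: hide a random subset of input values,
# then score reconstruction only on the masked positions.
x = rng.normal(size=(4, 4))           # toy "weather field"
mask = rng.random(x.shape) < 0.5      # True = hidden during training
x_vis = np.where(mask, 0.0, x)        # masked input the model sees

# placeholder "model": predict the mean of visible values everywhere
pred = np.full_like(x, x_vis[~mask].mean())
loss = ((pred - x)[mask] ** 2).mean() # loss on masked positions only
```

In a real setup the placeholder predictor is replaced by a network trained to minimize this masked loss.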
arXiv Detail & Related papers (2024-09-30T09:17:04Z) - Efficient Localized Adaptation of Neural Weather Forecasting: A Case Study in the MENA Region [62.09891513612252]
We focus on limited-area modeling and train our model specifically for localized region-level downstream tasks.
We consider the MENA region due to its unique climatic challenges, where accurate localized weather forecasting is crucial for managing water resources, agriculture and mitigating the impacts of extreme weather events.
Our study aims to validate the effectiveness of integrating parameter-efficient fine-tuning (PEFT) methodologies, specifically Low-Rank Adaptation (LoRA) and its variants, to enhance forecast accuracy, as well as training speed, computational resource utilization, and memory efficiency in weather and climate modeling for specific regions.
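Since the study above centers on LoRA, a minimal numpy sketch of the low-rank adaptation idea may help: the pretrained weight is frozen and only a rank-r update is trainable. Shapes and initialization below are illustrative toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 16, 2                      # full dim, low rank (toy sizes)
W = rng.normal(size=(d, d))       # frozen pretrained weight
A = rng.normal(0, 0.01, (d, r))   # trainable LoRA factor
B = np.zeros((r, d))              # zero-init so the update starts at 0

def lora_forward(x):
    # only A and B would receive gradients during fine-tuning
    return x @ W + x @ A @ B

x = rng.normal(size=d)
# at initialization the adapted model matches the frozen one exactly
print(np.allclose(lora_forward(x), x @ W))  # True
```

The trainable parameter count is 2·d·r instead of d·d, which is the source of LoRA's memory and speed savings.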
arXiv Detail & Related papers (2024-09-11T19:31:56Z) - Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, evidencing up to a 9.6% enhancement over conventional baseline methods.
arXiv Detail & Related papers (2024-07-28T19:18:59Z) - Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency.
We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures.
The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
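One common way to realize "grouping and pruning similar experts" is to cluster experts by cosine similarity and merge each group into its average. The sketch below is a generic illustration under that assumption, not the paper's specific procedure; the 0.99 threshold and toy sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expert weights; make experts 0 and 3 near-duplicates.
experts = rng.normal(size=(6, 8))
experts[3] = experts[0] + 1e-3 * rng.normal(size=8)

norm = experts / np.linalg.norm(experts, axis=1, keepdims=True)
sim = norm @ norm.T               # pairwise cosine similarity

kept, merged = [], set()
for i in range(len(experts)):
    if i in merged:
        continue
    group = [j for j in range(i, len(experts))
             if sim[i, j] > 0.99 and j not in merged]
    merged.update(group)
    kept.append(experts[group].mean(axis=0))  # merge a group into one expert

pruned = np.stack(kept)           # fewer experts than before
```

Merging rather than deleting keeps the averaged knowledge of each group, which is the intuition behind diversity-preserving pruning.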
arXiv Detail & Related papers (2024-07-12T17:25:02Z) - VarteX: Enhancing Weather Forecast through Distributed Variable Representation [5.2980803808373516]
Recent data-driven models utilizing deep learning have outperformed numerical weather prediction in forecasting performance.
This study proposes a new variable aggregation scheme and an efficient learning framework for that challenge.
arXiv Detail & Related papers (2024-06-28T02:42:30Z) - Stock Volume Forecasting with Advanced Information by Conditional Variational Auto-Encoder [49.97673761305336]
We demonstrate the use of a Conditional Variational Auto-Encoder (CVAE) to improve forecasts of daily stock volume time series in both short- and long-term forecasting tasks.
CVAE generates non-linear time series as out-of-sample forecasts, which have better accuracy and closer fit of correlation to the actual data.
arXiv Detail & Related papers (2024-06-19T13:13:06Z) - Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling [55.13352174687475]
This paper proposes a physics-AI hybrid model (i.e., WeatherGFT) which generalizes weather forecasts to finer-grained temporal scales beyond the training dataset. Specifically, we employ a carefully designed PDE kernel to simulate physical evolution on a small time scale. We also introduce a lead-time-aware training framework to promote the generalization of the model at different lead times.
arXiv Detail & Related papers (2024-05-22T16:21:02Z) - Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction [1.3194391758295114]
We show that it is possible to attain high forecast skill even with relatively off-the-shelf architectures, simple training procedures, and moderate compute budgets.
Specifically, we train a minimally modified SwinV2 transformer on ERA5 data, and find that it attains superior forecast skill when compared against IFS.
arXiv Detail & Related papers (2024-04-30T15:30:14Z) - MetaSD: A Unified Framework for Scalable Downscaling of Meteorological Variables in Diverse Situations [8.71735078449217]
This paper proposes a unified downscaling approach leveraging meta-learning.
The trained variables consist of temperature, wind, surface pressure, and total precipitation from ERA5 and GFS.
The proposed method can be extended to downscale convective precipitation, potential energy height, and humidity from CFS, S2S, and CMIP6 at different temporal scales.
arXiv Detail & Related papers (2024-04-26T06:31:44Z) - ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast [57.6987191099507]
We introduce Exloss, a novel loss function that performs asymmetric optimization and highlights extreme values to obtain accurate extreme weather forecast.
We also introduce ExBooster, which captures the uncertainty in prediction outcomes by employing multiple random samples.
Our solution can achieve state-of-the-art performance in extreme weather prediction, while maintaining the overall forecast accuracy comparable to the top medium-range forecast models.
arXiv Detail & Related papers (2024-02-02T10:34:13Z) - FengWu-4DVar: Coupling the Data-driven Weather Forecasting Model with 4D Variational Assimilation [67.20588721130623]
We develop an AI-based cyclic weather forecasting system, FengWu-4DVar.
FengWu-4DVar can incorporate observational data into the data-driven weather forecasting model.
Experiments on the simulated observational dataset demonstrate that FengWu-4DVar is capable of generating reasonable analysis fields.
arXiv Detail & Related papers (2023-12-16T02:07:56Z) - Federated Prompt Learning for Weather Foundation Models on Devices [37.88417074427373]
On-device intelligence for weather forecasting uses local deep learning models to analyze weather patterns without centralized cloud computing.
This paper proposes Federated Prompt Learning for Weather Foundation Models on Devices (FedPoD).
FedPoD enables devices to obtain highly customized models while maintaining communication efficiency.
arXiv Detail & Related papers (2023-05-23T16:59:20Z) - W-MAE: Pre-trained weather model with masked autoencoder for multi-variable weather forecasting [7.610811907813171]
We propose W-MAE, a weather model with masked autoencoder pre-training for weather forecasting.
W-MAE is pre-trained in a self-supervised manner to reconstruct spatial correlations within meteorological variables.
On the temporal scale, we fine-tune the pre-trained W-MAE to predict the future states of meteorological variables.
arXiv Detail & Related papers (2023-04-18T06:25:11Z) - Hybrid Variational Autoencoder for Time Series Forecasting [12.644797358419618]
Variational autoencoders (VAE) are powerful generative models that learn the latent representations of input data as random variables.
We propose a novel hybrid variational autoencoder (HyVAE) to integrate the learning of local patterns and temporal dynamics by variational inference for time series forecasting.
arXiv Detail & Related papers (2023-03-13T12:13:28Z) - ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science.
It can be pre-trained with a self-supervised learning objective on climate datasets.
It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z) - Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
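The antisymmetry argument above (pairwise exchanges cancel in the sum, so total linear momentum is conserved by construction) can be shown with a toy numpy particle system. This is an illustration of the principle with an arbitrary odd kernel, not the paper's continuous convolutional layers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Antisymmetric pairwise kernel: k(xi, xj) = -k(xj, xi),
# because it is odd in the displacement d = xi - xj.
def antisym_kernel(xi, xj):
    d = xi - xj
    return d * np.exp(-np.dot(d, d))

n = 5
pos = rng.normal(size=(n, 2))
vel = rng.normal(size=(n, 2))

dv = np.zeros_like(vel)
for i in range(n):
    for j in range(n):
        if i != j:
            dv[i] += antisym_kernel(pos[i], pos[j])

vel_new = vel + 0.1 * dv
# every exchange appears once with each sign, so total momentum
# is unchanged (up to floating-point rounding)
print(np.allclose(vel.sum(axis=0), vel_new.sum(axis=0)))  # True
```

Encoding the constraint in the kernel itself means conservation holds exactly regardless of what the learned weights are, which is the "hard constraint" the summary refers to.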
arXiv Detail & Related papers (2022-10-12T09:12:59Z) - SwinVRNN: A Data-Driven Ensemble Forecasting Model via Learned Distribution Perturbation [16.540748935603723]
We propose a Swin Transformer-based Variational Recurrent Neural Network (SwinVRNN), which is a weather forecasting model combining a SwinRNN predictor with a perturbation module.
SwinVRNN surpasses operational ECMWF Integrated Forecasting System (IFS) on surface variables of 2-m temperature and 6-hourly total precipitation at all lead times up to five days.
arXiv Detail & Related papers (2022-05-26T05:11:58Z) - Data-Driven Evaluation of Training Action Space for Reinforcement Learning [1.370633147306388]
This paper proposes a Shapley-inspired methodology for training action space categorization and ranking.
To reduce exponential-time Shapley computations, the methodology includes a Monte Carlo simulation.
The proposed data-driven methodology is applicable to different domains, use cases, and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-08T04:53:43Z) - Reservoir Computing as a Tool for Climate Predictability Studies [0.0]
We show that Reservoir Computing provides an alternative nonlinear approach that improves on the predictive skill of the Linear-Inverse-Modeling approach.
The improved predictive skill of the RC approach over a wide range of conditions suggests that this machine-learning technique may have a use in climate predictability studies.
arXiv Detail & Related papers (2021-02-24T22:22:59Z) - Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z) - Unsupervised Dense Shape Correspondence using Heat Kernels [50.682560435495034]
We propose an unsupervised method for learning dense correspondences between shapes using a recent deep functional map framework.
Instead of depending on ground-truth correspondences or the computationally expensive geodesic distances, we use heat kernels.
We present the results of our method on different benchmarks which have various challenges like partiality, topological noise and different connectivity.
arXiv Detail & Related papers (2020-10-23T21:54:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.