Paraformer: Parameterization of Sub-grid Scale Processes Using Transformers
- URL: http://arxiv.org/abs/2412.16763v1
- Date: Sat, 21 Dec 2024 20:21:52 GMT
- Title: Paraformer: Parameterization of Sub-grid Scale Processes Using Transformers
- Authors: Shuochen Wang, Nishant Yadav, Auroop R. Ganguly,
- Abstract summary: We propose a "memory-aware" Transformer-based model on ClimSim, the largest dataset ever created for climate parameterization.
Our results demonstrate that the proposed model successfully captures the complex non-linear dependencies in the sub-grid scale variables.
- Score: 6.622831012413507
- License:
- Abstract: One of the major sources of uncertainty in the current generation of Global Climate Models (GCMs) is the representation of sub-grid scale physical processes. Over the years, a series of deep-learning-based parameterization schemes have been developed and tested on both idealized and real-geography GCMs. However, datasets on which previous deep-learning models were trained either contain limited variables or have low spatial-temporal coverage, which can not fully simulate the parameterization process. Additionally, these schemes rely on classical architectures while the latest attention mechanism used in Transformer models remains unexplored in this field. In this paper, we propose Paraformer, a "memory-aware" Transformer-based model on ClimSim, the largest dataset ever created for climate parameterization. Our results demonstrate that the proposed model successfully captures the complex non-linear dependencies in the sub-grid scale variables and outperforms classical deep-learning architectures. This work highlights the applicability of the attenuation mechanism in this field and provides valuable insights for developing future deep-learning-based climate parameterization schemes.
Related papers
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z) - Machine Learning Global Simulation of Nonlocal Gravity Wave Propagation [1.3108798582758452]
We present the first-ever global simulation of atmospheric mesoscale processes using machine learning (ML) models trained on the WINDSET dataset.
Using an Attention U-Net-based architecture trained on globally resolved GW momentum, we illustrate the importance and effectiveness of global nonlocality.
arXiv Detail & Related papers (2024-06-20T22:57:38Z) - Are Self-Attentions Effective for Time Series Forecasting? [4.990206466948269]
Time series forecasting is crucial for applications across multiple domains and various scenarios.
Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches.
We introduce a new architecture, Cross-Attention-only Time Series transformer (CATS)
Our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
arXiv Detail & Related papers (2024-05-27T06:49:39Z) - Automatic Parameterization for Aerodynamic Shape Optimization via Deep
Geometric Learning [60.69217130006758]
We propose two deep learning models that fully automate shape parameterization for aerodynamic shape optimization.
Both models are optimized to parameterize via deep geometric learning to embed human prior knowledge into learned geometric patterns.
We perform shape optimization experiments on 2D airfoils and discuss the applicable scenarios for the two models.
arXiv Detail & Related papers (2023-05-03T13:45:40Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO)
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into Structurals and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper- parameters.
For a simple model trained with a Repr, we focus on a VGG-style plain model and showcase that such a simple model trained with a Repr, which is referred to as Rep-VGG, performs on par with the recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z) - A posteriori learning for quasi-geostrophic turbulence parametrization [5.333802479607541]
State-of-the-art strategies address the problem as a supervised learning task.
We show how subgrid parametrizations can alternatively be trained end-to-end in order to meet $textita posteriori$ criteria.
arXiv Detail & Related papers (2022-04-08T08:23:06Z) - Climate-Invariant Machine Learning [0.8831201550856289]
Current climate models require representations of processes that occur at scales smaller than model grid size.
Recent machine learning (ML) algorithms hold promise to improve such process representations, but tend to extrapolate poorly to climate regimes they were not trained on.
We propose a new framework - termed "climate-invariant" ML - incorporating knowledge of climate processes into ML algorithms.
arXiv Detail & Related papers (2021-12-14T07:02:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.