Fourier analysis of the physics of transfer learning for data-driven subgrid-scale models of ocean turbulence
- URL: http://arxiv.org/abs/2504.15487v1
- Date: Mon, 21 Apr 2025 23:42:19 GMT
- Title: Fourier analysis of the physics of transfer learning for data-driven subgrid-scale models of ocean turbulence
- Authors: Moein Darman, Pedram Hassanzadeh, Laure Zanna, Ashesh Chattopadhyay,
- Abstract summary: Transfer learning (TL) is a powerful tool for enhancing the performance of neural networks (NNs) in applications such as weather and climate prediction and turbulence modeling.<n>In this study, we employ a 9-layer convolutional NN to predict the subgrid forcing in a two-layer ocean quasi-geostrophic system.<n>By re-training only one layer with data from the target system, this underestimation is corrected, enabling the NN to produce predictions that match the target spectra.
- Score: 1.7961156132737741
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning (TL) is a powerful tool for enhancing the performance of neural networks (NNs) in applications such as weather and climate prediction and turbulence modeling. TL enables models to generalize to out-of-distribution data with minimal training data from the new system. In this study, we employ a 9-layer convolutional NN to predict the subgrid forcing in a two-layer ocean quasi-geostrophic system and examine which metrics best describe its performance and generalizability to unseen dynamical regimes. Fourier analysis of the NN kernels reveals that they learn low-pass, Gabor, and high-pass filters, regardless of whether the training data are isotropic or anisotropic. By analyzing the activation spectra, we identify why NNs fail to generalize without TL and how TL can overcome these limitations: the learned weights and biases from one dataset underestimate the out-of-distribution sample spectra as they pass through the network, leading to an underestimation of output spectra. By re-training only one layer with data from the target system, this underestimation is corrected, enabling the NN to produce predictions that match the target spectra. These findings are broadly applicable to data-driven parameterization of dynamical systems.
Related papers
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor gradient program (SGD) framework.<n>We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values.<n>This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z) - Deep Recurrent Stochastic Configuration Networks for Modelling Nonlinear Dynamic Systems [3.8719670789415925]
This paper proposes a novel deep reservoir computing framework, termed deep recurrent configuration network (DeepRSCN)
DeepRSCNs are incrementally constructed, with all reservoir nodes directly linked to the final output.
Given a set of training samples, DeepRSCNs can quickly generate learning representations, which consist of random basis functions with cascaded input readout weights.
arXiv Detail & Related papers (2024-10-28T10:33:15Z) - Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Interpretable A-posteriori Error Indication for Graph Neural Network Surrogate Models [0.0]
This work introduces an interpretability enhancement procedure for graph neural networks (GNNs)
The end result is an interpretable GNN model that isolates regions in physical space, corresponding to sub-graphs, that are intrinsically linked to the forecasting task.
The interpretable GNNs can also be used to identify, during inference, graph nodes that correspond to a majority of the anticipated forecasting error.
arXiv Detail & Related papers (2023-11-13T18:37:07Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
We then exploit higher-order statistics only later during training.
We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Explaining the physics of transfer learning a data-driven subgrid-scale
closure to a different turbulent flow [0.0]
Transfer learning (TL) is becoming a powerful tool in scientific applications of neural networks (NNs)
In TL, selected layers of a NN, already trained for a base system, are re-trained using a small dataset from a target system.
We present novel analyses and a new framework to address (1)-(2) for a broad range of multi-scale, nonlinear systems.
arXiv Detail & Related papers (2022-06-07T11:42:29Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - Deep learning and high harmonic generation [0.0]
We explore the utility of various deep neural networks (NNs) when applied to high harmonic generation (HHG) scenarios.
First, we train the NNs to predict the time-dependent dipole and spectra of HHG emission from reduced-dimensionality models of di- and triatomic systems.
We then demonstrate that transfer learning can be applied to our networks to expand the range of applicability of the networks.
arXiv Detail & Related papers (2020-12-18T16:13:17Z) - Transfer Learning with Convolutional Networks for Atmospheric Parameter
Retrieval [14.131127382785973]
The Infrared Atmospheric Sounding Interferometer (IASI) on board the MetOp satellite series provides important measurements for Numerical Weather Prediction (NWP)
Retrieving accurate atmospheric parameters from the raw data provided by IASI is a large challenge, but necessary in order to use the data in NWP models.
We show how features extracted from the IASI data by a CNN trained to predict a physical variable can be used as inputs to another statistical method designed to predict a different physical variable at low altitude.
arXiv Detail & Related papers (2020-12-09T09:28:42Z) - How Neural Networks Extrapolate: From Feedforward to Graph Neural
Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z) - Learning across label confidence distributions using Filtered Transfer
Learning [0.44040106718326594]
We propose a transfer learning approach to improve predictive power in noisy data systems with large variable confidence datasets.
We propose a deep neural network method called Filtered Transfer Learning (FTL) that defines multiple tiers of data confidence as separate tasks.
We demonstrate that using FTL to learn stepwise, across the label confidence distribution, results in higher performance compared to deep neural network models trained on a single confidence range.
arXiv Detail & Related papers (2020-06-03T21:00:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.