Explaining the physics of transfer learning a data-driven subgrid-scale
closure to a different turbulent flow
- URL: http://arxiv.org/abs/2206.03198v1
- Date: Tue, 7 Jun 2022 11:42:29 GMT
- Title: Explaining the physics of transfer learning a data-driven subgrid-scale
closure to a different turbulent flow
- Authors: Adam Subel, Yifei Guan, Ashesh Chattopadhyay, Pedram Hassanzadeh
- Abstract summary: Transfer learning (TL) is becoming a powerful tool in scientific applications of neural networks (NNs).
In TL, selected layers of a NN, already trained for a base system, are re-trained using a small dataset from a target system.
We present novel analyses and a new framework to address (1)-(2) for a broad range of multi-scale, nonlinear systems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning (TL) is becoming a powerful tool in scientific applications
of neural networks (NNs), such as weather/climate prediction and turbulence
modeling. TL enables out-of-distribution generalization (e.g., extrapolation in
parameters) and effective blending of disparate training sets (e.g.,
simulations and observations). In TL, selected layers of a NN, already trained
for a base system, are re-trained using a small dataset from a target system.
For effective TL, we need to know 1) what are the best layers to re-train? and
2) what physics are learned during TL? Here, we present novel analyses and a
new framework to address (1)-(2) for a broad range of multi-scale, nonlinear
systems. Our approach combines spectral analyses of the systems' data with
spectral analyses of convolutional NN's activations and kernels, explaining the
inner-workings of TL in terms of the system's nonlinear physics. Using
subgrid-scale modeling of several setups of 2D turbulence as test cases, we
show that the learned kernels are combinations of low-, band-, and high-pass
filters, and that TL learns new filters whose nature is consistent with the
spectral differences of base and target systems. We also find the shallowest
layers are the best to re-train in these cases, which is against the common
wisdom guiding TL in machine learning literature. Our framework identifies the
best layer(s) to re-train beforehand, based on physics and NN theory. Together,
these analyses explain the physics learned in TL and provide a framework to
guide TL for wide-ranging applications in science and engineering, such as
climate change modeling.
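As a rough illustration of the kind of spectral analysis described in the abstract, the sketch below zero-pads a small convolutional kernel, computes its 2D Fourier magnitude, and uses the radial profile of that spectrum to label the kernel as low-, band-, or high-pass. The padding size, thresholds, and function names are placeholders for illustration, not the authors' implementation.

```python
import numpy as np

def kernel_spectrum(kernel, n=64):
    """Magnitude spectrum of a small conv kernel, zero-padded to n x n."""
    padded = np.zeros((n, n))
    k = kernel.shape[0]
    padded[:k, :k] = kernel
    return np.abs(np.fft.fftshift(np.fft.fft2(padded)))

def radial_profile(spec):
    """Average spectral magnitude as a function of radial wavenumber."""
    n = spec.shape[0]
    y, x = np.indices(spec.shape)
    r = np.hypot(x - n // 2, y - n // 2).astype(int)
    counts = np.bincount(r.ravel())
    sums = np.bincount(r.ravel(), weights=spec.ravel())
    return sums / np.maximum(counts, 1)

def classify_filter(kernel):
    """Crude low-/band-/high-pass label from where the radial spectrum peaks."""
    profile = radial_profile(kernel_spectrum(kernel))
    cutoff = len(profile) * 2 // 3          # ignore the corner wavenumbers
    frac = np.argmax(profile[:cutoff]) / cutoff
    if frac < 0.25:
        return "low-pass"
    if frac > 0.75:
        return "high-pass"
    return "band-pass"

# Example: a 3x3 averaging kernel should register as low-pass.
print(classify_filter(np.full((3, 3), 1.0 / 9.0)))
```

Applying the same transform to kernels before and after re-training is one way to expose the spectral shift that the paper attributes to TL.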
Related papers
- Fourier analysis of the physics of transfer learning for data-driven subgrid-scale models of ocean turbulence [1.7961156132737741]
Transfer learning (TL) is a powerful tool for enhancing the performance of neural networks (NNs) in applications such as weather and climate prediction and turbulence modeling.
In this study, we employ a 9-layer convolutional NN to predict the subgrid forcing in a two-layer ocean quasi-geostrophic system.
When the NN trained only on the base system is applied directly to the target system, it underestimates the spectra of the predicted forcing; re-training only one layer with data from the target system corrects this, enabling the NN to produce predictions that match the target spectra.
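A minimal PyTorch sketch of the freeze-and-retrain pattern both papers describe: all layers of a CNN trained on the base system are frozen except the one selected for re-training, which is then fine-tuned on a small target dataset. The architecture, layer choice, and hyperparameters below are placeholders, not the papers' configurations.

```python
import torch
import torch.nn as nn

# Hypothetical base CNN standing in for the papers' deeper architectures.
model = nn.Sequential(
    nn.Conv2d(2, 64, 5, padding=2), nn.ReLU(),
    nn.Conv2d(64, 64, 5, padding=2), nn.ReLU(),
    nn.Conv2d(64, 1, 5, padding=2),
)
# ... weights trained on the base system would be loaded here ...

# Freeze everything, then unfreeze only the layer chosen for re-training
# (here the shallowest conv layer, following the main paper's finding).
for p in model.parameters():
    p.requires_grad = False
for p in model[0].parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
loss_fn = nn.MSELoss()

def retrain_step(x_target, y_target):
    """One fine-tuning step on a small batch from the target system."""
    optimizer.zero_grad()
    loss = loss_fn(model(x_target), y_target)
    loss.backward()
    optimizer.step()
    return loss.item()
```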
arXiv Detail & Related papers (2025-04-21T23:42:19Z) - Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks trained with stochastic gradient descent (SGD), analyzed through the tensor-programs framework.
We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values.
This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z) - An Analysis Framework for Understanding Deep Neural Networks Based on Network Dynamics [11.44947569206928]
Deep neural networks (DNNs) maximize information extraction by allocating different proportions of neurons to different operating modes across their layers.
This framework provides a unified explanation for fundamental DNN behaviors such as the "flat minima effect," "grokking," and double descent phenomena.
arXiv Detail & Related papers (2025-01-05T04:23:21Z) - Unveiling the Unseen: Identifiable Clusters in Trained Depthwise
Convolutional Kernels [56.69755544814834]
Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures.
This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers.
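One simple way to look for such patterns, assuming the trained depthwise kernels are available as an array, is to flatten and normalise them and cluster with k-means, as in the sketch below. The cluster count and preprocessing are illustrative choices, not the paper's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_depthwise_kernels(kernels, n_clusters=6):
    """Group trained depthwise kernels (shape: [channels, k, k]) by pattern.

    Kernels are flattened and L2-normalised so clusters reflect shape
    (edge-like, blob-like, ...) rather than magnitude.
    """
    flat = kernels.reshape(kernels.shape[0], -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(flat)

# Random stand-in kernels; real ones would be read out of a trained DS-CNN.
labels = cluster_depthwise_kernels(np.random.randn(256, 3, 3))
print(np.bincount(labels))
```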
arXiv Detail & Related papers (2024-01-25T19:05:53Z) - Mobile Traffic Prediction at the Edge through Distributed and Transfer
Learning [2.687861184973893]
Research on this topic has concentrated on making predictions in a centralized fashion, by collecting data from the different network elements.
We propose a novel prediction framework based on edge computing which uses datasets obtained on the edge through a large measurement campaign.
arXiv Detail & Related papers (2023-10-22T23:48:13Z) - Accurate deep learning sub-grid scale models for large eddy simulations [0.0]
We present two families of sub-grid scale (SGS) turbulence models developed for large-eddy simulation (LES) purposes.
Their development required the formulation of physics-informed robust and efficient Deep Learning (DL) algorithms.
Explicit filtering of data from direct simulations of canonical channel flow at two friction Reynolds numbers provided accurate data for training and testing.
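A small sketch of the explicit-filtering step that produces such training data: a sharp spectral low-pass filter applied to a periodic 2D field, and the exact subgrid-scale stress computed as the difference between the filtered product and the product of filtered fields. The cutoff wavenumber and random stand-in fields are illustrative; the paper filters channel-flow DNS data.

```python
import numpy as np

def sharp_spectral_filter(field, k_cut):
    """Low-pass (sharp spectral cutoff) filter of a 2D periodic field."""
    n = field.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)                # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    mask = np.sqrt(kx**2 + ky**2) <= k_cut
    return np.real(np.fft.ifft2(np.fft.fft2(field) * mask))

def sgs_stress(u, v, k_cut):
    """Exact subgrid-scale stress: tau = bar(uv) - bar(u) bar(v)."""
    return (sharp_spectral_filter(u * v, k_cut)
            - sharp_spectral_filter(u, k_cut) * sharp_spectral_filter(v, k_cut))

# Random periodic fields standing in for DNS velocity components.
rng = np.random.default_rng(0)
u, v = rng.standard_normal((2, 128, 128))
print(sgs_stress(u, v, k_cut=16).shape)
```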
arXiv Detail & Related papers (2023-07-19T15:30:06Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Human Trajectory Prediction via Neural Social Physics [63.62824628085961]
Trajectory prediction has been widely pursued in many fields, and many model-based and model-free methods have been explored.
We propose a new method combining both methodologies based on a new Neural Differential Equation model.
Our new model (Neural Social Physics or NSP) is a deep neural network within which we use an explicit physics model with learnable parameters.
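The general pattern of embedding an explicit physics model with learnable coefficients inside a network, plus a learned residual correction, might look like the toy PyTorch module below; the specific force terms and layer sizes are invented for illustration and are not the NSP model.

```python
import torch
import torch.nn as nn

class PhysicsWithLearnableParams(nn.Module):
    """Toy analogue of the NSP idea: an explicit force term with trainable
    coefficients, plus a small NN that learns whatever the physics misses."""

    def __init__(self):
        super().__init__()
        self.k_goal = nn.Parameter(torch.tensor(1.0))   # attraction to goal
        self.k_rep = nn.Parameter(torch.tensor(0.5))    # repulsion from neighbour
        self.residual = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

    def forward(self, pos, goal, neighbour):
        # Explicit "social force": pull toward the goal, push away from neighbour.
        to_goal = goal - pos
        away = pos - neighbour
        force = (self.k_goal * to_goal
                 + self.k_rep * away / (away.norm(dim=-1, keepdim=True) ** 2 + 1e-6))
        # Learned correction on top of the hand-written physics.
        return force + self.residual(torch.cat([to_goal, away], dim=-1))

model = PhysicsWithLearnableParams()
step = model(torch.zeros(1, 2), torch.ones(1, 2), torch.tensor([[0.5, 0.0]]))
print(step.shape)
```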
arXiv Detail & Related papers (2022-07-21T12:11:18Z) - Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling
and Design [68.1682448368636]
We present a supervised pretraining approach to learn circuit representations that can be adapted to new unseen topologies or unseen prediction tasks.
To cope with the variable topological structure of different circuits we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings.
We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties.
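The underlying recipe, sketched below with plain NumPy message passing rather than a graph-learning library, is to compute node embeddings from the circuit graph, pretrain a head that predicts node voltages, and later attach a new head for an unseen circuit-level property while reusing the embeddings. All names, shapes, and data here are hypothetical.

```python
import numpy as np

def gnn_embeddings(adj, feats, weights):
    """Simple message passing: h <- relu(A_norm @ h @ W), once per layer."""
    a_norm = adj / (adj.sum(axis=1, keepdims=True) + 1e-12)
    h = feats
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)
    return h

rng = np.random.default_rng(0)
adj = rng.integers(0, 2, size=(10, 10)).astype(float)    # toy circuit graph
feats = rng.standard_normal((10, 8))                      # toy node features
weights = [rng.standard_normal((8, 16)), rng.standard_normal((16, 16))]

h = gnn_embeddings(adj, feats, weights)            # shared node embeddings
voltage_head = rng.standard_normal((16, 1))        # pretraining task: node voltages
new_property_head = rng.standard_normal((16, 1))   # later task: new circuit property
print((h @ voltage_head).shape, (h @ new_property_head).shape)
```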
arXiv Detail & Related papers (2022-03-29T21:18:47Z) - A Meta-Learning Approach to the Optimal Power Flow Problem Under
Topology Reconfigurations [69.73803123972297]
We propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach.
The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems.
arXiv Detail & Related papers (2020-12-21T17:39:51Z) - Transfer Learning with Convolutional Networks for Atmospheric Parameter
Retrieval [14.131127382785973]
The Infrared Atmospheric Sounding Interferometer (IASI) on board the MetOp satellite series provides important measurements for Numerical Weather Prediction (NWP).
Retrieving accurate atmospheric parameters from the raw data provided by IASI is a large challenge, but necessary in order to use the data in NWP models.
We show how features extracted from the IASI data by a CNN trained to predict a physical variable can be used as inputs to another statistical method designed to predict a different physical variable at low altitude.
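The transfer pattern described here, features from a CNN trained on one variable reused as inputs to a separate statistical model for another variable, can be sketched as below. The feature extractor is a random stand-in for the trained CNN's intermediate activations, and the spectra and targets are synthetic.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def extract_features(spectra):
    """Stand-in for evaluating the trained CNN up to an intermediate layer."""
    projection = rng.standard_normal((spectra.shape[1], 64))
    return np.tanh(spectra @ projection)

# Synthetic stand-ins for IASI radiance spectra and a low-altitude target variable.
spectra = rng.standard_normal((300, 500))
target = rng.standard_normal(300)

# Reuse the CNN-derived features as inputs to a separate statistical model.
features = extract_features(spectra)
reg = Ridge(alpha=1.0).fit(features[:200], target[:200])
print(reg.score(features[200:], target[200:]))
```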
arXiv Detail & Related papers (2020-12-09T09:28:42Z) - Physics-Based Deep Learning for Fiber-Optic Communication Systems [10.630021520220653]
We propose a new machine-learning approach for fiber-optic communication systems governed by the nonlinear Schrödinger equation (NLSE).
Our main observation is that the popular split-step method (SSM) for numerically solving the NLSE has essentially the same functional form as a deep multi-layer neural network.
We exploit this connection by parameterizing the SSM and viewing the linear steps as general linear functions, similar to the weight matrices in a neural network.
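The structural analogy is easy to see in code: each split-step iteration applies a linear (dispersion) step followed by a pointwise nonlinear (Kerr phase) step, mirroring a network's alternation of weight layers and activations. The sketch below keeps the linear step fixed and uses one common sign convention; the paper's contribution is to make those linear steps trainable, which is not shown here.

```python
import numpy as np

def split_step_layers(u0, n_layers, beta2, gamma, dz, dt):
    """Split-step NLSE propagation written as alternating 'layers'."""
    n = u0.size
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)
    linear = np.exp(0.5j * beta2 * omega**2 * dz)       # fixed "weights"
    u = u0
    for _ in range(n_layers):
        u = np.fft.ifft(linear * np.fft.fft(u))          # linear step
        u = u * np.exp(1j * gamma * np.abs(u)**2 * dz)   # nonlinear "activation"
    return u

# Propagate a Gaussian pulse through a few such layers.
t = np.linspace(-10, 10, 256)
u0 = np.exp(-t**2).astype(complex)
u_out = split_step_layers(u0, n_layers=8, beta2=-1.0, gamma=1.0, dz=0.1, dt=t[1] - t[0])
print(np.abs(u_out).max())
```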
arXiv Detail & Related papers (2020-10-27T12:55:23Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - How Neural Networks Extrapolate: From Feedforward to Graph Neural
Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.