Learning in PINNs: Phase transition, total diffusion, and generalization
- URL: http://arxiv.org/abs/2403.18494v1
- Date: Wed, 27 Mar 2024 12:10:30 GMT
- Title: Learning in PINNs: Phase transition, total diffusion, and generalization
- Authors: Sokratis J. Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, George Em Karniadakis,
- Abstract summary: We investigate the learning dynamics of fully-connected neural networks through the lens of gradient signal-to-noise ratio (SNR)
We identify a third phase termed total diffusion"
We explore the information-induced compression phenomenon, pinpointing a significant compression of activations at the total diffusion phase.
- Score: 1.8802875123957965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the learning dynamics of fully-connected neural networks through the lens of gradient signal-to-noise ratio (SNR), examining the behavior of first-order optimizers like Adam in non-convex objectives. By interpreting the drift/diffusion phases in the information bottleneck theory, focusing on gradient homogeneity, we identify a third phase termed ``total diffusion", characterized by equilibrium in the learning rates and homogeneous gradients. This phase is marked by an abrupt SNR increase, uniform residuals across the sample space and the most rapid training convergence. We propose a residual-based re-weighting scheme to accelerate this diffusion in quadratic loss functions, enhancing generalization. We also explore the information compression phenomenon, pinpointing a significant saturation-induced compression of activations at the total diffusion phase, with deeper layers experiencing negligible information loss. Supported by experimental data on physics-informed neural networks (PINNs), which underscore the importance of gradient homogeneity due to their PDE-based sample inter-dependence, our findings suggest that recognizing phase transitions could refine ML optimization strategies for improved generalization.
Related papers
- Residual-based attention and connection to information bottleneck theory
in PINNs [0.393259574660092]
Physics-informed neural networks (PINNs) have seen a surge of interest in recent years.
We propose an efficient, gradient-less weighting scheme for PINNs, that accelerates the convergence of dynamic or static systems.
arXiv Detail & Related papers (2023-07-01T16:29:55Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Stochastic Gradient Descent-Induced Drift of Representation in a
Two-Layer Neural Network [0.0]
Despite being observed in the brain and in artificial networks, the mechanisms of drift and its implications are not fully understood.
Motivated by recent experimental findings of stimulus-dependent drift in the piriform cortex, we use theory and simulations to study this phenomenon in a two-layer linear feedforward network.
arXiv Detail & Related papers (2023-02-06T04:56:05Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that leakyally, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Improved Overparametrization Bounds for Global Convergence of Stochastic
Gradient Descent for Shallow Neural Networks [1.14219428942199]
We study the overparametrization bounds required for the global convergence of gradient descent algorithm for a class of one hidden layer feed-forward neural networks.
arXiv Detail & Related papers (2022-01-28T11:30:06Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of
Stochasticity [24.428843425522107]
We study the dynamics of gradient descent over diagonal linear networks through its continuous time version, namely gradient flow.
We show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias.
arXiv Detail & Related papers (2021-06-17T14:16:04Z) - Influence Estimation and Maximization via Neural Mean-Field Dynamics [60.91291234832546]
We propose a novel learning framework using neural mean-field (NMF) dynamics for inference and estimation problems.
Our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities.
arXiv Detail & Related papers (2021-06-03T00:02:05Z) - Phases of learning dynamics in artificial neural networks: with or
without mislabeled data [3.3576886095389296]
We study dynamics of gradient descent that drives learning in neural networks.
Without mislabeled data, we find that the SGD learning dynamics transitions from a fast learning phase to a slow exploration phase.
We find that individual sample losses for the two datasets are most separated during phase II.
arXiv Detail & Related papers (2021-01-16T19:44:27Z) - The Break-Even Point on Optimization Trajectories of Deep Neural
Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.