A Neural Network Perturbation Theory Based on the Born Series
- URL: http://arxiv.org/abs/2009.03192v2
- Date: Thu, 27 May 2021 08:09:47 GMT
- Title: A Neural Network Perturbation Theory Based on the Born Series
- Authors: Bastian Kaspschak and Ulf-G. Meißner
- Abstract summary: Taylor coefficients of deep neural networks (DNNs) still appear mainly in interpretability studies.
This gap motivates a general formulation of neural network (NN) Taylor expansions.
We show that NNs adapt their derivatives mainly to the leading order of the target function's Taylor expansion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning using the eponymous deep neural networks (DNNs) has become an attractive approach to various data-driven problems of theoretical physics in the past decade. There has been a clear trend towards deeper architectures containing increasingly more powerful and involved layers. In contrast, Taylor coefficients of DNNs still appear mainly in interpretability studies, where they are computed at most to first order. However, numerous problems, especially in theoretical physics, benefit from access to higher orders as well. This gap motivates a general formulation of neural network (NN) Taylor expansions. Restricting our analysis to multilayer perceptrons (MLPs) and introducing quantities we refer to as propagators and vertices, both depending on the MLP's weights and biases, we establish a graph-theoretical approach. Similarly to Feynman rules in quantum field theories, we can systematically assign diagrams containing propagators and vertices to the corresponding partial derivative. Examining this approach for S-wave scattering lengths of shallow potentials, we observe that NNs adapt their derivatives mainly to the leading order of the target function's Taylor expansion. To circumvent this problem, we propose an iterative NN perturbation theory. During each iteration we eliminate the leading order, such that the next-to-leading order can be faithfully learned during the subsequent iteration. After performing two iterations, we find that the first- and second-order Born terms are correctly adapted during the respective iterations. Finally, we combine both results to obtain a proxy that acts as a machine-learned second-order Born approximation.
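
The iterative scheme lends itself to a compact illustration. Below is a minimal sketch (not the authors' code) in JAX: a small MLP is fitted to a toy target, its leading Taylor coefficient is extracted by repeated automatic differentiation (standing in for the paper's diagrammatic propagator/vertex rules), that leading order is subtracted from the target, and a second MLP is fitted to the residual. The toy polynomial target, the network sizes, and all training hyperparameters are illustrative assumptions; the paper's actual targets are S-wave scattering lengths of shallow potentials.

```python
# Minimal sketch of the iterative NN perturbation theory described above.
# The MLP architecture, the toy target, and all hyperparameters are
# illustrative assumptions, not the authors' setup.
import jax
import jax.numpy as jnp

def mlp(params, x):
    """Tiny MLP from R to R with tanh activations."""
    h = jnp.atleast_1d(x)
    for W, b in params[:-1]:
        h = jnp.tanh(W @ h + b)
    W, b = params[-1]
    return (W @ h + b)[0]

def init_params(key, sizes=(1, 16, 16, 1)):
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (n_out, n_in)) / jnp.sqrt(n_in),
                       jnp.zeros(n_out)))
    return params

def taylor_coefficient(f, x0, order):
    """f^(order)(x0) / order! via repeated automatic differentiation."""
    g, factorial = f, 1.0
    for k in range(order):
        g = jax.grad(g)
        factorial *= k + 1
    return g(x0) / factorial

def fit(key, target, xs, steps=3000, lr=1e-2):
    """Plain full-batch gradient descent on the mean squared error."""
    params = init_params(key)
    ys = jax.vmap(target)(xs)
    loss = lambda p: jnp.mean((jax.vmap(lambda x: mlp(p, x))(xs) - ys) ** 2)
    step = jax.jit(lambda p: jax.tree_util.tree_map(
        lambda w, g: w - lr * g, p, jax.grad(loss)(p)))
    for _ in range(steps):
        params = step(params)
    return params

xs = jnp.linspace(-1.0, 1.0, 64)
target = lambda x: 0.7 * x + 0.2 * x ** 2   # stand-in for a Born series

# Iteration 1: the fitted NN adapts mainly to the leading (linear) order.
p1 = fit(jax.random.PRNGKey(0), target, xs)
c1 = taylor_coefficient(lambda x: mlp(p1, x), 0.0, order=1)

# Iteration 2: eliminate the learned leading order and fit the residual,
# so the next-to-leading order can be learned faithfully.
residual = lambda x: target(x) - c1 * x
p2 = fit(jax.random.PRNGKey(1), residual, xs)

# Combined proxy: a machine-learned "second-order" approximation.
proxy = lambda x: c1 * x + mlp(p2, x)
print(proxy(0.5), target(0.5))
```

Note that this sketch subtracts the NN's *learned* leading-order coefficient; whether the paper removes the learned coefficient or the analytically known first Born term is not specified in the abstract, so this choice is an assumption.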
Related papers
- Spiking Graph Neural Network on Riemannian Manifolds (2024-10-23)
Graph neural networks (GNNs) have become the dominant solution for learning on graphs.
Existing spiking GNNs consider graphs in Euclidean space, ignoring the structural geometry.
We present a Manifold-valued Spiking GNN (MSG).
MSG achieves superior performance to previous spiking GNNs and superior energy efficiency to conventional GNNs.
- Understanding the dynamics of the frequency bias in neural networks (2024-05-23)
Recent works have shown that traditional Neural Network (NN) architectures display a marked frequency bias in the learning process.
We develop a partial differential equation (PDE) that unravels the frequency dynamics of the error for a 2-layer NN.
We empirically show that the same principle extends to multi-layer NNs.
- Two-Phase Dynamics of Interactions Explains the Starting Point of a DNN Learning Over-Fitted Features (2024-05-16)
We investigate the dynamics of a deep neural network (DNN) learning interactions.
In this paper, we discover that the DNN learns interactions in two phases.
The first phase mainly penalizes interactions of medium and high orders, and the second phase mainly learns interactions of gradually increasing orders.
- Information-Theoretic Generalization Bounds for Deep Neural Networks (2024-04-04)
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications.
This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds.
- SEGNO: Generalizing Equivariant Graph Neural Networks with Physical Inductive Biases (2023-08-25)
We show how second-order continuity can be incorporated into GNNs while maintaining the equivariant property.
We also offer theoretical insights into SEGNO, highlighting that it can learn a unique trajectory between adjacent states.
Our model yields a significant improvement over state-of-the-art baselines.
- The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks (2022-02-05)
We derive the infinite-time training limit of a mathematically tractable class of deep nonlinear neural networks, generalized gated linear networks (GLNs).
We show how architectural constraints and the implicit bias of gradient descent affect performance.
By making the inductive bias explicit, our framework is poised to inform the development of more efficient, biologically plausible, and robust learning algorithms.
- Unified Field Theory for Deep and Recurrent Neural Networks (2021-12-10)
We present a unified and systematic derivation of the mean-field theory for both recurrent and deep networks.
We find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks.
Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$.
- The edge of chaos: quantum field theory and deep neural networks (2021-09-27)
We explicitly construct the quantum field theory corresponding to a general class of deep neural networks.
We compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$.
Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence and opens several interesting avenues to the study of criticality in deep neural networks.
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks (2020-09-24)
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
- Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks (2020-06-15)
It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as over-smoothing.
Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem.
We derive optimization and generalization guarantees of transductive learning algorithms that include multi-scale GNNs.
This list is automatically generated from the titles and abstracts of the papers in this site.