Metric Flows with Neural Networks
- URL: http://arxiv.org/abs/2310.19870v2
- Date: Fri, 18 Oct 2024 20:21:22 GMT
- Title: Metric Flows with Neural Networks
- Authors: James Halverson, Fabian Ruehle
- Abstract summary: We develop a theory of flows in the space of Riemannian metrics induced by neural network gradient descent.
This is motivated in part by advances in approximating Calabi-Yau metrics with neural networks.
We demonstrate that well-learned numerical metrics at finite-width exhibit an evolving metric-NTK, associated with feature learning.
- Abstract: We develop a general theory of flows in the space of Riemannian metrics induced by neural network gradient descent. This is motivated in part by recent advances in approximating Calabi-Yau metrics with neural networks and is enabled by recent advances in understanding flows in the space of neural networks. We derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel, a complicated, non-local object that evolves in time. However, many architectures admit an infinite-width limit in which the kernel becomes fixed and the dynamics simplify. Additional assumptions can induce locality in the flow, which allows for the realization of Perelman's formulation of Ricci flow that was used to resolve the 3d Poincaré conjecture. We demonstrate that such fixed kernel regimes lead to poor learning of numerical Calabi-Yau metrics, as is expected since the associated neural networks do not learn features. Conversely, we demonstrate that well-learned numerical metrics at finite-width exhibit an evolving metric-NTK, associated with feature learning. Our theory of neural network metric flows therefore explains why neural networks are better at learning Calabi-Yau metrics than fixed kernel methods, such as the Ricci flow.
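Schematically (this display is a paraphrase of the abstract's description, not an equation quoted from the paper), gradient-flow training of a network that outputs metric components $g_{\mu\nu}(x)$ induces $$\partial_t g_{\mu\nu}(x) = -\sum_{x'} \Theta_{\mu\nu}{}^{\rho\sigma}(x,x')\,\frac{\partial \mathcal{L}}{\partial g_{\rho\sigma}(x')},$$ where $\Theta$ is the metric neural tangent kernel: fixed in the infinite-width limit, evolving at finite width. Below is a minimal, hypothetical sketch of one way to detect such kernel evolution, using drift of the empirical NTK during training as a proxy for feature learning; the network, data, and names are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch (illustrative, not the paper's code): measure whether the
# empirical NTK of a small network evolves during training.
import jax
import jax.numpy as jnp

def init_params(key, d_in=4, width=64, d_out=1):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (d_in, width)) / jnp.sqrt(d_in),
        "W2": jax.random.normal(k2, (width, d_out)) / jnp.sqrt(width),
    }

def forward(params, x):
    # Stand-in for a network predicting (a component of) a metric; scalar output here.
    return jnp.tanh(x @ params["W1"]) @ params["W2"]

def empirical_ntk(params, xs):
    # Theta(x, x') = sum_p df(x)/dp * df(x')/dp  (empirical NTK Gram matrix).
    jac = jax.jacrev(lambda p: forward(p, xs).squeeze(-1))(params)
    flat = jnp.concatenate(
        [j.reshape(xs.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)], axis=1
    )
    return flat @ flat.T

def loss(params, xs, ys):
    return jnp.mean((forward(params, xs) - ys) ** 2)

key = jax.random.PRNGKey(0)
kx, kp = jax.random.split(key)
xs = jax.random.normal(kx, (16, 4))
ys = jnp.sin(xs.sum(axis=1, keepdims=True))
params = init_params(kp)

theta_init = empirical_ntk(params, xs)

lr = 1e-2
grad_fn = jax.jit(jax.grad(loss))
for _ in range(500):
    grads = grad_fn(params, xs, ys)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

theta_final = empirical_ntk(params, xs)

# Relative kernel drift: near zero in the fixed-kernel (lazy / infinite-width)
# regime, order one when the network is learning features.
drift = jnp.linalg.norm(theta_final - theta_init) / jnp.linalg.norm(theta_init)
print(f"relative NTK drift: {float(drift):.3f}")
```

In the fixed-kernel regime described in the abstract this drift would be negligible and training would reduce to regression with a frozen kernel, whereas an order-one drift signals the evolving metric-NTK associated with feature learning.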
Related papers
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z) - Consistency of Neural Networks with Regularization [0.0]
This paper proposes a general framework of neural networks with regularization and proves its consistency.
Two types of activation functions are considered: the hyperbolic tangent (Tanh) and the rectified linear unit (ReLU).
arXiv Detail & Related papers (2022-06-22T23:33:39Z) - Dynamically Stable Poincaré Embeddings for Neural Manifolds [10.76554740227876]
We prove that if an initial metric deviates from the hyperbolic metric on the Poincaré ball by a perturbation that is small in the $L^2$ norm, the scaled Ricci-DeTurck flow of such metrics converges smoothly and exponentially to the hyperbolic metric.
Specifically, the Ricci flow serves to evolve the embedding naturally toward the stable Poincaré ball, which is then mapped back to Euclidean space.
arXiv Detail & Related papers (2021-12-21T13:09:08Z) - Learn Like The Pro: Norms from Theory to Size Neural Computation [3.848947060636351]
We investigate how dynamical systems with nonlinearities can inform the design of neural systems that seek to emulate them.
We propose a Learnability metric and relate its associated features to the near-equilibrium behavior of the learning dynamics.
It reveals exact sizing for a class of neural networks with multiplicative nodes that mimic continuous- or discrete-time dynamics.
arXiv Detail & Related papers (2021-06-21T20:58:27Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network (see the sketch after this list).
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z) - Convergence dynamics of Generative Adversarial Networks: the dual metric flows [0.0]
We investigate convergence in Generative Adversarial Networks (GANs) used in machine learning.
We study the limit of small learning rate and show that, similar to single-network training, the GAN learning dynamics tend to some limit dynamics as the learning rate vanishes.
arXiv Detail & Related papers (2020-12-18T18:00:12Z)
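The Jacobian-kernel idea from "Fast Adaptation with Linearized Neural Networks" above can be made concrete with a short sketch. The code below is illustrative only and is not taken from that paper: the architecture, parameter names, and data are hypothetical, and the kernel is simply $k(x, x') = J(x)\, J(x')^\top$ for the network's parameter Jacobian $J$.

```python
# Hedged sketch (not code from the cited paper): a GP-style kernel built from
# the Jacobian of a network, i.e. the tangent kernel of its linearization.
import jax
import jax.numpy as jnp

def network(params, x):
    return jnp.tanh(x @ params["W1"]) @ params["W2"]

def jacobian_features(params, xs):
    # Rows are d f(x) / d params, flattened across all parameter arrays.
    jac = jax.jacrev(lambda p: network(p, xs).squeeze(-1))(params)
    return jnp.concatenate(
        [j.reshape(xs.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)], axis=1
    )

def jacobian_kernel(params, xa, xb):
    return jacobian_features(params, xa) @ jacobian_features(params, xb).T

key = jax.random.PRNGKey(1)
k1, k2, k3, k4 = jax.random.split(key, 4)
params = {
    "W1": jax.random.normal(k1, (2, 8)) / jnp.sqrt(2.0),
    "W2": jax.random.normal(k2, (8, 1)) / jnp.sqrt(8.0),
}
x_train = jax.random.normal(k3, (10, 2))
y_train = jnp.sin(x_train.sum(axis=1))
x_test = jax.random.normal(k4, (3, 2))

# GP posterior mean under the Jacobian kernel (jitter added for stability).
K = jacobian_kernel(params, x_train, x_train) + 1e-3 * jnp.eye(10)
K_star = jacobian_kernel(params, x_test, x_train)
print(K_star @ jnp.linalg.solve(K, y_train))
```

Because the kernel comes from a linearization around fixed weights, it does not evolve during inference; this is the fixed-kernel regime that the main abstract contrasts with the evolving metric-NTK of feature-learning networks.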