Related papers: Wilsonian Renormalization of Neural Network Gaussian Processes

URL: http://arxiv.org/abs/2405.06008v2
Date: Wed, 14 Aug 2024 06:06:56 GMT
Title: Wilsonian Renormalization of Neural Network Gaussian Processes
Authors: Jessica N. Howard, Ro Jefferson, Anindita Maiti, Zohar Ringel
Abstract summary: We demonstrate a practical approach to performing Wilsonian RG in the context of Gaussian Process (GP) Regression. We systematically integrate out the unlearnable modes of the GP kernel, thereby obtaining an RG flow of the GP in which the data sets the IR scale. This approach goes beyond structural analogies between RG and neural networks by providing a natural connection between RG flow and learnable vs. unlearnable modes.
Score: 1.8749305679160366
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Separating relevant and irrelevant information is key to any modeling process or scientific inquiry. Theoretical physics offers a powerful tool for achieving this in the form of the renormalization group (RG). Here we demonstrate a practical approach to performing Wilsonian RG in the context of Gaussian Process (GP) Regression. We systematically integrate out the unlearnable modes of the GP kernel, thereby obtaining an RG flow of the GP in which the data sets the IR scale. In simple cases, this results in a universal flow of the ridge parameter, which becomes input-dependent in the richer scenario in which non-Gaussianities are included. In addition to being analytically tractable, this approach goes beyond structural analogies between RG and neural networks by providing a natural connection between RG flow and learnable vs. unlearnable modes. Studying such flows may improve our understanding of feature learning in deep neural networks, and enable us to identify potential universality classes in these models.

Related papers

Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor program (TP) framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity [0.0]
In some settings, neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as GP classification, GP regression, linear regression and Bayesian neural networks.
arXiv Detail & Related papers (2023-10-26T08:47:42Z)
Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains [7.936841911281107]
We propose cvHM, a general inference framework for latent GP models leveraging Hida-Matérn kernels and conjugate variational inference (CVI). We are able to perform variational inference of latent neural trajectories with linear time complexity for arbitrary likelihoods.
arXiv Detail & Related papers (2023-06-01T16:31:36Z)
Renormalized Graph Neural Networks [4.200261123369236]
Graph Neural Networks (GNNs) have become essential for studying complex data, particularly when represented as graphs. This paper proposes a new approach that applies renormalization group theory to improve GNNs' performance on graph-related tasks.
arXiv Detail & Related papers (2023-06-01T14:16:43Z)
Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights. We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
Integrated Gradient attribution for Gaussian Processes with non-Gaussian likelihoods [0.0]
Gaussian Process (GP) models are a powerful tool in probabilistic machine learning. The need for making a model's decision making process transparent and explainable is now a major focus of research. This paper proposes an approach for applying Integrated Gradient-based explainability to non-Gaussian GP models.
arXiv Detail & Related papers (2022-05-25T14:11:00Z)
Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training. We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains. We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications [6.579523168465526]
We introduce partial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0 \leq l$. We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections.
arXiv Detail & Related papers (2021-11-23T20:31:42Z)
Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
We propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them. NGGPs outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
arXiv Detail & Related papers (2021-10-26T10:45:25Z)
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs). In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit. We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
Scalable Partial Explainability in Neural Networks via Flexible Activation Functions [13.71739091287644]
High-dimensional features and decisions given by deep neural networks (NNs) require new algorithms and methods to expose their mechanisms. Current state-of-the-art NN interpretation methods focus more on the direct relationship between NN outputs and inputs than on the NN structure and operations themselves. In this paper, we achieve a partially explainable learning model by symbolically explaining the role of activation functions (AFs) under a scalable topology.
arXiv Detail & Related papers (2020-06-10T20:30:15Z)
Infinitely Wide Graph Convolutional Networks: Semi-supervised Learning via Gaussian Processes [144.6048446370369]
Graph convolutional neural networks (GCNs) have recently demonstrated promising results on graph-based semi-supervised classification. We propose a GP regression model via GCNs (GPGC) for graph-based semi-supervised learning. We conduct extensive experiments to evaluate GPGC and demonstrate that it outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2020-02-26T10:02:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.