A theory of data variability in Neural Network Bayesian inference
- URL: http://arxiv.org/abs/2307.16695v2
- Date: Thu, 9 Nov 2023 13:08:22 GMT
- Title: A theory of data variability in Neural Network Bayesian inference
- Authors: Javed Lindner, David Dahmen, Michael Kr\"amer and Moritz Helias
- Abstract summary: We provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks.
We derive the generalization properties from the statistical properties of the input.
We show that data variability leads to a non-Gaussian action reminiscent of a ($varphi3+varphi4$)-theory.
- Score: 0.70224924046445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian inference and kernel methods are well established in machine
learning. The neural network Gaussian process in particular provides a concept
to investigate neural networks in the limit of infinitely wide hidden layers by
using kernel and inference methods. Here we build upon this limit and provide a
field-theoretic formalism which covers the generalization properties of
infinitely wide networks. We systematically compute generalization properties
of linear, non-linear, and deep non-linear networks for kernel matrices with
heterogeneous entries. In contrast to currently employed spectral methods we
derive the generalization properties from the statistical properties of the
input, elucidating the interplay of input dimensionality, size of the training
data set, and variability of the data. We show that data variability leads to a
non-Gaussian action reminiscent of a ($\varphi^3+\varphi^4$)-theory. Using our
formalism on a synthetic task and on MNIST we obtain a homogeneous kernel
matrix approximation for the learning curve as well as corrections due to data
variability which allow the estimation of the generalization properties and
exact results for the bounds of the learning curves in the case of infinitely
many training data points.
Related papers
- Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z) - Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Neural Tangent Kernels Motivate Graph Neural Networks with
Cross-Covariance Graphs [94.44374472696272]
We investigate NTKs and alignment in the context of graph neural networks (GNNs)
Our results establish the theoretical guarantees on the optimality of the alignment for a two-layer GNN.
These guarantees are characterized by the graph shift operator being a function of the cross-covariance between the input and the output data.
arXiv Detail & Related papers (2023-10-16T19:54:21Z) - NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nystr"om formula suffer from scalability issues.
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalizes to space of supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Deep Networks Provably Classify Data on Curves [12.309532551321334]
We study a model problem that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere.
We prove that when (i) the network depth is large to certain properties that set the difficulty of the problem and (ii) the network width and number of samples is intrinsic in the relative depth, randomly-d gradient descent quickly learns to correctly classify all points on the two curves with high probability.
arXiv Detail & Related papers (2021-07-29T20:40:04Z) - Universality and Optimality of Structured Deep Kernel Networks [0.0]
Kernel based methods yield approximation models that are flexible, efficient and powerful.
Recent success of machine learning methods has been driven by deep neural networks (NNs)
In this paper, we show that the use of special types of kernels yield models reminiscent of neural networks.
arXiv Detail & Related papers (2021-05-15T14:10:35Z) - Spectral Bias and Task-Model Alignment Explain Generalization in Kernel
Regression and Infinitely Wide Neural Networks [17.188280334580195]
Generalization beyond a training dataset is a main goal of machine learning.
Recent observations in deep neural networks contradict conventional wisdom from classical statistics.
We show that more data may impair generalization when noisy or not expressible by the kernel.
arXiv Detail & Related papers (2020-06-23T17:53:11Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural
Networks [17.188280334580195]
We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples.
Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Kernel Tangent (NTK)
We verify our theory with simulations on synthetic data and MNIST dataset.
arXiv Detail & Related papers (2020-02-07T00:03:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.