Quantitative convergence of trained quantum neural networks to a Gaussian process
- URL: http://arxiv.org/abs/2412.03182v1
- Date: Wed, 04 Dec 2024 10:09:56 GMT
- Title: Quantitative convergence of trained quantum neural networks to a Gaussian process
- Authors: Anderson Melchor Hernandez, Filippo Girardi, Davide Pastorello, Giacomo De Palma
- Abstract summary: We study quantum neural networks where the generated function is the expectation value of the sum of single-qubit observables across all qubits.
It is proven that the probability distributions of such functions converge in distribution to a Gaussian process in the limit of infinite width for both untrained networks with randomly generated parameters and trained networks.
- Score: 3.495246564946556
- Abstract: We study quantum neural networks where the generated function is the expectation value of the sum of single-qubit observables across all qubits. In [Girardi \emph{et al.}, arXiv:2402.08726], it is proven that the probability distributions of such generated functions converge in distribution to a Gaussian process in the limit of infinite width for both untrained networks with randomly initialized parameters and trained networks. In this paper, we provide a quantitative proof of this convergence in terms of the Wasserstein distance of order $1$. First, we establish an upper bound on the distance between the probability distribution of the function generated by any untrained network with finite width and the Gaussian process with the same covariance. This proof utilizes Stein's method to estimate the Wasserstein distance of order $1$. Next, we analyze the training dynamics of the network via gradient flow, proving an upper bound on the distance between the probability distribution of the function generated by the trained network and the corresponding Gaussian process. This proof is based on a quantitative upper bound on the maximum variation of a parameter during training. This bound implies that for sufficiently large widths, training occurs in the lazy regime, \emph{i.e.}, each parameter changes only by a small amount. While the convergence result of [Girardi \emph{et al.}, arXiv:2402.08726] holds at a fixed training time, our upper bounds are uniform in time and hold even as $t \to \infty$.
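As a rough illustration of what a quantitative bound in the Wasserstein distance of order $1$ measures, the sketch below estimates that distance numerically for a purely classical toy model. It is not the quantum network studied in the paper: the "output" is assumed to be a normalized sum of `width` bounded, zero-mean random terms, and the helper names (`sample_output`, `w1_to_standard_normal`, `w1_by_width`) are hypothetical. In one dimension the distance equals the $L^1$ distance between quantile functions, which the code approximates on a finite grid.

```python
import math
import random
from statistics import NormalDist


def sample_output(width, rng):
    """Toy stand-in for a network output: normalized sum of bounded terms."""
    total = sum(math.cos(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(width))
    # cos(theta) has variance 1/2 for uniform theta, so the result has unit variance.
    return math.sqrt(2.0 / width) * total


def w1_to_standard_normal(samples):
    """Approximate W1 between the empirical law of `samples` and N(0, 1).

    Uses the one-dimensional quantile formula, evaluated on the
    midpoint grid (i + 0.5) / n.
    """
    n = len(samples)
    gauss = NormalDist()
    ordered = sorted(samples)
    gaps = (abs(x - gauss.inv_cdf((i + 0.5) / n)) for i, x in enumerate(ordered))
    return sum(gaps) / n


def w1_by_width(widths=(1, 8, 64), n_samples=4000, seed=0):
    """Estimate the distance to the Gaussian limit for several widths."""
    rng = random.Random(seed)
    return {
        m: w1_to_standard_normal([sample_output(m, rng) for _ in range(n_samples)])
        for m in widths
    }


if __name__ == "__main__":
    for m, dist in w1_by_width().items():
        print(f"width={m:3d}  estimated W1 = {dist:.4f}")
```

In this toy setting the estimated distance shrinks as the width grows, until it reaches the floor set by sampling noise. The paper's bounds concern the analogous quantity for the functions generated by untrained and trained quantum neural networks.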
Related papers
- Random ReLU Neural Networks as Non-Gaussian Processes [20.607307985674428]
We show that random neural networks with rectified linear unit activation functions are well-defined non-Gaussian processes.
As a by-product, we demonstrate that these networks are solutions to differential equations driven by impulsive white noise.
arXiv Detail & Related papers (2024-05-16T16:28:11Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Trained quantum neural networks are Gaussian processes [5.439020425819001]
We study quantum neural networks made by parametric one-qubit gates and fixed two-qubit gates in the limit of infinite width.
We analytically characterize the training of the network via gradient descent with square loss on supervised learning problems.
We prove that, as long as the network is not affected by barren plateaus, the trained network can perfectly fit the training set.
arXiv Detail & Related papers (2024-02-13T19:00:08Z)
- Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes [1.0878040851638]
We show that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than the naive heuristic suggested by the central limit theorem.
We also apply our bounds to obtain theoretical approximations for the exact posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set.
arXiv Detail & Related papers (2023-12-18T22:29:40Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and stochastic processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Infinitely Wide Tensor Networks as Gaussian Process [1.7894377200944511]
In this paper, we show the equivalence of infinitely wide tensor networks and the Gaussian process.
We implement the Gaussian Process corresponding to the infinite limit tensor networks and plot the sample paths of these models.
arXiv Detail & Related papers (2021-01-07T02:29:15Z)