Quantum Lazy Training
- URL: http://arxiv.org/abs/2202.08232v7
- Date: Fri, 21 Apr 2023 02:43:33 GMT
- Title: Quantum Lazy Training
- Authors: Erfan Abedi, Salman Beigi, Leila Taghavi
- Abstract summary: We show that the training of geometrically local parameterized quantum circuits enters the lazy regime for large numbers of qubits.
More precisely, we prove bounds on the rate of change of the parameters of such a geometrically local parameterized quantum circuit in the training process.
- Score: 2.492300648514128
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the training of over-parameterized model functions via gradient descent,
sometimes the parameters do not change significantly and remain close to their
initial values. This phenomenon is called lazy training, and motivates
consideration of the linear approximation of the model function around the
initial parameters. In the lazy regime, this linear approximation imitates the
behavior of the parameterized function whose associated kernel, called the
tangent kernel, specifies the training performance of the model. Lazy training
is known to occur in the case of (classical) neural networks with large widths.
In this paper, we show that the training of geometrically local parameterized
quantum circuits enters the lazy regime for large numbers of qubits. More
precisely, we prove bounds on the rate of change of the parameters of such a
geometrically local parameterized quantum circuit in the training process, and
on the precision of the linear approximation of the associated quantum model
function; both of these bounds tend to zero as the number of qubits grows. We
support our analytic results with numerical simulations.
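As an informal illustration of the lazy-training picture described above, the sketch below (not the authors' code; the circuit layout, observable, and all function names are illustrative assumptions) builds a toy two-qubit geometrically local parameterized circuit, linearizes its model function around the initial parameters, f_lin(theta) = f(theta0) + grad f(theta0) . (theta - theta0), and evaluates the associated tangent kernel.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation gate."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

CZ = np.diag([1.0, 1.0, 1.0, -1.0])            # entangler on neighbouring qubits
Z0 = np.kron(np.diag([1.0, -1.0]), np.eye(2))  # observable: Z on qubit 0

def model(params):
    """f(theta) = <00| U(theta)^dag Z_0 U(theta) |00> for a 2-qubit circuit
    of RY layers separated by a CZ gate (a toy geometrically local circuit)."""
    state = np.zeros(4)
    state[0] = 1.0
    state = np.kron(ry(params[0]), ry(params[1])) @ state  # layer 1
    state = CZ @ state                                      # entangle
    state = np.kron(ry(params[2]), ry(params[3])) @ state   # layer 2
    return state @ Z0 @ state

def grad(params, eps=1e-6):
    """Central finite-difference gradient of the model function."""
    g = np.zeros_like(params)
    for i in range(len(params)):
        shift = np.zeros_like(params)
        shift[i] = eps
        g[i] = (model(params + shift) - model(params - shift)) / (2.0 * eps)
    return g

rng = np.random.default_rng(0)
theta0 = rng.uniform(0.0, 2.0 * np.pi, size=4)   # initial parameters
g0 = grad(theta0)

# Linearized ("lazy") model around theta0:
#   f_lin(theta) = f(theta0) + g0 . (theta - theta0)
theta = theta0 + 0.1 * rng.standard_normal(4)    # parameters after a small update
f_lin = model(theta0) + g0 @ (theta - theta0)

# Tangent kernel of this (data-free) toy model: inner product of gradients at theta0.
tangent_kernel = g0 @ g0

print("true model value: ", model(theta))
print("linearized value: ", f_lin)
print("tangent kernel:   ", tangent_kernel)
```

In the lazy regime the gap between the true model value and the linearized value stays small because the parameters remain close to theta0; the paper proves that, for geometrically local circuits, both the parameter movement and the linearization error vanish as the number of qubits grows.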
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Identifying overparameterization in Quantum Circuit Born Machines [1.7259898169307613]
We study the onset of overparameterization transitions for quantum circuit Born machines, generative models that are trained using non-adversarial gradient methods.
Our results indicate that fully understanding the trainability of these models remains an open question.
arXiv Detail & Related papers (2023-07-06T21:05:22Z) - Backpropagation scaling in parameterised quantum circuits [0.0]
We introduce circuits that are not known to be classically simulable and admit gradient estimation with significantly fewer circuits.
Specifically, these circuits allow for fast estimation of the gradient, higher order partial derivatives and the Fisher information matrix.
In a toy classification problem on 16 qubits, such circuits show competitive performance with other methods, while reducing the training cost by about two orders of magnitude.
arXiv Detail & Related papers (2023-06-26T18:00:09Z) - Parsimonious Optimisation of Parameters in Variational Quantum Circuits [1.303764728768944]
We propose a novel Quantum-Gradient Sampling algorithm that requires the execution of at most two circuits per iteration to update the optimisable parameters.
Our proposed method achieves convergence rates similar to classical gradient descent, and empirically outperforms gradient coordinate descent and SPSA.
arXiv Detail & Related papers (2023-06-20T18:50:18Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z) - Deep Quantum Neural Networks are Gaussian Process [0.0]
We present a framework to examine the impact of finite width on the closed-form relationship using a $1/d$ expansion.
We elucidate the relationship between the GP and its parameter-space equivalent, characterized by the Quantum Neural Tangent Kernel (QNTK).
arXiv Detail & Related papers (2023-05-22T03:07:43Z) - Analytic theory for the dynamics of wide quantum neural networks [7.636414695095235]
We study the dynamics of gradient descent for the training error of a class of variational quantum machine learning models.
For random quantum circuits, we predict and characterize an exponential decay of the residual training error as a function of the parameters of the system.
arXiv Detail & Related papers (2022-03-30T23:24:06Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.