Learning Unitaries by Gradient Descent
- URL: http://arxiv.org/abs/2001.11897v3
- Date: Tue, 18 Feb 2020 21:05:43 GMT
- Title: Learning Unitaries by Gradient Descent
- Authors: Bobak Toussi Kiani, Seth Lloyd, Reevu Maity
- Abstract summary: We learn unitary transformations in $U(d)$ via gradient descent on time parameters of alternating operator sequences.
With less than $d^2$ parameters, gradient descent converges to a sub-optimal solution, whereas with more than $d^2$ parameters, gradient descent converges to an optimal solution.
- Score: 12.354076490479516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the hardness of learning unitary transformations in $U(d)$ via
gradient descent on time parameters of alternating operator sequences. We
provide numerical evidence that, despite the non-convex nature of the loss
landscape, gradient descent always converges to the target unitary when the
sequence contains $d^2$ or more parameters. Rates of convergence indicate a
"computational phase transition." With less than $d^2$ parameters, gradient
descent converges to a sub-optimal solution, whereas with more than $d^2$
parameters, gradient descent converges exponentially to an optimal solution.
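As a rough illustration of this setup, here is a minimal numpy sketch, assuming two fixed random Hermitian generators $A$ and $B$, an infidelity loss $1 - |\mathrm{Tr}(U^\dagger V)|/d$, and finite-difference gradients; the paper's exact parameterization and gradient computation may differ.

```python
import numpy as np

# Sketch (not the paper's exact setup): learn a Haar-random target U in U(d)
# by gradient descent on time parameters t_k of the alternating sequence
# V(t) = ... exp(i t_3 A) exp(i t_2 B) exp(i t_1 A).
rng = np.random.default_rng(0)
d = 4
K = 2 * d**2   # number of time parameters; >= d^2 is the over-parameterized regime

def random_hermitian(d):
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

A, B = random_hermitian(d), random_hermitian(d)
U = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))[0]

def expi(H, t):
    w, V = np.linalg.eigh(H)                    # exp(i t H) for Hermitian H
    return (V * np.exp(1j * t * w)) @ V.conj().T

def sequence(ts):
    Vt = np.eye(d, dtype=complex)
    for k, t in enumerate(ts):
        Vt = expi(A if k % 2 == 0 else B, t) @ Vt
    return Vt

def loss(ts):                                   # infidelity in [0, 1]
    return 1.0 - abs(np.trace(U.conj().T @ sequence(ts))) / d

ts = 0.1 * rng.normal(size=K)
eta, h = 1.0, 1e-5                              # illustrative stepsize and FD step
e = np.eye(K)
for step in range(200):
    g = np.array([(loss(ts + h * e[k]) - loss(ts - h * e[k])) / (2 * h)
                  for k in range(K)])
    ts -= eta * g
print("final infidelity:", loss(ts))
```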
Related papers
- Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency [47.8739414267201]
We consider gradient descent (GD) with a constant stepsize applied to logistic regression with linearly separable data.
We show that GD exits this initial oscillatory phase rapidly -- in $\mathcal{O}(\eta)$ steps -- and subsequently achieves an $\tilde{\mathcal{O}}(1/(\eta t))$ convergence rate.
Our results imply that, given a budget of $T$ steps, GD can achieve an accelerated loss of $\tilde{\mathcal{O}}(1/T^2)$ with an aggressive stepsize.
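For intuition, a toy numpy sketch of the regime this paper studies, assuming synthetic separable data and an aggressive constant stepsize (both hypothetical choices); the loss can oscillate early before the fast decrease sets in.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
w_sep = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_sep)                        # linearly separable labels in {-1, +1}

def logistic_loss(w):
    m = y * (X @ w)                           # margins
    return np.mean(np.logaddexp(0.0, -m))    # mean log(1 + exp(-m)), overflow-safe

def grad(w):
    m = y * (X @ w)
    # sigmoid(-m) = (1 - tanh(m/2)) / 2, numerically stable
    return X.T @ (-y * 0.5 * (1.0 - np.tanh(m / 2.0))) / n

w = np.zeros(d)
eta = 50.0                                    # large constant stepsize
for t in range(201):
    if t % 40 == 0:
        print(t, logistic_loss(w))            # non-monotone early, then fast decay
    w -= eta * grad(w)
```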
arXiv Detail & Related papers (2024-02-24T23:10:28Z)
- Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning [1.4050802766699084]
We consider the scenario of supervised learning in Deep Learning (DL) networks.
We choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network.
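As a generic reference point (not the paper's geometrically adapted flow), here is a forward-Euler discretization of the plain gradient flow for a squared loss, along which the loss indeed decays exponentially:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 20, 3
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true                          # realizable targets

def L(w):                               # squared (L^2) loss
    return 0.5 * np.sum((X @ w - y) ** 2)

w = np.zeros(d)
dt = 1e-3                               # small step: forward-Euler gradient flow
for k in range(5001):
    w -= dt * (X.T @ (X @ w - y))       # dw/dt = -grad L(w)
    if k % 1000 == 0:
        print(k, L(w))                  # decays exponentially in t = k * dt
```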
arXiv Detail & Related papers (2023-11-27T02:12:02Z)
- Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron [49.45105570960104]
We prove the global convergence of randomly initialized gradient descent with a $O\left(T^{-3}\right)$ rate.
These two bounds jointly give an exact characterization of the convergence rate.
We show this potential function converges slowly, which implies the slow convergence rate of the loss function.
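A toy sketch of the student-teacher setting the title suggests, with hypothetical choices throughout (Gaussian inputs, an over-parameterized ReLU student whose second layer is fixed to the mean); it is only meant to make the setting concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, m = 10, 500, 20
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
y = np.maximum(X @ w_star, 0.0)                  # teacher: a single ReLU neuron

W = 0.1 * rng.normal(size=(m, d))                # over-parameterized student

def predict(W):
    return np.maximum(X @ W.T, 0.0).mean(axis=1) # mean of m ReLU units

eta = 0.5
for t in range(1001):
    r = predict(W) - y                           # residuals, shape (n,)
    act = (X @ W.T > 0).astype(float)            # ReLU gates, shape (n, m)
    G = (act * r[:, None]).T @ X / (n * m)       # gradient wrt rows of W
    W -= eta * G
    if t % 250 == 0:
        print(t, 0.5 * np.mean(r ** 2))
```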
arXiv Detail & Related papers (2023-02-20T15:33:26Z)
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient and Nesterov's acceleration.
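The AG-EG method itself is more involved; below is only its basic extragradient building block on an illustrative strongly-convex-strongly-concave instance, $F(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|^2$, $G(\mathbf{y}) = \frac{1}{2}\|\mathbf{y}\|^2$, $H(\mathbf{x},\mathbf{y}) = \mathbf{x}^\top B\mathbf{y}$ (all hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
B = rng.normal(size=(n, n))            # bilinear coupling H(x, y) = x^T B y

def grad_x(x, y): return x + B @ y     # d/dx [F(x) + x^T B y - G(y)]
def grad_y(x, y): return B.T @ x - y   # d/dy [F(x) + x^T B y - G(y)]

x, y = rng.normal(size=n), rng.normal(size=n)
eta = 0.1
for t in range(300):
    # extragradient: probe step, then update using gradients at the midpoint
    xm = x - eta * grad_x(x, y)
    ym = y + eta * grad_y(x, y)
    x = x - eta * grad_x(xm, ym)
    y = y + eta * grad_y(xm, ym)
print("distance to saddle (0, 0):", np.linalg.norm(x), np.linalg.norm(y))
```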
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
- Convergence of gradient descent for learning linear neural networks [2.209921757303168]
We show that gradient descent converges to a critical point of the loss function, i.e., the square loss in this article.
In the case of three or more layers we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank.
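A small sketch of the object of study, gradient descent on the square loss of a three-layer linear network, assuming realizable targets and a near-identity initialization (both hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, layers = 4, 50, 3
X = rng.normal(size=(d, n))
Y = rng.normal(size=(d, d)) @ X              # targets realizable by a linear map
Ws = [np.eye(d) + 0.01 * rng.normal(size=(d, d)) for _ in range(layers)]

def prod(mats):                              # right-to-left product W_k ... W_1
    P = np.eye(d)
    for W in mats:
        P = W @ P
    return P

eta = 0.05
for t in range(5000):
    R = (prod(Ws) @ X - Y) @ X.T / n         # dL/dP for L = (1/2n)||P X - Y||_F^2
    # dL/dW_k = (W_L ... W_{k+1})^T R (W_{k-1} ... W_1)^T
    grads = [prod(Ws[k + 1:]).T @ R @ prod(Ws[:k]).T for k in range(layers)]
    for W, G in zip(Ws, grads):
        W -= eta * G
print("square loss:", 0.5 * np.sum((prod(Ws) @ X - Y) ** 2) / n)
```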
arXiv Detail & Related papers (2021-08-04T13:10:30Z)
- High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent achieves convergence to critical points with high probability, with the best-known iteration complexity for smooth losses.
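A toy sketch combining the three ingredients named above on a quadratic with Student-t gradient noise (an illustrative heavy-tailed stand-in; the paper's algorithms and constants differ):

```python
import numpy as np

rng = np.random.default_rng(6)
d = 10
w_star = rng.normal(size=d)

def grad_est(w):
    # gradient of 0.5||w - w_star||^2 plus heavy-tailed (Student-t, df=2) noise
    return (w - w_star) + rng.standard_t(df=2, size=d)

def clip(v, tau):                             # gradient clipping
    nv = np.linalg.norm(v)
    return v if nv <= tau else v * (tau / nv)

w, m = rng.normal(size=d), np.zeros(d)
beta, tau = 0.9, 2.0
for t in range(2000):
    m = beta * m + (1 - beta) * clip(grad_est(w), tau)              # clipped momentum
    w -= (0.5 / np.sqrt(t + 1)) * m / (np.linalg.norm(m) + 1e-12)   # normalized step
print("distance to optimum:", np.linalg.norm(w - w_star))
```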
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
- Convergence of Gradient Algorithms for Nonconvex $C^{1+\alpha}$ Cost Functions [2.538209532048867]
As a byproduct, we show that the gradient converges and provide an explicit upper bound on the convergence rate.
This allows us to prove almost sure convergence by Doob's supermartingale convergence theorem.
arXiv Detail & Related papers (2020-12-01T16:48:59Z)
- Improved Complexities for Stochastic Conditional Gradient Methods under Interpolation-like Conditions [12.096252285460814]
We show that the conditional gradient method requires $\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle to find an $\epsilon$-optimal solution.
By including a gradient sliding step, we show that the number of calls reduces to $\mathcal{O}(\epsilon^{-1.5})$.
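A bare-bones stochastic conditional gradient (Frank-Wolfe) sketch over the probability simplex, with an interpolation-like noise model whose variance vanishes at the optimum (all choices hypothetical; the gradient-sliding variant is not shown):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 20
w_star = rng.dirichlet(np.ones(d))           # optimum inside the simplex

def stoch_grad(w):
    # stochastic gradient of 0.5||w - w_star||^2; noise scales with the
    # distance to the optimum, mimicking interpolation-like conditions
    noise = rng.normal(scale=np.linalg.norm(w - w_star), size=d)
    return (w - w_star) + noise

w = np.ones(d) / d
for t in range(1, 2001):
    g = stoch_grad(w)
    s = np.zeros(d); s[np.argmin(g)] = 1.0   # linear minimization over the simplex
    gamma = 2.0 / (t + 2)                    # standard conditional-gradient stepsize
    w = (1 - gamma) * w + gamma * s
print("optimality gap:", 0.5 * np.sum((w - w_star) ** 2))
```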
arXiv Detail & Related papers (2020-06-15T06:51:39Z)
- A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms, with an $O(d\ln(N)/\sqrt{N})$ rate for Adagrad.
Adam converges with the same $O(d\ln(N)/\sqrt{N})$ rate when used with the default parameters.
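For reference, the standard Adam update with its default parameters, applied to a smooth objective with bounded gradients as such analyses assume (the test function here is an arbitrary stand-in):

```python
import numpy as np

rng = np.random.default_rng(8)
d = 10
w_star = rng.normal(size=d)

def grad(w):
    # gradient of sum(log cosh(w - w_star)): smooth, with bounded gradients
    return np.tanh(w - w_star)

w = np.zeros(d)
m, v = np.zeros(d), np.zeros(d)
alpha, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8   # default Adam parameters
for t in range(1, 5001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    mhat = m / (1 - beta1 ** t)              # bias correction
    vhat = v / (1 - beta2 ** t)
    w -= alpha * mhat / (np.sqrt(vhat) + eps)
print("distance to optimum:", np.linalg.norm(w - w_star))
```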
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
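The paper's OAdagrad combines optimism with adaptive stepsizes; as a simpler relative, here is plain optimistic gradient descent-ascent on a bilinear min-max game, where vanilla simultaneous gradient descent-ascent would diverge:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4
B = rng.normal(size=(n, n))                  # bilinear game: min_x max_y x^T B y

x, y = rng.normal(size=n), rng.normal(size=n)
gx_prev, gy_prev = np.zeros(n), np.zeros(n)
eta = 0.05
for t in range(2000):
    gx, gy = B @ y, B.T @ x                  # gradients wrt x and y
    # optimistic update: use 2 * (current gradient) - (previous gradient)
    x -= eta * (2 * gx - gx_prev)
    y += eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy
# iterates shrink toward the equilibrium (0, 0)
print("norms:", np.linalg.norm(x), np.linalg.norm(y))
```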
arXiv Detail & Related papers (2019-12-26T22:10:10Z)