Global Convergence Rate of Deep Equilibrium Models with General
Activations
- URL: http://arxiv.org/abs/2302.05797v3
- Date: Fri, 1 Mar 2024 18:23:01 GMT
- Title: Global Convergence Rate of Deep Equilibrium Models with General
Activations
- Authors: Lan V. Truong
- Abstract summary: This paper shows that the linear convergence of gradient descent to a global optimum, previously established for over-parametrized DEQs with ReLU activation, still holds for DEQs with any generally bounded activation with bounded first and second derivatives.
Since the new activation function is generally non-homogeneous, bounding the least eigenvalue of the Gram matrix of the equilibrium point is particularly challenging.
- Score: 18.601449856300984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a recent paper, Ling et al. investigated the over-parametrized Deep
Equilibrium Model (DEQ) with ReLU activation. They proved that gradient
descent converges to a globally optimal solution of the quadratic loss
function at a linear convergence rate. This paper shows that this result still
holds for DEQs with any generally bounded activation with bounded first and
second derivatives. Since the new activation function is generally
non-homogeneous, bounding the least eigenvalue of the Gram matrix of the
equilibrium point is particularly challenging. To accomplish this task, we must
create a novel population Gram matrix and develop a new form of dual activation
with Hermite polynomial expansion.
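To make the key technical device concrete, the following is a minimal numerical sketch (not from the paper) of a dual activation built from a Hermite polynomial expansion. For a smooth, bounded activation sigma and jointly Gaussian inputs with unit variances and correlation rho, the dual activation can be written as check_sigma(rho) = sum_k a_k^2 rho^k, where a_k = E[sigma(g) He_k(g)] / sqrt(k!) are the normalized (probabilists') Hermite coefficients of sigma. The choice of tanh, the truncation order, and the quadrature degree below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from math import factorial, sqrt
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Gauss-Hermite (probabilists') quadrature: nodes/weights for the weight exp(-x^2/2).
nodes, weights = hermegauss(80)
weights = weights / np.sqrt(2 * np.pi)   # normalize so the weights integrate against N(0, 1)

sigma = np.tanh                          # a bounded activation with bounded first/second derivatives
K = 12                                   # truncation order of the expansion (assumption)

# a_k = E[sigma(g) He_k(g)] / sqrt(k!), the normalized Hermite coefficients of sigma.
a = np.zeros(K)
for k in range(K):
    basis = np.zeros(k + 1)
    basis[k] = 1.0
    He_k = hermeval(nodes, basis)        # probabilists' Hermite polynomial He_k at the nodes
    a[k] = np.sum(weights * sigma(nodes) * He_k) / sqrt(factorial(k))

def dual_activation(rho):
    """Truncated dual activation: sum_k a_k^2 rho^k."""
    return np.polynomial.polynomial.polyval(rho, a ** 2)

# Sanity check: the dual activation should match E[sigma(u) sigma(v)] for
# jointly Gaussian (u, v) with unit variances and correlation rho.
rho = 0.6
rng = np.random.default_rng(0)
g1, g2 = rng.standard_normal((2, 200_000))
u, v = g1, rho * g1 + np.sqrt(1.0 - rho ** 2) * g2
print(dual_activation(rho), np.mean(sigma(u) * sigma(v)))
```

Because tanh is odd, the even-order coefficients vanish and the truncated series already agrees closely with the Monte Carlo estimate; an expansion of this kind is the ingredient the abstract refers to when bounding the least eigenvalue of the population Gram matrix for non-homogeneous activations.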
Related papers
- Convergence of Kinetic Langevin Monte Carlo on Lie groups [21.76159063788814]
We propose a Lie-group MCMC sampler, obtained by delicately discretizing kinetic-Langevin-type sampling dynamics on the Lie group.
This is the first convergence result for kinetic Langevin on curved spaces, and also the first quantitative result that requires neither convexity nor, at least not explicitly, any common relaxation such as isoperimetry.
arXiv Detail & Related papers (2024-03-18T17:50:20Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Global Convergence of Over-parameterized Deep Equilibrium Models [52.65330015267245]
A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection.
Instead of infinite computations, it solves for the equilibrium point directly with root-finding and computes gradients with implicit differentiation (a minimal numerical sketch of this recipe appears after this list).
We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
arXiv Detail & Related papers (2022-05-27T08:00:13Z) - Reduced density matrix functional theory from an ab initio
seniority-zero wave function: Exact and approximate formulations along
adiabatic connection paths [0.0]
We propose an alternative formulation of reduced density-matrix functional theory (RDMFT).
The exact natural orbitals and their occupancies are determined self-consistently from an effective seniority-zero calculation.
This information is expected to serve as a guide in the future design of higher-seniority density-matrix functional approximations.
arXiv Detail & Related papers (2022-04-01T21:27:25Z) - Sparsest Univariate Learning Models Under Lipschitz Constraint [31.28451181040038]
We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
arXiv Detail & Related papers (2021-12-27T07:03:43Z) - Optimization Induced Equilibrium Networks [76.05825996887573]
Implicit equilibrium models, i.e., deep neural networks (DNNs) defined by implicit equations, have recently attracted increasing attention.
We show that deep OptEq outperforms previous implicit models even with fewer parameters.
arXiv Detail & Related papers (2021-05-27T15:17:41Z) - A Dynamical Central Limit Theorem for Shallow Neural Networks [48.66103132697071]
We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
arXiv Detail & Related papers (2020-08-21T18:00:50Z) - Global Convergence of Second-order Dynamics in Two-layer Neural Networks [10.415177082023389]
Recent results have shown that for two-layer fully connected neural networks, gradient flow converges to a global optimum in the infinite width limit.
We show that the same global convergence holds for the heavy-ball method.
While our results hold in the mean-field limit, numerical simulations indicate that global convergence may already occur for reasonably small networks.
arXiv Detail & Related papers (2020-07-14T07:01:57Z) - Competitive Mirror Descent [67.31015611281225]
Constrained competitive optimization involves multiple agents trying to minimize conflicting objectives, subject to constraints.
We propose competitive mirror descent (CMD): a general method for solving such problems based on first order information.
As a special case we obtain a novel competitive multiplicative weights algorithm for problems on the positive cone.
arXiv Detail & Related papers (2020-06-17T22:11:35Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best fitting ReLU with respect to square loss given access to draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
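Picking up the over-parameterized DEQ entry above: the forward pass solves a fixed-point equation by root-finding, and the backward pass differentiates through the equilibrium with the implicit function theorem rather than by unrolling. Below is a minimal NumPy sketch of that recipe on a toy weight-tied layer z = tanh(W z + U x); the dimensions, the tanh activation, the plain fixed-point iteration, and the quadratic readout loss are illustrative assumptions, not taken from any of the papers listed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
U = rng.standard_normal((d, d)) / np.sqrt(d)
W = 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)   # small spectral norm so the fixed point exists
w_out = rng.standard_normal(d)

def f(z, W):
    # one weight-tied "layer" with input injection
    return np.tanh(W @ z + U @ x)

def solve_equilibrium(W, tol=1e-12, max_iter=10_000):
    # root-finding for z* = f(z*, W); plain fixed-point iteration for simplicity
    z = np.zeros(d)
    for _ in range(max_iter):
        z_next = f(z, W)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z_next

def loss_and_grad_W(W):
    z = solve_equilibrium(W)
    loss = 0.5 * (w_out @ z) ** 2
    # implicit differentiation: solve (I - J_z)^T v = dL/dz once, then form dL/dW
    s = 1.0 - np.tanh(W @ z + U @ x) ** 2          # tanh'(pre-activation) at the equilibrium
    J_z = s[:, None] * W                           # Jacobian of f w.r.t. z at the equilibrium
    dL_dz = (w_out @ z) * w_out
    v = np.linalg.solve((np.eye(d) - J_z).T, dL_dz)
    grad_W = np.outer(v * s, z)                    # since d f_i / d W_ij = s_i * z_j
    return loss, grad_W

loss, grad = loss_and_grad_W(W)

# check one entry of the implicit gradient against a finite difference
eps = 1e-6
W_pert = W.copy()
W_pert[0, 1] += eps
print(grad[0, 1], (loss_and_grad_W(W_pert)[0] - loss) / eps)
```

The finite-difference check at the end compares the implicit gradient with a numerical derivative of the loss; practical DEQ implementations typically replace plain fixed-point iteration with a quasi-Newton root-finder such as Broyden's method or Anderson acceleration.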