Global Convergence of Over-parameterized Deep Equilibrium Models
- URL: http://arxiv.org/abs/2205.13814v2
- Date: Wed, 29 Mar 2023 03:56:52 GMT
- Title: Global Convergence of Over-parameterized Deep Equilibrium Models
- Authors: Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, Zhouchen Lin
- Abstract summary: A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection.
Instead of infinite computations, it solves for an equilibrium point directly with root-finding and computes gradients with implicit differentiation.
We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
- Score: 52.65330015267245
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A deep equilibrium model (DEQ) is implicitly defined through an equilibrium
point of an infinite-depth weight-tied model with an input-injection. Instead
of infinite computations, it solves for an equilibrium point directly with
root-finding and computes gradients with implicit differentiation. The training
dynamics of over-parameterized DEQs are investigated in this study. Under an
assumption on the initial equilibrium point, we show that a unique equilibrium
point always exists during training, and that gradient descent provably
converges to a globally optimal solution at a linear rate for the quadratic
loss function. In order to show that the
required initial condition is satisfied via mild over-parameterization, we
perform a fine-grained analysis on random DEQs. We propose a novel
probabilistic framework to overcome the technical difficulty in the
non-asymptotic analysis of infinite-depth weight-tied models.
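The forward/backward mechanics summarized above fit in a few lines. Below is a minimal, self-contained sketch (our illustration, not the authors' code): the layer f(z, x) = tanh(Wz + Ux), the dimensions, and the contraction scaling of W are assumptions chosen so that a unique equilibrium exists, which is the kind of condition the paper's initial-equilibrium assumption is meant to secure. The forward pass finds z* = f(z*, x) with a simple root-finder, and the backward pass uses implicit differentiation instead of unrolling the iterations.

```python
import torch

torch.manual_seed(0)
d = 8
# Illustrative layer f(z, x) = tanh(W z + U x); W is scaled down so that f is
# a contraction in z, ensuring a unique equilibrium point exists (assumption).
W = 0.3 * torch.randn(d, d) / d ** 0.5
U = torch.randn(d, d) / d ** 0.5

def f(z, x):
    return torch.tanh(z @ W.T + x @ U.T)

def solve_equilibrium(x, tol=1e-8, max_iter=1000):
    # Forward pass: find z* = f(z*, x) by fixed-point iteration, the simplest
    # root-finder (practical DEQs often use Broyden's or Anderson's method).
    z = torch.zeros_like(x)
    for _ in range(max_iter):
        z_new = f(z, x)
        if (z_new - z).norm() < tol:
            return z_new
        z = z_new
    return z

x = torch.randn(1, d, requires_grad=True)
with torch.no_grad():
    z_star = solve_equilibrium(x)

# Backward pass via implicit differentiation: differentiating z* = f(z*, x)
# gives dL/dx = v (I - J)^{-1} df/dx with J = df/dz at z* and v = dL/dz*.
# Solve u = v + u J by iterating vector-Jacobian products (no unrolling).
z_star = z_star.requires_grad_(True)
f_val = f(z_star, x)
v = torch.ones_like(z_star)          # stand-in for an upstream gradient dL/dz*
u = v.clone()
for _ in range(100):
    (vjp,) = torch.autograd.grad(f_val, z_star, u, retain_graph=True)
    u = v + vjp                      # converges: spectral radius of J is < 1
(dL_dx,) = torch.autograd.grad(f_val, x, u)
print(dL_dx.shape)                   # torch.Size([1, 8])
```

The iterated vector-Jacobian product realizes the Neumann series for (I - J)^{-1}, so the memory cost is independent of how many solver steps the forward pass took.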
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Positive concave deep equilibrium models [7.148312060227714]
Deep equilibrium (DEQ) models are a memory-efficient alternative to standard neural networks.
We introduce a novel class of DEQ models called positive concave deep equilibrium (pcDEQ) models.
Our approach, which is based on nonlinear Perron-Frobenius theory, enforces nonnegative weights and activation functions that are concave on the positive orthant.
arXiv Detail & Related papers (2024-02-06T14:24:29Z)
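A toy rendering of the pcDEQ ingredients (our construction; the paper's exact parameterization may differ): entrywise-nonnegative weights combined with an activation that is nonnegative and concave on the positive orthant keep all iterates in that orthant, where nonlinear Perron-Frobenius arguments yield a unique fixed point reachable by plain forward iteration.

```python
import torch

torch.manual_seed(0)
d = 8
# Entrywise-nonnegative weights (the 1/d scaling keeps the map contractive).
W = torch.rand(d, d) / d
U = torch.rand(d, d) / d

def f(z, x):
    # For nonnegative z and x the pre-activation stays in the positive
    # orthant, where tanh is nonnegative and concave, as pcDEQ requires.
    return torch.tanh(z @ W.T + x @ U.T)

x = torch.rand(1, d)     # nonnegative input injection
z = torch.zeros(1, d)
for _ in range(200):     # plain iteration; no root-finder tuning needed
    z = f(z, x)
print(torch.allclose(z, f(z, x), atol=1e-6))  # True: z is the equilibrium
```

In a trained model, nonnegativity would be enforced by reparameterizing the raw weights (e.g., squaring or softplus); here they are simply drawn nonnegative.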
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
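The "iterative solver as a fixed point" construction in the hyperspectral deconvolution paper above can be sketched as follows (a simplification with assumed names and sizes, not the paper's model): a gradient step on the data-fidelity term for y = Az + noise is followed by a learnable proximal/denoising network, and the DEQ equilibrium of this update is taken as the reconstruction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, m = 16, 12
A = torch.randn(m, n) / n ** 0.5     # toy blurring/measurement operator
y = torch.randn(m)                   # toy stand-in for the observed image
eta = 0.1                            # step size for the fidelity term

# Learnable regularizer in the form of a small network (untrained here);
# its weights are shrunk so the overall update is contractive (an assumption;
# the paper instead derives conditions that guarantee convergence).
prox = nn.Sequential(nn.Linear(n, n), nn.Tanh(), nn.Linear(n, n))
with torch.no_grad():
    prox[0].weight *= 0.5
    prox[2].weight *= 0.5

def g(z):
    grad_fid = A.T @ (A @ z - y)     # gradient of 0.5 * ||A z - y||^2
    return prox(z - eta * grad_fid)  # proximal-gradient-style update

z = torch.zeros(n)
with torch.no_grad():
    for _ in range(300):             # fixed-point iteration to the equilibrium
        z = g(z)
    print((z - g(z)).norm())         # small residual: z is (near) a fixed point
```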
- Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers [32.57938108395521]
Linear partial differential equations (PDEs), a class of mechanistic models, are used to describe physical processes such as heat transfer, electromagnetism, and wave propagation.
Specialized numerical methods based on discretization are typically used to solve them.
By ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error.
arXiv Detail & Related papers (2022-12-23T17:02:59Z)
- Physics-Informed Neural Network Method for Parabolic Differential Equations with Sharply Perturbed Initial Conditions [68.8204255655161]
We develop a physics-informed neural network (PINN) model for parabolic problems with a sharply perturbed initial condition.
Localized large gradients in the advection-dispersion equation (ADE) solution make the Latin hypercube sampling of the equation's residual, common in PINN methods, highly inefficient.
We propose criteria for weights in the loss function that produce a more accurate PINN solution than those obtained with the weights selected via other methods.
arXiv Detail & Related papers (2022-08-18T05:00:24Z)
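A weighted PINN loss of the kind this paper studies can be sketched as follows (our toy setup: a 1-D diffusion equation with a sharply peaked initial bump; the weights w_res and w_ic are placeholders, whereas the paper derives principled criteria for choosing them).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
u0 = lambda x: torch.exp(-(x - 0.5) ** 2 / 1e-3)   # sharply peaked initial condition

def pde_residual(x, t, D=0.1):
    # Residual of u_t = D * u_xx at collocation points (x, t).
    xt = torch.stack([x, t], dim=-1).requires_grad_(True)
    u = net(xt).squeeze(-1)
    du = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = du[..., 0], du[..., 1]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][..., 0]
    return u_t - D * u_xx

x, t = torch.rand(256), torch.rand(256)            # residual collocation points
x0 = torch.rand(64)                                # initial-condition points
w_res, w_ic = 1.0, 10.0                            # placeholder loss weights
ic_pred = net(torch.stack([x0, torch.zeros(64)], dim=-1)).squeeze(-1)
loss = (w_res * pde_residual(x, t).pow(2).mean()
        + w_ic * (ic_pred - u0(x0)).pow(2).mean())
loss.backward()                                    # ready for an optimizer step
```

Because the sharp bump concentrates the residual near x = 0.5, t = 0, uniform (e.g., Latin hypercube) collocation wastes most samples, which is the inefficiency the paper's weighting criteria are designed to counter.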
- Single Trajectory Nonparametric Learning of Nonlinear Dynamics [8.438421942654292]
Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least squares estimator (LSE).
We leverage recently developed information-theoretic methods to establish the optimality of the LSE for nonparametric hypothesis classes.
We specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of Reproducing Kernel Hilbert Spaces (RKHS).
arXiv Detail & Related papers (2022-02-16T19:38:54Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that, when augmented with iteration averaging, SEG converges to the Nash equilibrium, and that this rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
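The SEG recipe above is easy to state concretely. Here is a minimal sketch on a bilinear game min_x max_y x^T A y, whose Nash equilibrium is the origin (the noise model, step size, and restart schedule are illustrative assumptions, not the paper's tuned choices).

```python
import torch

torch.manual_seed(0)
d = 4
A = torch.randn(d, d)
x, y = torch.randn(d), torch.randn(d)
eta, sigma = 0.05, 0.1          # constant step size, gradient-noise level

def grads(x, y):
    # Stochastic gradients of f(x, y) = x^T A y for both players.
    gx = A @ y + sigma * torch.randn(d)      # descent direction for x
    gy = A.T @ x + sigma * torch.randn(d)    # ascent direction for y
    return gx, gy

T = 200
for _ in range(5):                           # scheduled restarting
    x_avg, y_avg = torch.zeros(d), torch.zeros(d)
    for _ in range(T):
        gx, gy = grads(x, y)
        x_half, y_half = x - eta * gx, y + eta * gy   # extrapolation step
        gx, gy = grads(x_half, y_half)
        x, y = x - eta * gx, y + eta * gy             # update at extrapolated point
        x_avg += x / T                                # iteration averaging
        y_avg += y / T
    x, y = x_avg, y_avg                      # restart from the averaged iterate
print(torch.linalg.norm(torch.cat([x, y])))  # distance to the Nash equilibrium
```

Plain SEG with a constant step size circles the equilibrium on bilinear games; averaging collapses the circling, and restarting from the averaged point is what yields the accelerated rate the paper proves.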
- Optimization Induced Equilibrium Networks [76.05825996887573]
Implicit equilibrium models, i.e., deep neural networks (DNNs) defined by implicit equations, have recently attracted increasing attention.
We show that deep OptEq outperforms previous implicit models even with fewer parameters.
arXiv Detail & Related papers (2021-05-27T15:17:41Z)
- On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers [6.548580592686076]
A deep equilibrium model uses implicit layers, which are defined through an equilibrium point of an infinite sequence of computation.
We prove a relation between the gradient dynamics of the deep implicit layer and the dynamics of the trust-region Newton method applied to a shallow explicit layer.
arXiv Detail & Related papers (2021-02-15T05:08:11Z)