Global Convergence Rate of Deep Equilibrium Models with General
Activations
- URL: http://arxiv.org/abs/2302.05797v3
- Date: Fri, 1 Mar 2024 18:23:01 GMT
- Title: Global Convergence Rate of Deep Equilibrium Models with General
Activations
- Authors: Lan V. Truong
- Abstract summary: This paper shows that the linear convergence of gradient descent to a global optimum, previously established for over-parametrized DEQs with ReLU activation, still holds for DEQs with any generally bounded activation with bounded first and second derivatives.
Since the new activation function is generally non-homogeneous, bounding the least eigenvalue of the Gram matrix of the equilibrium point is particularly challenging.
- Score: 18.601449856300984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a recent paper, Ling et al. investigated the over-parametrized Deep
Equilibrium Model (DEQ) with ReLU activation. They proved that gradient
descent converges to a globally optimal solution of the quadratic loss
function at a linear convergence rate. This paper shows that this result still
holds for DEQs with any generally bounded activation with bounded first and
second derivatives. Since the new activation function is generally
non-homogeneous, bounding the least eigenvalue of the Gram matrix of the
equilibrium point is particularly challenging. To accomplish this task, we must
create a novel population Gram matrix and develop a new form of dual activation
with Hermite polynomial expansion.
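To make the key technical device concrete, the following is a minimal numerical sketch (not from the paper) of a dual activation built from a Hermite polynomial expansion. For a smooth, bounded activation sigma and jointly Gaussian inputs with unit variances and correlation rho, the dual activation can be written as check_sigma(rho) = sum_k a_k^2 rho^k, where a_k = E[sigma(g) He_k(g)] / sqrt(k!) are the normalized (probabilists') Hermite coefficients of sigma. The choice of tanh, the truncation order, and the quadrature degree below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from math import factorial, sqrt
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Gauss-Hermite (probabilists') quadrature: nodes/weights for the weight exp(-x^2/2).
nodes, weights = hermegauss(80)
weights = weights / np.sqrt(2 * np.pi)   # normalize so the weights integrate against N(0, 1)

sigma = np.tanh                          # a bounded activation with bounded first/second derivatives
K = 12                                   # truncation order of the expansion (assumption)

# a_k = E[sigma(g) He_k(g)] / sqrt(k!), the normalized Hermite coefficients of sigma.
a = np.zeros(K)
for k in range(K):
    basis = np.zeros(k + 1)
    basis[k] = 1.0
    He_k = hermeval(nodes, basis)        # probabilists' Hermite polynomial He_k at the nodes
    a[k] = np.sum(weights * sigma(nodes) * He_k) / sqrt(factorial(k))

def dual_activation(rho):
    """Truncated dual activation: sum_k a_k^2 rho^k."""
    return np.polynomial.polynomial.polyval(rho, a ** 2)

# Sanity check: the dual activation should match E[sigma(u) sigma(v)] for
# jointly Gaussian (u, v) with unit variances and correlation rho.
rho = 0.6
rng = np.random.default_rng(0)
g1, g2 = rng.standard_normal((2, 200_000))
u, v = g1, rho * g1 + np.sqrt(1.0 - rho ** 2) * g2
print(dual_activation(rho), np.mean(sigma(u) * sigma(v)))
```

Because tanh is odd, the even-order coefficients vanish and the truncated series already agrees closely with the Monte Carlo estimate; an expansion of this kind is the ingredient the abstract refers to when bounding the least eigenvalue of the population Gram matrix for non-homogeneous activations.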
Related papers
- Convergence of Kinetic Langevin Monte Carlo on Lie groups [21.76159063788814]
We propose a Lie-group MCMC sampler, obtained by delicately discretizing kinetic-Langevin-type sampling dynamics on the Lie group.
This is the first convergence result for kinetic Langevin on curved spaces, and also the first quantitative result that requires neither convexity nor, at least not explicitly, any common relaxation such as isoperimetry.
arXiv Detail & Related papers (2024-03-18T17:50:20Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Global Convergence of Over-parameterized Deep Equilibrium Models [52.65330015267245]
A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection.
Instead of infinite computations, it solves for the equilibrium point directly with root-finding and computes gradients with implicit differentiation (a minimal numerical sketch of this recipe appears after this list).
We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
arXiv Detail & Related papers (2022-05-27T08:00:13Z) - Reduced density matrix functional theory from an ab initio
seniority-zero wave function: Exact and approximate formulations along
adiabatic connection paths [0.0]
We propose an alternative formulation of reduced density-matrix functional theory (RDMFT).
The exact natural orbitals and their occupancies are determined self-consistently from an effective seniority-zero calculation.
This information is expected to serve as a guide in the future design of higher-seniority density-matrix functional approximations.
arXiv Detail & Related papers (2022-04-01T21:27:25Z) - Sparsest Univariate Learning Models Under Lipschitz Constraint [31.28451181040038]
We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
arXiv Detail & Related papers (2021-12-27T07:03:43Z) - Optimization Induced Equilibrium Networks [76.05825996887573]
Implicit equilibrium models, i.e., deep neural networks (DNNs) defined by implicit equations, have recently attracted increasing attention.
We show that deep OptEq outperforms previous implicit models even with fewer parameters.
arXiv Detail & Related papers (2021-05-27T15:17:41Z) - A Dynamical Central Limit Theorem for Shallow Neural Networks [48.66103132697071]
We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
arXiv Detail & Related papers (2020-08-21T18:00:50Z) - Global Convergence of Second-order Dynamics in Two-layer Neural Networks [10.415177082023389]
Recent results have shown that for two-layer fully connected neural networks, gradient flow converges to a global optimum in the infinite width limit.
We show that the same global convergence holds for the heavy-ball method.
While our results hold in the mean-field limit, numerical simulations indicate that global convergence may already occur for reasonably small networks.
arXiv Detail & Related papers (2020-07-14T07:01:57Z) - Competitive Mirror Descent [67.31015611281225]
Constrained competitive optimization involves multiple agents trying to minimize conflicting objectives, subject to constraints.
We propose competitive mirror descent (CMD): a general method for solving such problems based on first order information.
As a special case we obtain a novel competitive multiplicative weights algorithm for problems on the positive cone.
arXiv Detail & Related papers (2020-06-17T22:11:35Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best fitting ReLU with respect to square loss given access to draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
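Picking up the over-parameterized DEQ entry above: the forward pass solves a fixed-point equation by root-finding, and the backward pass differentiates through the equilibrium with the implicit function theorem rather than by unrolling. Below is a minimal NumPy sketch of that recipe on a toy weight-tied layer z = tanh(W z + U x); the dimensions, the tanh activation, the plain fixed-point iteration, and the quadratic readout loss are illustrative assumptions, not taken from any of the papers listed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
U = rng.standard_normal((d, d)) / np.sqrt(d)
W = 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)   # small spectral norm so the fixed point exists
w_out = rng.standard_normal(d)

def f(z, W):
    # one weight-tied "layer" with input injection
    return np.tanh(W @ z + U @ x)

def solve_equilibrium(W, tol=1e-12, max_iter=10_000):
    # root-finding for z* = f(z*, W); plain fixed-point iteration for simplicity
    z = np.zeros(d)
    for _ in range(max_iter):
        z_next = f(z, W)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z_next

def loss_and_grad_W(W):
    z = solve_equilibrium(W)
    loss = 0.5 * (w_out @ z) ** 2
    # implicit differentiation: solve (I - J_z)^T v = dL/dz once, then form dL/dW
    s = 1.0 - np.tanh(W @ z + U @ x) ** 2          # tanh'(pre-activation) at the equilibrium
    J_z = s[:, None] * W                           # Jacobian of f w.r.t. z at the equilibrium
    dL_dz = (w_out @ z) * w_out
    v = np.linalg.solve((np.eye(d) - J_z).T, dL_dz)
    grad_W = np.outer(v * s, z)                    # since d f_i / d W_ij = s_i * z_j
    return loss, grad_W

loss, grad = loss_and_grad_W(W)

# check one entry of the implicit gradient against a finite difference
eps = 1e-6
W_pert = W.copy()
W_pert[0, 1] += eps
print(grad[0, 1], (loss_and_grad_W(W_pert)[0] - loss) / eps)
```

The finite-difference check at the end compares the implicit gradient with a numerical derivative of the loss; practical DEQ implementations typically replace plain fixed-point iteration with a quasi-Newton root-finder such as Broyden's method or Anderson acceleration.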