Convergence Rates of Training Deep Neural Networks via Alternating
Minimization Methods
- URL: http://arxiv.org/abs/2208.14318v2
- Date: Tue, 4 Apr 2023 07:56:03 GMT
- Title: Convergence Rates of Training Deep Neural Networks via Alternating
Minimization Methods
- Authors: Jintao Xu, Chenglong Bao, Wenxun Xing
- Abstract summary: We propose a unified framework for analyzing the convergence rate of alternating minimization (AM) methods for training deep neural networks (DNNs).
In this paper, we show the detailed local convergence rate when the KL exponent $\theta$ varies in $[0,1)$.
- Score: 6.425552131743896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural networks (DNNs) is an important and challenging
optimization problem in machine learning due to its non-convexity and
non-separable structure. The alternating minimization (AM) approaches split the
composition structure of DNNs and have drawn great interest in the deep
learning and optimization communities. In this paper, we propose a unified
framework for analyzing the convergence rate of AM-type network training
methods. Our analysis is based on the non-monotone $j$-step sufficient decrease
conditions and the Kurdyka-Lojasiewicz (KL) property, which relaxes the
requirement of designing descent algorithms. We show the detailed local
convergence rate if the KL exponent $\theta$ varies in $[0,1)$. Moreover, the
local R-linear convergence is discussed under a stronger $j$-step sufficient
decrease condition.
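For reference, the KL property with exponent $\theta$ at a critical point $\bar{x}$ asks that, for some $c > 0$, $\eta > 0$ and all $x$ near $\bar{x}$ with $f(\bar{x}) < f(x) < f(\bar{x}) + \eta$,
\[ \operatorname{dist}\bigl(0, \partial f(x)\bigr) \;\ge\; c\,\bigl(f(x) - f(\bar{x})\bigr)^{\theta}. \]
Under such an inequality, the classical rate taxonomy for descent-type sequences, which the paper adapts to the non-monotone $j$-step sufficient decrease setting, reads: finite termination for $\theta = 0$; local R-linear convergence for $\theta \in (0, 1/2]$; and a sublinear rate of order $O(k^{-(1-\theta)/(2\theta-1)})$ for $\theta \in (1/2, 1)$. The constants $c$ and $\eta$ are local quantities introduced here for exposition only.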
Related papers
- Neural Network with Local Converging Input (NNLCI) for Supersonic Flow
Problems with Unstructured Grids [0.9152133607343995]
We develop a neural network with local converging input (NNLCI) for high-fidelity prediction using unstructured data.
As a validation case, the NNLCI method is applied to study inviscid supersonic flows in channels with bumps.
arXiv Detail & Related papers (2023-10-23T19:03:37Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
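As a toy illustration of the interpolation mechanism, not the paper's algorithm or analysis, the sketch below wraps a base optimizer in a lookahead-style averaged update; all parameter values are chosen purely for the demo.

```python
import numpy as np

def interpolated_training_step(z, inner_step, k=31, alpha=0.5):
    """Lookahead-style linear interpolation: run k base-optimizer steps,
    then move only a fraction alpha of the way toward the result."""
    z_fast = z.copy()
    for _ in range(k):
        z_fast = inner_step(z_fast)
    return (1.0 - alpha) * z + alpha * z_fast

# Toy bilinear game f(x, y) = x * y: plain gradient descent-ascent spirals
# away from the saddle at (0, 0), while the interpolated wrapper contracts
# toward it for these illustrative values of eta, k and alpha.
eta = 0.1
gda = lambda z: np.array([z[0] - eta * z[1], z[1] + eta * z[0]])

z = np.array([1.0, -1.0])
for _ in range(50):
    z = interpolated_training_step(z, gda, k=31, alpha=0.5)
print(np.linalg.norm(z))  # essentially zero
```

In this linear toy problem the inner steps carry the iterate roughly half a turn around the saddle, so blending with the anchor cancels most of the outward drift; the paper makes the general argument through the theory of nonexpansive (averaged) operators.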
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Finite-Sum Optimization: A New Perspective for Convergence to a Global
Solution [22.016345507132808]
Deep neural networks (DNNs) have shown great success in many machine learning tasks.
Their training is challenging since the loss surface is generally non-convex and non-smooth.
We propose an algorithmic framework for this finite-sum minimization problem that allows convergence to an $\varepsilon$-(global) minimum.
arXiv Detail & Related papers (2022-02-07T21:23:16Z) - Neural Optimization Kernel: Towards Robust Deep Learning [13.147925376013129]
Recent studies show a connection between neural networks (NN) and kernel methods.
This paper proposes a novel kernel family named Neural Optimization Kernel (NOK).
We show that over-parameterized deep NNs (NOK) can increase the expressive power to reduce empirical risk and reduce the generalization bound at the same time.
arXiv Detail & Related papers (2021-06-11T00:34:55Z) - Clustered Federated Learning via Generalized Total Variation
Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
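Schematically, the GTV training problem couples per-node models $w_i$ over the empirical graph $(\mathcal{V}, \mathcal{E})$ as
\[ \min_{\{w_i\}} \ \sum_{i \in \mathcal{V}} L_i(w_i) \;+\; \lambda \sum_{(i,j) \in \mathcal{E}} A_{ij}\, \phi(w_i - w_j), \]
where $L_i$ is the local empirical risk at node $i$, $A_{ij} \ge 0$ are edge weights, and $\phi$ is the penalty (e.g., a norm) whose choice makes the total variation "generalized". Small $\lambda$ yields purely local models, while large $\lambda$ pools models within well-connected clusters. The notation $L_i$, $A_{ij}$, $\phi$, $\lambda$ is chosen here for exposition rather than taken from the paper.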
arXiv Detail & Related papers (2021-05-26T18:07:19Z) - Decentralized Statistical Inference with Unrolled Graph Neural Networks [26.025935320024665]
We propose a learning-based framework, which unrolls decentralized optimization algorithms into graph neural networks (GNNs)
By minimizing the recovery error via end-to-end training, this learning-based framework resolves the model mismatch issue.
Our convergence analysis reveals that the learned model parameters may accelerate the convergence and reduce the recovery error to a large extent.
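The sketch below illustrates the unrolling idea using decentralized gradient descent (DGD) as a stand-in for the decentralized algorithm being unrolled: each iteration becomes a layer whose step sizes (and optionally the mixing weights) would be trained end-to-end. The toy problem and all names are illustrative, not the paper's setup.

```python
import numpy as np

def unrolled_dgd(x0, W, grads, alphas):
    """Unroll decentralized gradient descent into a feed-forward computation:
    one 'layer' per iteration, with per-layer step sizes `alphas` playing the
    role of learnable parameters in the unrolled-GNN view."""
    x = x0
    for alpha in alphas:                  # layer k of the unrolled network
        x = W @ x - alpha * grads(x)      # neighbor averaging + local gradient step
    return x

# Toy usage: decentralized least squares on a 4-node ring graph.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3, 3))
b = rng.normal(size=(4, 3))
grads = lambda x: np.stack([A[i].T @ (A[i] @ x[i] - b[i]) for i in range(4)])
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])  # doubly stochastic mixing matrix
x = unrolled_dgd(np.zeros((4, 3)), W, grads, alphas=[0.05] * 20)
```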
arXiv Detail & Related papers (2021-04-04T07:52:34Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
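A representative objective of this adversarial type (not necessarily the exact one used in the paper) writes the conditional-moment restriction of the SEM as
\[ \min_{f \in \mathcal{F}} \ \max_{g \in \mathcal{G}} \ \mathbb{E}\bigl[ g(Z)\,\bigl(Y - f(X)\bigr) \bigr] \;-\; \tfrac{1}{2}\, \mathbb{E}\bigl[ g(Z)^{2} \bigr], \]
where $\mathcal{F}$ and $\mathcal{G}$ are the two neural network classes: $f$ models the structural function and $g$ plays the adversarial test function, with the quadratic term regularizing the inner maximization. Both sets of parameters are then updated by stochastic gradient descent-ascent.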
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
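For concreteness, the convex program from this line of work (for a two-layer ReLU network with data $X \in \mathbb{R}^{n \times d}$, labels $y$, squared loss, and weight decay $\beta$; notation chosen here) reads
\[ \min_{\{v_i, w_i\}} \ \tfrac{1}{2}\Bigl\| \sum_{i=1}^{P} D_i X (v_i - w_i) - y \Bigr\|_2^2 \;+\; \beta \sum_{i=1}^{P} \bigl( \|v_i\|_2 + \|w_i\|_2 \bigr) \quad \text{s.t.}\quad (2D_i - I_n) X v_i \ge 0,\ (2D_i - I_n) X w_i \ge 0, \]
where $D_1, \dots, D_P$ are the diagonal 0/1 matrices enumerating the ReLU activation patterns realizable on $X$; these are the cone constraints referred to above, and optimal two-layer networks are recovered from the nonzero blocks $(v_i, w_i)$.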
arXiv Detail & Related papers (2020-06-10T15:38:30Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in theory.
Our experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
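The generic mechanism behind reducing communication rounds in this line of work is local updates punctuated by periodic parameter averaging. The sketch below shows only that pattern; it is not the paper's AUC-specific primal-dual algorithm, and every name and parameter is illustrative.

```python
import numpy as np

def local_update_round(params, local_grad_fns, lr=0.1, local_steps=16):
    """One communication round: every worker takes several local gradient
    steps from the shared parameters, then the results are averaged once."""
    updated = []
    for grad_fn in local_grad_fns:            # one entry per worker
        w = params.copy()
        for _ in range(local_steps):          # local computation, no communication
            w -= lr * grad_fn(w)
        updated.append(w)
    return np.mean(updated, axis=0)           # the single communication (all-reduce)

# Toy usage: two workers with quadratic losses pulling toward different targets.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
grad_fns = [lambda w, t=t: w - t for t in targets]
w = np.zeros(2)
for _ in range(10):                           # 10 communication rounds
    w = local_update_round(w, grad_fns)
print(w)  # approaches the average of the targets, [0.5, 0.5]
```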
arXiv Detail & Related papers (2020-05-05T18:08:23Z)