ADMM Training Algorithms for Residual Networks: Convergence, Complexity
and Parallel Training
- URL: http://arxiv.org/abs/2310.15334v1
- Date: Mon, 23 Oct 2023 20:01:06 GMT
- Title: ADMM Training Algorithms for Residual Networks: Convergence, Complexity
and Parallel Training
- Authors: Jintao Xu, Yifei Li, Wenxun Xing
- Abstract summary: We design a series of serial and parallel proximal point (gradient) ADMMs for the FCResNets training problem.
Convergence of the proximal point version is proven based on a Kurdyka-Lojasiewicz (KL) property analysis framework.
The advantages of the parallel implementation in terms of lower time complexity and less (per-node) memory consumption are analyzed theoretically.
- Score: 6.0068966996888395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design a series of serial and parallel proximal point (gradient) ADMMs for
the fully connected residual networks (FCResNets) training problem by
introducing auxiliary variables. Convergence of the proximal point version is
proven within a Kurdyka-Lojasiewicz (KL) property analysis framework, and we
ensure a locally R-linear or sublinear convergence rate depending on the range
of the KL exponent; a suitable auxiliary function is constructed to make this
analysis possible. Moreover, the advantages of the parallel implementation,
namely lower time complexity and lower per-node memory consumption, are
analyzed theoretically. To the best of our knowledge, this is the first work
to theoretically analyze the convergence, convergence rate, time complexity
and per-node runtime memory requirement of ADMM applied to the FCResNets
training problem. Experiments are reported to show the speed, performance,
robustness and potential of our methods in deep-network training tasks.
Finally, we demonstrate the advantages and potential of our parallel training
on large-scale problems.
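To make the variable-splitting idea concrete, below is a minimal numpy sketch of a proximal-gradient ADMM on a single residual block with a smooth tanh activation and a linear readout. The particular splitting (an auxiliary variable U standing in for the block's output), the hyperparameters rho and eta, and the toy data are all illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative proximal-gradient ADMM for a one-block residual model
# y ~ (x + tanh(W x)) . v, with auxiliary variable U for the block output.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
Wt = rng.normal(size=(d, d)) * 0.3          # ground-truth weights (toy data)
vt = rng.normal(size=d)
y = (X + np.tanh(X @ Wt.T)) @ vt + 0.01 * rng.normal(size=n)

W = rng.normal(size=(d, d)) * 0.1           # residual-block weights
v = rng.normal(size=d) * 0.1                # linear readout
U = X.copy()                                # auxiliary variable: U ~ f(W, X)
Lam = np.zeros((n, d))                      # dual variable for U = f(W, X)
rho, eta = 1.0, 0.1                         # penalty and W-step size (assumed)

def f(W):                                   # residual block: x + tanh(W x)
    return X + np.tanh(X @ W.T)

for it in range(300):
    # v-update: least squares on the readout, U fixed
    v = np.linalg.lstsq(U, y, rcond=None)[0]
    # U-update: row-wise quadratic subproblem, closed form
    C = f(W) - Lam / rho
    M = np.outer(v, v) + rho * np.eye(d)
    U = np.linalg.solve(M, (np.outer(y, v) + rho * C).T).T
    # W-update: a single gradient step on the augmented-Lagrangian penalty
    A = np.tanh(X @ W.T)
    R = X + A - (U + Lam / rho)
    W -= eta * rho * ((R * (1 - A**2)).T @ X) / n
    # dual ascent on the coupling constraint U = f(W, X)
    Lam += rho * (U - f(W))

print("fit MSE:", np.mean((f(W) @ v - y) ** 2))
```

The easy subproblems (readout and auxiliary variable) are solved in closed form, while the hard nonconvex W-subproblem gets only a single gradient step, which mirrors the "proximal point (gradient)" flavor the abstract refers to.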
Related papers
- Approximating G(t)/GI/1 queues with deep learning [0.0]
We apply a supervised machine-learning approach to solve a problem in queueing theory.
It estimates the transient distribution of the number in the system for a G(t)/GI/1 queue.
We develop a neural network mechanism that provides a fast and accurate predictor of these distributions.
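As a generic sketch of what such a predictor could look like: a small network that maps queue parameters to a softmax distribution over a truncated number-in-system support. The architecture, features and truncation level are hypothetical, not taken from the paper.

```python
# Hypothetical distribution predictor: features -> P(N(t) = n), n = 0..K.
import numpy as np

def softmax(z):
    z = z - z.max()                    # stabilized softmax
    e = np.exp(z)
    return e / e.sum()

def predictor(features, W1, W2):
    h = np.tanh(features @ W1)         # hidden layer
    return softmax(h @ W2)             # distribution over 0..K customers

rng = np.random.default_rng(0)
K = 10                                 # assumed truncation of the support
W1 = rng.normal(size=(3, 32)) * 0.1
W2 = rng.normal(size=(32, K + 1)) * 0.1
x = np.array([1.2, 0.9, 5.0])          # e.g. arrival rate, service rate, time

print("P(N(t)=n):", np.round(predictor(x, W1, W2), 3))
```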
arXiv Detail & Related papers (2024-07-11T05:25:45Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
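A toy sketch of the linear-interpolation (lookahead-style) outer update this summary describes, on a simple quadratic loss; the inner optimizer, step count k, and interpolation coefficient alpha are assumptions for illustration.

```python
# Outer linear interpolation: run k inner steps, then move only part of
# the way toward the inner iterate.
import numpy as np

def grad(theta):                       # gradient of the toy loss ||theta||^2/2
    return theta

theta = np.array([5.0, -3.0])
alpha, k, lr = 0.5, 5, 0.3             # assumed hyperparameters
for outer in range(20):
    inner = theta.copy()
    for _ in range(k):                 # k fast inner gradient steps
        inner -= lr * grad(inner)
    theta += alpha * (inner - theta)   # interpolate toward the inner iterate
print(theta)
```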
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Efficient Parametric Approximations of Neural Network Function Space Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.
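LAFTR itself is not reproduced here; the sketch below only computes the quantity it approximates, the average output discrepancy over a training set, for a one-hidden-layer ReLU network with illustrative shapes.

```python
# Naive Monte Carlo estimate of Function Space Distance: mean squared
# discrepancy between two ReLU networks' outputs over a dataset.
import numpy as np

def relu_net(params, X):
    W1, W2 = params
    return np.maximum(X @ W1, 0.0) @ W2     # one-hidden-layer ReLU network

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))              # stand-in for the training set
p1 = (rng.normal(size=(8, 16)), rng.normal(size=(16, 1)))
p2 = (p1[0] + 0.01 * rng.normal(size=(8, 16)), p1[1])  # perturbed copy

fsd = np.mean((relu_net(p1, X) - relu_net(p2, X)) ** 2)
print("FSD estimate:", fsd)
```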
arXiv Detail & Related papers (2023-02-07T15:09:23Z)
- Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods [6.425552131743896]
We propose a unified framework for analyzing the convergence rate of alternating minimization methods for training deep neural networks (DNNs).
In this paper, we show the detailed local convergence rate if the KL exponent $\theta$ varies in $[0,1)$.
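Both this paper and the main abstract above rest on the same standard KL-exponent dichotomy. As a generic statement (not either paper's exact theorem): a function $f$ satisfies the KL property at $\bar{x}$ with exponent $\theta \in [0,1)$ if, near $\bar{x}$,

```latex
\[
  |f(x) - f(\bar{x})|^{\theta}
  \le c \,\operatorname{dist}\bigl(0, \partial f(x)\bigr),
\]
% and for descent-type methods the iterates then typically obey
\[
  \|x^k - \bar{x}\| \le
  \begin{cases}
    C \rho^k, & \theta \in (0, \tfrac12] \quad \text{(R-linear)},\\[2pt]
    C\, k^{-\frac{1-\theta}{2\theta-1}}, & \theta \in (\tfrac12, 1) \quad \text{(sublinear)},
  \end{cases}
\]
% with finite termination in the degenerate case \theta = 0.
```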
arXiv Detail & Related papers (2022-08-30T14:58:44Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
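For reference, a minimal sketch of the interval bound propagation baseline mentioned in the comparison: an input box $[l, u]$ is pushed through an affine layer (splitting the weights by sign) and a monotone ReLU. The weights and box radius are made up for illustration.

```python
# Interval bound propagation through one affine + ReLU layer.
import numpy as np

def affine_interval(W, b, l, u):
    # positive/negative weight parts give the tightest interval image
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ l + Wn @ u + b, Wp @ u + Wn @ l + b

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
l, u = -0.1 * np.ones(4), 0.1 * np.ones(4)         # input box (assumed radius)

lo, hi = affine_interval(W, b, l, u)
lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone
print("output bounds:", lo, hi)
```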
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, within a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
In theory, our algorithm requires many fewer communication rounds than naive parallel approaches.
Experiments on several datasets demonstrate the effectiveness of our algorithm and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.