ADMM Training Algorithms for Residual Networks: Convergence, Complexity
and Parallel Training
- URL: http://arxiv.org/abs/2310.15334v1
- Date: Mon, 23 Oct 2023 20:01:06 GMT
- Title: ADMM Training Algorithms for Residual Networks: Convergence, Complexity
and Parallel Training
- Authors: Jintao Xu, Yifei Li, Wenxun Xing
- Abstract summary: We design a series of serial and parallel proximal point (gradient) ADMMs for the FCResNets training problem.
Convergence of the proximal point version is proven based on a Kurdyka-Lojasiewicz (KL) property analysis framework.
The advantages of the parallel implementation in terms of lower time complexity and less (per-node) memory consumption are analyzed theoretically.
- Score: 6.0068966996888395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design a series of serial and parallel proximal point (gradient) ADMMs for
the fully connected residual networks (FCResNets) training problem by
introducing auxiliary variables. Convergence of the proximal point version is
proven within a Kurdyka-Lojasiewicz (KL) property analysis framework, and we
ensure a locally R-linear or sublinear convergence rate depending on the range
of the KL exponent; a suitable auxiliary function is constructed to make this
analysis possible. Moreover, the advantages of the parallel implementation,
namely lower time complexity and lower per-node memory consumption, are
analyzed theoretically. To the best of our knowledge, this is the first work
to theoretically analyze the convergence, convergence rate, time complexity
and per-node runtime memory requirement of ADMM applied to the FCResNets
training problem. Experiments are reported to show the speed, performance,
robustness and potential of our methods in deep-network training tasks.
Finally, we demonstrate the advantages and potential of our parallel training
on large-scale problems.
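To make the variable-splitting idea concrete, below is a minimal numpy sketch of a proximal-gradient ADMM on a single residual block with a smooth tanh activation and a linear readout. The particular splitting (an auxiliary variable U standing in for the block's output), the hyperparameters rho and eta, and the toy data are all illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative proximal-gradient ADMM for a one-block residual model
# y ~ (x + tanh(W x)) . v, with auxiliary variable U for the block output.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
Wt = rng.normal(size=(d, d)) * 0.3          # ground-truth weights (toy data)
vt = rng.normal(size=d)
y = (X + np.tanh(X @ Wt.T)) @ vt + 0.01 * rng.normal(size=n)

W = rng.normal(size=(d, d)) * 0.1           # residual-block weights
v = rng.normal(size=d) * 0.1                # linear readout
U = X.copy()                                # auxiliary variable: U ~ f(W, X)
Lam = np.zeros((n, d))                      # dual variable for U = f(W, X)
rho, eta = 1.0, 0.1                         # penalty and W-step size (assumed)

def f(W):                                   # residual block: x + tanh(W x)
    return X + np.tanh(X @ W.T)

for it in range(300):
    # v-update: least squares on the readout, U fixed
    v = np.linalg.lstsq(U, y, rcond=None)[0]
    # U-update: row-wise quadratic subproblem, closed form
    C = f(W) - Lam / rho
    M = np.outer(v, v) + rho * np.eye(d)
    U = np.linalg.solve(M, (np.outer(y, v) + rho * C).T).T
    # W-update: a single gradient step on the augmented-Lagrangian penalty
    A = np.tanh(X @ W.T)
    R = X + A - (U + Lam / rho)
    W -= eta * rho * ((R * (1 - A**2)).T @ X) / n
    # dual ascent on the coupling constraint U = f(W, X)
    Lam += rho * (U - f(W))

print("fit MSE:", np.mean((f(W) @ v - y) ** 2))
```

The easy subproblems (readout and auxiliary variable) are solved in closed form, while the hard nonconvex W-subproblem gets only a single gradient step, which mirrors the "proximal point (gradient)" flavor the abstract refers to.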
Related papers
- Approximating G(t)/GI/1 queues with deep learning [0.0]
We apply a supervised machine-learning approach to solve a problem in queueing theory.
It estimates the transient distribution of the number in the system for a G(t)/GI/1 queue.
We develop a neural network mechanism that provides a fast and accurate predictor of these distributions.
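As a generic sketch of what such a predictor could look like: a small network that maps queue parameters to a softmax distribution over a truncated number-in-system support. The architecture, features and truncation level are hypothetical, not taken from the paper.

```python
# Hypothetical distribution predictor: features -> P(N(t) = n), n = 0..K.
import numpy as np

def softmax(z):
    z = z - z.max()                    # stabilized softmax
    e = np.exp(z)
    return e / e.sum()

def predictor(features, W1, W2):
    h = np.tanh(features @ W1)         # hidden layer
    return softmax(h @ W2)             # distribution over 0..K customers

rng = np.random.default_rng(0)
K = 10                                 # assumed truncation of the support
W1 = rng.normal(size=(3, 32)) * 0.1
W2 = rng.normal(size=(32, K + 1)) * 0.1
x = np.array([1.2, 0.9, 5.0])          # e.g. arrival rate, service rate, time

print("P(N(t)=n):", np.round(predictor(x, W1, W2), 3))
```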
arXiv Detail & Related papers (2024-07-11T05:25:45Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
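A toy sketch of the linear-interpolation (lookahead-style) outer update this summary describes, on a simple quadratic loss; the inner optimizer, step count k, and interpolation coefficient alpha are assumptions for illustration.

```python
# Outer linear interpolation: run k inner steps, then move only part of
# the way toward the inner iterate.
import numpy as np

def grad(theta):                       # gradient of the toy loss ||theta||^2/2
    return theta

theta = np.array([5.0, -3.0])
alpha, k, lr = 0.5, 5, 0.3             # assumed hyperparameters
for outer in range(20):
    inner = theta.copy()
    for _ in range(k):                 # k fast inner gradient steps
        inner -= lr * grad(inner)
    theta += alpha * (inner - theta)   # interpolate toward the inner iterate
print(theta)
```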
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Efficient Parametric Approximations of Neural Network Function Space Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.
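LAFTR itself is not reproduced here; the sketch below only computes the quantity it approximates, the average output discrepancy over a training set, for a one-hidden-layer ReLU network with illustrative shapes.

```python
# Naive Monte Carlo estimate of Function Space Distance: mean squared
# discrepancy between two ReLU networks' outputs over a dataset.
import numpy as np

def relu_net(params, X):
    W1, W2 = params
    return np.maximum(X @ W1, 0.0) @ W2     # one-hidden-layer ReLU network

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))              # stand-in for the training set
p1 = (rng.normal(size=(8, 16)), rng.normal(size=(16, 1)))
p2 = (p1[0] + 0.01 * rng.normal(size=(8, 16)), p1[1])  # perturbed copy

fsd = np.mean((relu_net(p1, X) - relu_net(p2, X)) ** 2)
print("FSD estimate:", fsd)
```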
arXiv Detail & Related papers (2023-02-07T15:09:23Z)
- Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods [6.425552131743896]
We propose a unified framework for analyzing the convergence rate of alternating minimization methods for training deep neural networks (DNNs).
In this paper, we show the detailed local convergence rate if the KL exponent $\theta$ varies in $[0,1)$.
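Both this paper and the main abstract above rest on the same standard KL-exponent dichotomy. As a generic statement (not either paper's exact theorem): a function $f$ satisfies the KL property at $\bar{x}$ with exponent $\theta \in [0,1)$ if, near $\bar{x}$,

```latex
\[
  |f(x) - f(\bar{x})|^{\theta}
  \le c \,\operatorname{dist}\bigl(0, \partial f(x)\bigr),
\]
% and for descent-type methods the iterates then typically obey
\[
  \|x^k - \bar{x}\| \le
  \begin{cases}
    C \rho^k, & \theta \in (0, \tfrac12] \quad \text{(R-linear)},\\[2pt]
    C\, k^{-\frac{1-\theta}{2\theta-1}}, & \theta \in (\tfrac12, 1) \quad \text{(sublinear)},
  \end{cases}
\]
% with finite termination in the degenerate case \theta = 0.
```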
arXiv Detail & Related papers (2022-08-30T14:58:44Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
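For reference, a minimal sketch of the interval bound propagation baseline mentioned in the comparison: an input box $[l, u]$ is pushed through an affine layer (splitting the weights by sign) and a monotone ReLU. The weights and box radius are made up for illustration.

```python
# Interval bound propagation through one affine + ReLU layer.
import numpy as np

def affine_interval(W, b, l, u):
    # positive/negative weight parts give the tightest interval image
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ l + Wn @ u + b, Wp @ u + Wn @ l + b

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
l, u = -0.1 * np.ones(4), 0.1 * np.ones(4)         # input box (assumed radius)

lo, hi = affine_interval(W, b, l, u)
lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone
print("output bounds:", lo, hi)
```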
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, within a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
In theory, our algorithm requires many fewer communication rounds than naive parallel approaches.
Experiments on several datasets demonstrate the effectiveness of our algorithm and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.