ADMM Algorithms for Residual Network Training: Convergence Analysis and Parallel Implementation
- URL: http://arxiv.org/abs/2310.15334v2
- Date: Mon, 31 Mar 2025 03:37:38 GMT
- Title: ADMM Algorithms for Residual Network Training: Convergence Analysis and Parallel Implementation
- Authors: Jintao Xu, Yifei Li, Wenxun Xing
- Abstract summary: We propose both serial and parallel proximal (linearized) alternating direction method of multipliers (ADMM) algorithms for training residual neural networks. We prove that the proposed algorithms converge at an R-linear (sublinear) rate for both the iteration points and the objective function values. Experimental results validate the proposed ADMM algorithms, demonstrating rapid and stable convergence, improved performance, and high computational efficiency.
- Score: 5.3446906736406135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose both serial and parallel proximal (linearized) alternating direction method of multipliers (ADMM) algorithms for training residual neural networks. In contrast to backpropagation-based approaches, our methods inherently mitigate the exploding gradient issue and are well-suited for parallel and distributed training through regional updates. Theoretically, we prove that the proposed algorithms converge at an R-linear (sublinear) rate for both the iteration points and the objective function values. These results hold without imposing stringent constraints on network width, depth, or training data size. Furthermore, we theoretically analyze our parallel/distributed ADMM algorithms, highlighting their reduced time complexity and lower per-node memory consumption. To facilitate practical deployment, we develop a control protocol for parallel ADMM implementation using Python's multiprocessing and interprocess communication. Experimental results validate the proposed ADMM algorithms, demonstrating rapid and stable convergence, improved performance, and high computational efficiency. Finally, we highlight the improved scalability and efficiency achieved by our parallel ADMM training strategy.
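To make the regional-update idea concrete, the following is a minimal, hypothetical Python sketch of one synchronous parallel round in which each region's proximal subproblem is dispatched to a worker process via multiprocessing. The toy least-squares subproblem, the function names, and the Pool-based dispatch are illustrative assumptions; they are not the paper's actual control protocol or interprocess-communication scheme.

```python
# Illustrative sketch (not the paper's protocol): dispatch per-region proximal
# updates to worker processes with Python's multiprocessing.
import numpy as np
from multiprocessing import Pool

def proximal_region_update(args):
    """Toy proximal step for one region:
    argmin_w 0.5*||A w - b||^2 + (rho/2)*||w - w_ref||^2, solved in closed form."""
    A, b, w_ref, rho = args
    n = A.shape[1]
    # Normal equations of the regularized least-squares subproblem.
    return np.linalg.solve(A.T @ A + rho * np.eye(n), A.T @ b + rho * w_ref)

def parallel_admm_round(regions, rho=1.0, workers=4):
    """One synchronous round: each region's subproblem is solved by a worker process."""
    with Pool(processes=workers) as pool:
        return pool.map(proximal_region_update,
                        [(A, b, w_ref, rho) for (A, b, w_ref) in regions])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four toy "regions", each with its own data and reference point.
    regions = [(rng.standard_normal((8, 3)), rng.standard_normal(8), np.zeros(3))
               for _ in range(4)]
    for w in parallel_admm_round(regions):
        print(np.round(w, 3))
```

In the paper's setting one would presumably assign each worker a region of the residual network and exchange the coupling (auxiliary and multiplier) variables after each round, which is where the reduced per-node memory consumption would come from.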
Related papers
- Approximating G(t)/GI/1 queues with deep learning [0.0]
We apply a supervised machine-learning approach to solve a problem in queueing theory.
It estimates the transient distribution of the number of customers in the system for a G(t)/GI/1 queue.
We develop a neural network mechanism that provides a fast and accurate predictor of these distributions.
arXiv Detail & Related papers (2024-07-11T05:25:45Z) - BADM: Batch ADMM for Deep Learning [35.39258144247444]
Gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence.
We leverage the framework of the alternating direction method of multipliers (ADMM) to develop a novel data-driven algorithm, called batch ADMM (BADM).
We evaluate the performance of BADM across various deep learning tasks, including graph modelling, computer vision, image generation, and natural language processing.
arXiv Detail & Related papers (2024-06-30T20:47:15Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Efficient Parametric Approximations of Neural Network Function Space
Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.
arXiv Detail & Related papers (2023-02-07T15:09:23Z) - Convergence Rates of Training Deep Neural Networks via Alternating
Minimization Methods [6.425552131743896]
We propose a unified framework for analyzing the convergence rate of alternating minimization methods for training deep neural networks (DNNs).
In this paper, we give the detailed local convergence rate when the Kurdyka-Lojasiewicz (KL) exponent $\theta$ varies in $[0,1)$.
arXiv Detail & Related papers (2022-08-30T14:58:44Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization
for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z) - Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in
Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and can well adapt to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z) - Partitioning sparse deep neural networks for scalable training and
inference [8.282177703075453]
State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements.
Sparsification and pruning methods are shown to be effective in removing a large fraction of connections in DNNs.
The resulting sparse networks present unique challenges to further improve the computational efficiency of training and inference in deep learning.
arXiv Detail & Related papers (2021-04-23T20:05:52Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Distributed Optimization, Averaging via ADMM, and Network Topology [0.0]
We study the connection between network topology and convergence rates for different algorithms on a real-world sensor localization problem.
We also show interesting connections between ADMM and lifted Markov chains, besides providing an explicit characterization of its convergence.
arXiv Detail & Related papers (2020-09-05T21:44:39Z) - Restructuring, Pruning, and Adjustment of Deep Models for Parallel
Distributed Inference [15.720414948573753]
We consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers).
We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model.
We show that, compared to the existing methods, RePurpose significantly improves the efficiency of the distributed inference via parallel implementation.
arXiv Detail & Related papers (2020-08-19T06:44:41Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while still enjoying theoretical convergence guarantees.
Experiments on several datasets demonstrate the effectiveness of the method and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Parallelization Techniques for Verifying Neural Networks [52.917845265248744]
We introduce an algorithm that iteratively partitions the verification problem and explore two partitioning strategies.
We also introduce a highly parallelizable pre-processing algorithm that uses the neuron activation phases to simplify the neural network verification problems.
arXiv Detail & Related papers (2020-04-17T20:21:47Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural
Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z) - Accelerating Feedforward Computation via Parallel Nonlinear Equation
Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power (a toy Jacobi-style sketch follows this list).
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
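As a toy illustration of the fixed-point view taken in "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", the sketch below evaluates a chain of layers with a Jacobi iteration. The helper jacobi_feedforward, the equal-width layers, and the convergence check are illustrative assumptions, not the paper's implementation.

```python
# Illustrative Jacobi fixed-point sketch (assumed setup, not the paper's code).
import numpy as np

def jacobi_feedforward(fs, x0, max_sweeps=None):
    """Evaluate a layer chain h_i = f_i(h_{i-1}) by Jacobi fixed-point iteration.

    All layers are recomputed from the previous iterate in each sweep, so each
    sweep is parallelizable across layers; after at most len(fs) sweeps the
    iterate equals the sequential forward pass, since correct values propagate
    one layer per sweep. (Toy setting: every layer output has the shape of x0.)
    """
    L = len(fs)
    h = [np.zeros_like(x0) for _ in range(L)]       # initial guess for each layer output
    for _ in range(max_sweeps or L):
        inputs = [x0] + h[:-1]                      # layer i reads the iterate of layer i-1
        new_h = [f(z) for f, z in zip(fs, inputs)]  # independent, hence parallelizable
        if all(np.allclose(a, b) for a, b in zip(new_h, h)):
            break
        h = new_h
    return h[-1]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Ws = [0.3 * rng.standard_normal((4, 4)) for _ in range(5)]
    fs = [lambda z, W=W: np.tanh(W @ z) for W in Ws]
    x0 = rng.standard_normal(4)
    ref = x0
    for f in fs:                                    # sequential reference pass
        ref = f(ref)
    assert np.allclose(jacobi_feedforward(fs, x0), ref)
    print("Jacobi iteration reproduces the sequential forward pass")
```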