A Convergent ADMM Framework for Efficient Neural Network Training
- URL: http://arxiv.org/abs/2112.11619v1
- Date: Wed, 22 Dec 2021 01:55:24 GMT
- Title: A Convergent ADMM Framework for Efficient Neural Network Training
- Authors: Junxiang Wang, Hongyi Li, Liang Zhao
- Abstract summary: Alternating Direction Method of Multipliers (ADMM) has achieved tremendous success in many classification and regression applications.
We propose a novel framework to solve a general neural network training problem via ADMM (dlADMM) to address these challenges simultaneously.
Experiments on seven benchmark datasets demonstrate the convergence, efficiency, and effectiveness of our proposed dlADMM algorithm.
- Score: 17.764095204676973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a well-known optimization framework, the Alternating Direction Method of
Multipliers (ADMM) has achieved tremendous success in many classification and
regression applications. Recently, it has attracted the attention of deep
learning researchers and is considered a potential substitute for Gradient
Descent (GD). However, as an emerging domain, several challenges remain
unsolved, including 1) The lack of global convergence guarantees, 2) Slow
convergence towards solutions, and 3) Cubic time complexity with regard to
feature dimensions. In this paper, we propose a novel optimization framework to
solve a general neural network training problem via ADMM (dlADMM) to address
these challenges simultaneously. Specifically, the parameters in each layer are
updated backward and then forward so that parameter information in each layer
is exchanged efficiently. When the dlADMM is applied to specific architectures,
the time complexity of subproblems is reduced from cubic to quadratic via a
dedicated algorithm design utilizing quadratic approximations and backtracking
techniques. Last but not least, we provide the first proof of convergence to a
critical point sublinearly for an ADMM-type method (dlADMM) under mild
conditions. Experiments on seven benchmark datasets demonstrate the
convergence, efficiency, and effectiveness of our proposed dlADMM algorithm.
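To make the backward-then-forward, layer-wise update order concrete, here is a minimal Python sketch. It is not the authors' implementation: the actual dlADMM subproblems involve auxiliary variables and multiplier (dual) updates that are omitted, and every function and parameter name below (quadratic_step, dladmm_epoch, rho0) is a hypothetical illustration of the quadratic-approximation-with-backtracking idea described in the abstract.

```python
# Illustrative sketch only: simplified layer-wise updates with a quadratic
# (proximal) surrogate and backtracking; dual variables are omitted.
import numpy as np

def quadratic_step(layer_obj, grad_fn, W, rho0=1.0, max_backtracks=20):
    """One proximal step on a layer subproblem with backtracking on rho.

    Minimizes the surrogate Q_rho(V) = f(W) + <grad f(W), V-W> + (rho/2)||V-W||^2
    and doubles rho until the surrogate upper-bounds the true objective at V.
    """
    g = grad_fn(W)
    rho = rho0
    for _ in range(max_backtracks):
        V = W - g / rho                      # minimizer of the surrogate
        surrogate = (layer_obj(W) + np.sum(g * (V - W))
                     + 0.5 * rho * np.sum((V - W) ** 2))
        if layer_obj(V) <= surrogate:        # surrogate dominates: accept step
            return V
        rho *= 2.0                           # otherwise tighten the surrogate
    return W

def dladmm_epoch(weights, layer_objs, layer_grads):
    """One sweep: update layers backward (L..1), then forward (1..L)."""
    L = len(weights)
    for l in reversed(range(L)):             # backward pass over layers
        weights[l] = quadratic_step(layer_objs[l], layer_grads[l], weights[l])
    for l in range(L):                       # forward pass over layers
        weights[l] = quadratic_step(layer_objs[l], layer_grads[l], weights[l])
    return weights
```

Because each step solves only a quadratic surrogate, no cubic-cost matrix inversion is needed, which is the intuition behind the cubic-to-quadratic complexity reduction claimed in the abstract.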
Related papers
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
- A unified consensus-based parallel ADMM algorithm for high-dimensional regression with combined regularizations [3.280169909938912]
The parallel alternating direction method of multipliers (ADMM) is widely recognized for its effectiveness in handling large-scale distributed datasets.
A financial example demonstrates the reliability, stability, and scalability of the proposed algorithms.
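As a rough illustration of the consensus formulation behind this line of work, the sketch below implements standard consensus ADMM for a lasso penalty with the data split across blocks. It is a toy under stated assumptions, not the combined-regularization algorithm of the cited paper, which would replace the soft-thresholding step with a different proximal operator.

```python
# Minimal consensus-ADMM sketch for lasso regression over partitioned data.
import numpy as np

def soft_threshold(v, kappa):
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def consensus_admm_lasso(X_blocks, y_blocks, lam=0.1, rho=1.0, n_iter=100):
    d = X_blocks[0].shape[1]
    K = len(X_blocks)
    x = [np.zeros(d) for _ in range(K)]      # local estimates
    u = [np.zeros(d) for _ in range(K)]      # scaled dual variables
    z = np.zeros(d)                          # global consensus variable
    # Factor each block's ridge system (X_k^T X_k + rho I) once.
    factors = [np.linalg.cholesky(Xk.T @ Xk + rho * np.eye(d)) for Xk in X_blocks]
    for _ in range(n_iter):
        for k in range(K):                   # local updates (parallelizable)
            rhs = X_blocks[k].T @ y_blocks[k] + rho * (z - u[k])
            x[k] = np.linalg.solve(factors[k].T, np.linalg.solve(factors[k], rhs))
        xbar = np.mean(x, axis=0)
        ubar = np.mean(u, axis=0)
        z = soft_threshold(xbar + ubar, lam / (rho * K))   # global prox step
        for k in range(K):                   # dual ascent
            u[k] += x[k] - z
    return z
```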
arXiv Detail & Related papers (2023-11-21T03:30:38Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
The proposed method tackles two challenges of this expansion: the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Efficient differentiable quadratic programming layers: an ADMM approach [0.0]
We present an alternative network layer architecture based on the alternating direction method of multipliers (ADMM).
Backward differentiation is performed by implicit differentiation of the residual map of a modified fixed-point iteration.
Simulated results demonstrate the computational advantage of the ADMM layer, which for medium-scale problems is approximately an order of magnitude faster than the OptNet quadratic programming layer.
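For intuition, a plain ADMM forward solve for a small box-constrained QP might look like the sketch below. This is not the paper's layer; a differentiable version would additionally obtain gradients by implicitly differentiating the fixed-point residual of this iteration, which is not shown, and the function and parameter names are illustrative assumptions.

```python
# Hedged illustration: ADMM forward pass for
#     minimize 0.5 * z^T Q z + p^T z   subject to  lb <= z <= ub.
import numpy as np

def admm_box_qp(Q, p, lb, ub, rho=1.0, n_iter=200):
    n = Q.shape[0]
    x = np.zeros(n)          # unconstrained iterate
    z = np.zeros(n)          # projected (feasible) iterate
    u = np.zeros(n)          # scaled dual variable
    M_inv = np.linalg.inv(Q + rho * np.eye(n))   # factor once, reuse each step
    for _ in range(n_iter):
        x = M_inv @ (rho * (z - u) - p)          # quadratic subproblem
        z = np.clip(x + u, lb, ub)               # projection onto the box
        u = u + x - z                            # dual update
    return z
```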
arXiv Detail & Related papers (2021-12-14T15:25:07Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Distributed Optimization, Averaging via ADMM, and Network Topology [0.0]
We study the connection between network topology and convergence rates for different algorithms on a real-world sensor localization problem.
We also show interesting connections between ADMM and lifted Markov chains, in addition to providing an explicit characterization of its convergence.
arXiv Detail & Related papers (2020-09-05T21:44:39Z)
- Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization [110.52708815647613]
In this paper, we propose a faster stochastic alternating direction method of multipliers (ADMM) for nonconvex optimization based on a new stochastic path-integrated differential estimator, called SPIDER-ADMM.
We prove that SPIDER-ADMM achieves a record-breaking incremental first-order oracle (IFO) complexity for finding an approximate stationary point.
Our theoretical analysis further shows that the online SPIDER-ADMM attains a lower IFO complexity than the best existing results.
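The variance-reduction idea underlying this family of methods can be sketched as follows. The snippet shows only a generic SPIDER-style gradient estimator, not the ADMM primal/dual updates of the cited paper, and the function signatures are illustrative assumptions.

```python
# Illustrative SPIDER-style (path-integrated) gradient estimator.
def spider_estimator(grad_full, grad_batch, x_seq, q=10):
    """Yield variance-reduced gradient estimates along an iterate sequence.

    grad_full(x): full gradient at x.
    grad_batch(x, x_prev): mini-batch gradients at x and x_prev, evaluated on
    the same sampled batch. Every q steps the estimator restarts from a full
    gradient to control accumulated error.
    """
    v = None
    x_prev = None
    for t, x in enumerate(x_seq):
        if t % q == 0:
            v = grad_full(x)                     # periodic full refresh
        else:
            gx, gx_prev = grad_batch(x, x_prev)  # same batch at both points
            v = v + gx - gx_prev                 # path-integrated update
        x_prev = x
        yield v
```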
arXiv Detail & Related papers (2020-08-04T02:59:42Z)
- Deep unfolding of the weighted MMSE beamforming algorithm [9.518010235273783]
We propose the novel application of deep unfolding to the WMMSE algorithm for a MISO downlink channel.
Deep unfolding naturally incorporates expert knowledge, with the benefits of immediate and well-grounded architecture selection, fewer trainable parameters, and better explainability.
By means of simulations, we show that, in most settings, the unfolded WMMSE outperforms or matches the WMMSE for a fixed number of iterations.
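The general deep-unfolding pattern, independent of the WMMSE specifics, is sketched below: a fixed number of iterations of a classical algorithm are written as network layers with trainable per-iteration parameters. The module and parameter names are hypothetical, and the least-squares problem is chosen only for brevity.

```python
# Generic deep-unfolding sketch: unrolled gradient descent with learnable
# per-iteration step sizes, trainable end to end by backpropagation.
import torch
import torch.nn as nn

class UnfoldedLeastSquares(nn.Module):
    def __init__(self, num_layers=10):
        super().__init__()
        # One trainable step size per unrolled iteration.
        self.step_sizes = nn.Parameter(torch.full((num_layers,), 0.1))

    def forward(self, A, y):
        # Approximately solve min_x ||A x - y||^2 with unrolled gradient steps.
        x = torch.zeros(A.shape[1], dtype=A.dtype)
        for alpha in self.step_sizes:
            grad = A.T @ (A @ x - y)
            x = x - alpha * grad
        return x
```

Training then backpropagates a task loss through the unrolled iterations, exactly as for any other network, which is how expert knowledge about the iteration structure is combined with learned parameters.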
arXiv Detail & Related papers (2020-06-15T14:51:20Z)
- Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds than naive parallelization while retaining its theoretical guarantees.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and corroborate our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.