AA-DLADMM: An Accelerated ADMM-based Framework for Training Deep Neural
Networks
- URL: http://arxiv.org/abs/2401.03619v1
- Date: Mon, 8 Jan 2024 01:22:00 GMT
- Title: AA-DLADMM: An Accelerated ADMM-based Framework for Training Deep Neural
Networks
- Authors: Zeinab Ebrahimi, Gustavo Batista and Mohammad Deghat
- Abstract summary: Stochastic gradient descent (SGD) and its many variants are widely used optimization algorithms for training deep neural networks.
SGD suffers from inevitable drawbacks, including vanishing gradients, lack of theoretical guarantees, and substantial sensitivity to input.
This paper proposes an Anderson Acceleration for Deep Learning ADMM (AA-DLADMM) algorithm to tackle the slow convergence of ADMM-based optimizers.
- Score: 1.3812010983144802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent (SGD) and its many variants are the widespread
optimization algorithms for training deep neural networks. However, SGD suffers
from inevitable drawbacks, including vanishing gradients, lack of theoretical
guarantees, and substantial sensitivity to input. The Alternating Direction
Method of Multipliers (ADMM) has been proposed to address these shortcomings as
an effective alternative to the gradient-based methods. It has been
successfully employed for training deep neural networks. However, ADMM-based
optimizers have a slow convergence rate. This paper proposes an Anderson
Acceleration for Deep Learning ADMM (AA-DLADMM) algorithm to tackle this
drawback. The main intention of the AA-DLADMM algorithm is to employ Anderson
acceleration to ADMM by considering it as a fixed-point iteration and attaining
a nearly quadratic convergence rate. We verify the effectiveness and efficiency
of the proposed AA-DLADMM algorithm by conducting extensive experiments on four
benchmark datasets in comparison with other state-of-the-art optimizers.
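The idea of treating an iterative scheme as a fixed-point iteration x = g(x) and accelerating it with Anderson acceleration can be illustrated on a toy problem. The sketch below is not the AA-DLADMM algorithm from the paper: it assumes a memory depth of 1, a mixing parameter of 1, and a scalar test map g(x) = cos(x), all chosen here purely for illustration. The function names `anderson_depth1` and `plain_fixed_point` are hypothetical.

```python
import math


def plain_fixed_point(g, x0, tol=1e-10, max_iter=200):
    """Iterate x <- g(x) until the residual norm ||g(x) - x|| drops below tol."""
    x = list(x0)
    for k in range(1, max_iter + 1):
        gx = g(x)
        res = math.sqrt(sum((a - b) ** 2 for a, b in zip(gx, x)))
        if res < tol:
            return x, k
        x = gx
    return x, max_iter


def anderson_depth1(g, x0, tol=1e-10, max_iter=200):
    """Depth-1 Anderson acceleration for x = g(x) (illustrative sketch)."""
    x_prev = list(x0)
    g_prev = g(x_prev)
    f_prev = [a - b for a, b in zip(g_prev, x_prev)]  # residual g(x) - x
    x = list(g_prev)  # the first step is a plain fixed-point update
    for k in range(1, max_iter + 1):
        gx = g(x)
        f = [a - b for a, b in zip(gx, x)]
        if math.sqrt(sum(v * v for v in f)) < tol:
            return x, k
        # Least-squares coefficient minimising ||f - gamma * (f - f_prev)||.
        df = [a - b for a, b in zip(f, f_prev)]
        denom = sum(v * v for v in df)
        if denom == 0.0:
            x_new = list(gx)  # fall back to the plain update
        else:
            gamma = sum(a * b for a, b in zip(f, df)) / denom
            x_new = [a - gamma * (a - b) for a, b in zip(gx, g_prev)]
        x_prev, g_prev, f_prev, x = x, gx, f, x_new
    return x, max_iter


def g(x):
    """Toy contraction whose fixed point solves cos(x) = x."""
    return [math.cos(x[0])]


x_plain, it_plain = plain_fixed_point(g, [1.0])
x_aa, it_aa = anderson_depth1(g, [1.0])
print("plain iterations:", it_plain, "accelerated iterations:", it_aa)
```

On this toy map the accelerated variant should need noticeably fewer iterations than the plain one. How that carries over to ADMM updates for neural network training depends on the details given in the paper.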
Related papers
- MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.47014540413659]
Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to the development of this type of training.
We propose a framework that reconciles preconditioned gradient optimization methods with variance reduction via a scaled momentum technique.
arXiv Detail & Related papers (2024-11-15T18:57:39Z)
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- BADM: Batch ADMM for Deep Learning [35.39258144247444]
Gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence.
We leverage the framework of the alternating direction method of multipliers (ADMM) to develop a novel data-driven algorithm, called batch ADMM (BADM).
We evaluate the performance of BADM across various deep learning tasks, including graph modelling, computer vision, image generation, and natural language processing.
arXiv Detail & Related papers (2024-06-30T20:47:15Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Federated Learning via Inexact ADMM [46.99210047518554]
In this paper, we develop an inexact alternating direction method of multipliers (ADMM).
It is both computation- and communication-efficient, capable of combating the stragglers' effect, and convergent under mild conditions.
It has a high numerical performance compared with several state-of-the-art algorithms for federated learning.
arXiv Detail & Related papers (2022-04-22T09:55:33Z)
- A Distributed Algorithm for Measure-valued Optimization with Additive
Objective [1.0965065178451106]
We propose a distributed nonparametric algorithm for solving measure-valued optimization problems with additive objectives.
The proposed algorithm comprises a two-layer alternating direction method of multipliers (ADMM).
The overall algorithm realizes operator splitting for gradient flows in the manifold of probability measures.
arXiv Detail & Related papers (2022-02-17T23:09:41Z)
- A Convergent ADMM Framework for Efficient Neural Network Training [17.764095204676973]
Alternating Direction Method of Multipliers (ADMM) has achieved tremendous success in many classification and regression applications.
We propose a novel framework to solve a general neural network training problem via ADMM (dlADMM) to address these challenges simultaneously.
Experiments on seven benchmark datasets demonstrate the convergence, efficiency, and effectiveness of our proposed dlADMM algorithm.
arXiv Detail & Related papers (2021-12-22T01:55:24Z)
- Adam revisited: a weighted past gradients perspective [57.54752290924522]
We propose a novel adaptive method, the weighted adaptive algorithm (WADA), to tackle the non-convergence issues.
We prove that WADA can achieve a weighted data-dependent regret bound, which could be better than the original regret bound of ADAGRAD.
arXiv Detail & Related papers (2021-01-01T14:01:52Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)