BADM: Batch ADMM for Deep Learning
- URL: http://arxiv.org/abs/2407.01640v1
- Date: Sun, 30 Jun 2024 20:47:15 GMT
- Title: BADM: Batch ADMM for Deep Learning
- Authors: Ouya Wang, Shenglong Zhou, Geoffrey Ye Li
- Abstract summary: Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence.
We leverage the framework of the alternating direction method of multipliers (ADMM) to develop a novel data-driven algorithm, called batch ADMM (BADM).
We evaluate the performance of BADM across various deep learning tasks, including graph modelling, computer vision, image generation, and natural language processing.
- Score: 35.39258144247444
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers (ADMM) to develop a novel data-driven algorithm, called batch ADMM (BADM). The fundamental idea of the proposed algorithm is to split the training data into batches, which are further divided into sub-batches where primal and dual variables are updated to generate global parameters through aggregation. We evaluate the performance of BADM across various deep learning tasks, including graph modelling, computer vision, image generation, and natural language processing. Extensive numerical experiments demonstrate that BADM achieves faster convergence and superior testing accuracy compared to other state-of-the-art optimizers.
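The abstract does not spell out the exact update rules, but the described batch/sub-batch split with primal and dual updates followed by aggregation resembles consensus ADMM. Below is a minimal NumPy sketch under that reading; the least-squares loss, penalty parameter rho, step counts, and the function name `badm_style_epoch` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def badm_style_epoch(X, y, z, rho=1.0, n_batches=4, n_sub=2,
                     admm_rounds=2, inner_steps=5, lr=0.1):
    """One epoch of a consensus-ADMM-style sweep over batches and sub-batches.

    Illustrative sketch only, not the authors' exact BADM updates: each batch
    is split into sub-batches, each holding primal weights w_i and a scaled
    dual u_i; the primal step approximately minimizes
    local_loss(w) + (rho/2) * ||w - z + u_i||^2 with a few gradient steps,
    and the global parameters z are obtained by aggregation (averaging).
    """
    n = X.shape[0]
    for batch_idx in np.array_split(np.random.permutation(n), n_batches):
        subs = np.array_split(batch_idx, n_sub)
        w = [z.copy() for _ in subs]           # primal variables per sub-batch
        u = [np.zeros_like(z) for _ in subs]   # scaled dual variables
        for _ in range(admm_rounds):
            # primal updates on each sub-batch (least-squares loss assumed)
            for i, s in enumerate(subs):
                Xs, ys = X[s], y[s]
                for _ in range(inner_steps):
                    grad = Xs.T @ (Xs @ w[i] - ys) / len(s) + rho * (w[i] - z + u[i])
                    w[i] = w[i] - lr * grad
            # aggregation: new global parameters from primal and dual variables
            z = np.mean([w[i] + u[i] for i in range(n_sub)], axis=0)
            # dual updates
            for i in range(n_sub):
                u[i] += w[i] - z
    return z

# toy usage: recover the weights of a synthetic linear model
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=256)
z = np.zeros(10)
for _ in range(20):
    z = badm_style_epoch(X, y, z)
print(np.linalg.norm(z - w_true))  # distance to the true weights shrinks over epochs
```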
Related papers
- AA-DLADMM: An Accelerated ADMM-based Framework for Training Deep Neural Networks [1.3812010983144802]
Stochastic gradient descent (SGD) and its many variants are the most widespread optimization algorithms for training deep neural networks.
SGD suffers from inevitable drawbacks, including vanishing gradients, lack of theoretical guarantees, and substantial sensitivity to input.
This paper proposes an Anderson Acceleration for Deep Learning ADMM (AA-DLADMM) algorithm to tackle these drawbacks.
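The summary does not give AA-DLADMM's update, but Anderson acceleration of a generic fixed-point iteration x_{k+1} = g(x_k), which is the kind of extrapolation applied to the ADMM iterates, can be sketched as follows. The window size, regularization, and least-squares mixing below are the textbook form, not necessarily the paper's exact variant.

```python
import numpy as np

def anderson_accelerate(g, x0, m=5, iters=50, reg=1e-10):
    """Anderson acceleration of a fixed-point iteration x <- g(x).

    Textbook sketch: keep the last m+1 iterates, find mixing weights alpha
    (summing to one) that minimize the norm of the mixed residual, and take
    the next iterate as the corresponding mix of the g-images.
    """
    xs, gs = [np.asarray(x0, dtype=float)], [g(x0)]
    x = gs[-1]
    for _ in range(iters):
        xs.append(x)
        gs.append(g(x))
        # residuals g(x_k) - x_k over the sliding window
        F = np.stack([gk - xk for gk, xk in zip(gs[-(m + 1):], xs[-(m + 1):])], axis=1)
        k = F.shape[1]
        A = F.T @ F + reg * np.eye(k)                 # regularized normal equations
        alpha = np.linalg.solve(A, np.ones(k))
        alpha /= alpha.sum()                          # enforce sum(alpha) = 1
        x = np.stack(gs[-(m + 1):], axis=1) @ alpha   # extrapolated iterate
    return x

# toy usage: solve x = cos(x) componentwise (fixed point near 0.739)
print(anderson_accelerate(np.cos, np.zeros(3)))
```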
arXiv Detail & Related papers (2024-01-08T01:22:00Z) - BatchGFN: Generative Flow Networks for Batch Active Learning [80.73649229919454]
BatchGFN is a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward.
We show that our approach enables principled sampling of near-optimal-utility batches at inference time, with a single forward pass per point in the batch, on toy regression problems.
arXiv Detail & Related papers (2023-06-26T20:41:36Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that extends algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Enabling Deep Learning-based Physical-layer Secret Key Generation for FDD-OFDM Systems in Multi-Environments [27.47842642468537]
This paper formulates the PKG problem in multiple environments as a learning-based problem.
We propose deep transfer learning (DTL) and meta-learning-based channel feature mapping algorithms for key generation.
arXiv Detail & Related papers (2022-11-06T09:24:04Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Federated Learning via Inexact ADMM [46.99210047518554]
In this paper, we develop an inexact alternating direction method of multipliers (ADMM) algorithm for federated learning.
It is both computation- and communication-efficient, capable of combating the stragglers' effect, and convergent under mild conditions.
It achieves strong numerical performance compared with several state-of-the-art algorithms for federated learning.
arXiv Detail & Related papers (2022-04-22T09:55:33Z) - Bilevel Online Deep Learning in Non-stationary Environment [4.565872584112864]
The Bilevel Online Deep Learning (BODL) framework combines a bilevel optimization strategy with an online ensemble classifier.
When concept drift is detected, our BODL algorithm can adaptively update the model parameters via bilevel optimization, thereby circumventing large drift and encouraging positive transfer.
arXiv Detail & Related papers (2022-01-25T11:05:51Z) - An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training [0.951828574518325]
The limited-memory version of the BFGS algorithm has been receiving increasing attention in recent years for large-scale neural network training problems.
We propose a multi-batch L-BFGS algorithm, namely MB-AM, that gradually increases its trust in the curvature information.
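For context, the standard two-loop recursion that turns stored curvature pairs (s_i, y_i) into an L-BFGS search direction is sketched below; the `trust` factor blending it with plain gradient descent is only an illustrative stand-in for MB-AM's gradually increasing trust in curvature information, not the paper's rule.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list, trust=1.0):
    """Two-loop recursion for an L-BFGS search direction.

    Standard textbook recursion over stored curvature pairs (s_i, y_i); the
    `trust` blending with the raw gradient is an illustrative assumption.
    """
    q = grad.astype(float)
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):   # newest pair first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * y
    # initial Hessian scaling from the most recent pair
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1]) if s_list else 1.0
    r = gamma * q
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):  # oldest pair first
        rho = 1.0 / (y @ s)
        b = rho * (y @ r)
        r = r + (a - b) * s
    # blend the curvature-informed direction with plain gradient descent
    return -(trust * r + (1.0 - trust) * grad)

# toy usage: minimize 0.5 * w @ A @ w, feeding pairs from the last few steps
A = np.diag([1.0, 2.0, 4.0])
w, s_hist, y_hist = np.ones(3), [], []
for t in range(40):
    g = A @ w
    d = lbfgs_direction(g, s_hist[-5:], y_hist[-5:], trust=min(1.0, t / 10))
    w_new = w + 0.2 * d
    s_hist.append(w_new - w)
    y_hist.append(A @ (w_new - w))
    w = w_new
print(w)  # close to the minimizer at the origin
```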
arXiv Detail & Related papers (2020-12-14T11:40:41Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning-rate-based algorithms in terms of training speed and test error.
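A minimal sketch of the alignment idea in that summary, assuming a sign-based agreement test between an exponential moving average of past updates and the current gradient; the decay, scaling factor, and toy loss are illustrative choices, not AdaRem's published rule.

```python
import numpy as np

def adarem_like_step(w, grad, state, base_lr=0.01, beta=0.9, eta=0.1):
    """One parameter-wise adaptive step in the spirit of the summary above.

    Sketch only: keep an exponential moving average of past parameter changes
    and scale each coordinate's learning rate up when that history points in
    the same direction as the current negative gradient, down when it opposes it.
    """
    d = state.setdefault("d", np.zeros_like(w))   # EMA of past updates
    align = np.sign(d) * np.sign(-grad)           # +1 aligned, -1 opposed, 0 neutral
    lr = base_lr * (1.0 + eta * align)            # parameter-wise learning rates
    update = -lr * grad
    state["d"] = beta * d + (1.0 - beta) * update
    return w + update

# toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself
w, state = 5.0 * np.ones(4), {}
for _ in range(1000):
    w = adarem_like_step(w, w, state)
print(w)  # approaches zero
```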
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Our experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)