An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network
Training
- URL: http://arxiv.org/abs/2012.07434v1
- Date: Mon, 14 Dec 2020 11:40:41 GMT
- Title: An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network
Training
- Authors: Federico Zocco and Seán McLoone
- Abstract summary: A limited memory version of the BFGS algorithm has been receiving increasing attention in recent years for large neural network training problems.
We propose a multi-batch L-BFGS algorithm, namely MB-AM, that gradually increases its trust in the curvature information.
- Score: 0.951828574518325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the potential for parallel implementation of batch-based
algorithms and the accelerated convergence achievable with approximated second
order information, a limited memory version of the BFGS algorithm has been
receiving increasing attention in recent years for large neural network
training problems. As the shape of the cost function is generally not quadratic
and only becomes approximately quadratic in the vicinity of a minimum, the use
of second order information by L-BFGS can be unreliable during the initial
phase of training, i.e. when far from a minimum. Therefore, to control the
influence of second order information as training progresses, we propose a
multi-batch L-BFGS algorithm, namely MB-AM, that gradually increases its trust
in the curvature information by implementing a progressive storage and use of
curvature data through a development-based increase (dev-increase) scheme.
Using six discriminative modelling benchmark problems we show empirically that
MB-AM has slightly faster convergence and, on average, achieves better
solutions than the standard multi-batch L-BFGS algorithm when training MLP and
CNN models.
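To make the idea of progressive curvature storage concrete, below is a minimal, self-contained Python/NumPy sketch of a limited-memory BFGS loop whose memory budget grows over training. This is not the authors' MB-AM implementation: the dev-increase criterion is approximated here by a simple "increment the memory whenever the loss improves" rule, and the names `two_loop_direction`, `armijo_step`, `max_memory`, and the toy quadratic objective are illustrative assumptions rather than anything specified in the abstract.

```python
import numpy as np
from collections import deque


def two_loop_direction(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns the direction -H_k * grad.

    With no stored pairs this reduces to the steepest-descent direction -grad.
    """
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q -= alpha * y
    # Scale by gamma_k = s'y / y'y (initial Hessian approximation).
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    # Second loop: oldest pair to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return -q


def armijo_step(loss_grad, w, d, loss, grad, t=1.0, shrink=0.5, c=1e-4):
    """Simple backtracking (Armijo) line search; illustrative, not from the paper."""
    while loss_grad(w + t * d)[0] > loss + c * t * (grad @ d) and t > 1e-8:
        t *= shrink
    return t


def train(loss_grad, w, n_steps=50, max_memory=10):
    """Hypothetical adaptive-memory loop: the memory budget m starts at 0, so the
    first iterations are plain gradient descent, and m is incremented only when
    the loss improves, so trust in curvature information grows with training."""
    m = 0
    s_pairs, y_pairs = deque(), deque()
    prev_w = prev_loss = prev_grad = None
    for _ in range(n_steps):
        loss, grad = loss_grad(w)
        # Store the latest curvature pair if the budget allows and s'y > 0.
        if prev_grad is not None and m > 0:
            s, y = w - prev_w, grad - prev_grad
            if y @ s > 1e-10:
                s_pairs.append(s)
                y_pairs.append(y)
                while len(s_pairs) > m:  # enforce the current memory budget
                    s_pairs.popleft()
                    y_pairs.popleft()
        # Assumed dev-increase rule: grow the memory only on loss improvement.
        if prev_loss is not None and loss < prev_loss and m < max_memory:
            m += 1
        d = two_loop_direction(grad, list(s_pairs), list(y_pairs))
        t = armijo_step(loss_grad, w, d, loss, grad)
        prev_w, prev_loss, prev_grad = w.copy(), loss, grad
        w = w + t * d
    return w


if __name__ == "__main__":
    # Toy quadratic stand-in for a mini-batch loss (illustrative only).
    A = np.diag([1.0, 10.0, 100.0])
    loss_grad = lambda w: (0.5 * w @ A @ w, A @ w)
    w_final = train(loss_grad, np.ones(3))
    print("final loss:", float(0.5 * w_final @ A @ w_final))
```

Starting with an empty memory makes the early iterations behave like plain gradient descent, mirroring the paper's motivation that curvature information is unreliable far from a minimum; a faithful MB-AM implementation would also handle overlapping multi-batch gradients when forming the curvature pairs.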
Related papers
- BADM: Batch ADMM for Deep Learning [35.39258144247444]
Gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence.
We leverage the framework of the alternating direction method of multipliers (ADMM) to develop a novel data-driven algorithm, called batch ADMM (BADM).
We evaluate the performance of BADM across various deep learning tasks, including graph modelling, computer vision, image generation, and natural language processing.
arXiv Detail & Related papers (2024-06-30T20:47:15Z) - Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS)
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z) - mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural
Network Optimization [35.08820062020787]
We propose a momentum-based L-BFGS algorithm that paves the way for quasi-Newton (QN) methods in large-scale distributed deep neural network (DNN) optimization.
For model training at a large scale, mL-BFGS approximates a block-wise Hessian, thus enabling compute and memory costs to be distributed across all computing nodes.
Results show that mL-BFGS achieves both noticeable gradient-wise and wall-clock speedup.
arXiv Detail & Related papers (2023-07-25T18:03:29Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in the cross-silo FL setting.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - Learning from Images: Proactive Caching with Parallel Convolutional
Neural Networks [94.85780721466816]
A novel framework for proactive caching is proposed in this paper.
It combines model-based optimization with data-driven techniques by transforming an optimization problem into a grayscale image.
Numerical results show that the proposed scheme can reduce computation time by 71.6% with only a 0.8% additional performance cost.
arXiv Detail & Related papers (2021-08-15T21:32:47Z) - Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient
Descent [8.714458129632158]
Kolmogorov model (KM) is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables.
We propose a computationally scalable KM learning algorithm based on regularized dual optimization combined with an enhanced gradient descent (GD) method.
It is shown that the accuracy of logical relation mining for interpretability using the proposed KM learning algorithm exceeds 80%.
arXiv Detail & Related papers (2021-07-11T10:33:02Z) - JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035]
We propose JUMBO, an MBO algorithm that sidesteps limitations by querying additional data.
We show that it achieves no-regret under conditions analogous to GP-UCB.
Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems.
arXiv Detail & Related papers (2021-06-02T05:03:38Z) - Covariance-Free Sparse Bayesian Learning [62.24008859844098]
We introduce a new SBL inference algorithm that avoids explicit inversions of the covariance matrix.
Our method can be up to thousands of times faster than existing baselines.
We showcase how our new algorithm enables SBL to tractably tackle high-dimensional signal recovery problems.
arXiv Detail & Related papers (2021-05-21T16:20:07Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several datasets confirm the effectiveness of our method and support our theoretical results.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Block Layer Decomposition schemes for training Deep Neural Networks [0.0]
Deep Feedforward Neural Network (DFNN) weight estimation relies on solving a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points and large plateaus.
As a consequence, optimization algorithms can be attracted toward local minimizers which can lead to bad solutions or can slow down the optimization process.
arXiv Detail & Related papers (2020-03-18T09:53:40Z) - Federated Matrix Factorization: Algorithm Design and Application to Data
Clustering [18.917444528804463]
Recent demands on data privacy have called for federated learning (FL) as a new distributed learning paradigm in massive and heterogeneous networks.
We propose two new FedMF algorithms, namely FedMAvg and FedMGS, based on the model averaging and gradient sharing principles.
arXiv Detail & Related papers (2020-02-12T11:48:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.