Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient
adaptive algorithms for neural networks
- URL: http://arxiv.org/abs/2105.13937v3
- Date: Sat, 2 Mar 2024 12:55:56 GMT
- Title: Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient
adaptive algorithms for neural networks
- Authors: Dong-Young Lim and Sotirios Sabanis
- Abstract summary: We present a new class of Langevin based algorithms, which overcomes many of the known shortcomings of popular adaptive optimizers.
In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we named TH$\varepsilon$O POULA (or, simply, TheoPouLa).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new class of Langevin based algorithms, which overcomes many of
the known shortcomings of popular adaptive optimizers that are currently used
for the fine tuning of deep learning models. Its underpinning theory relies on
recent advances of Euler's polygonal approximations for stochastic differential
equations (SDEs) with monotone coefficients. As a result, it inherits the
stability properties of tamed algorithms, while it addresses other known
issues, e.g. vanishing gradients in neural networks. In particular, we provide
a nonasymptotic analysis and full theoretical guarantees for the convergence
properties of an algorithm of this novel class, which we named TH$\varepsilon$O
POULA (or, simply, TheoPouLa). Finally, several experiments are presented with
different types of deep learning models, which show the superior performance of
TheoPouLa over many popular adaptive optimization algorithms.
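To make the flavour of the scheme concrete, here is a minimal sketch of a component-wise tamed stochastic gradient Langevin step in the spirit of TheoPouLa. The specific taming denominator, the small-gradient boosting factor, and all parameter values below are illustrative assumptions, not the exact update rule of the paper.
```python
import numpy as np

def theopoula_like_step(theta, grad, lam=1e-2, beta=1e10, eps=1e-8, rng=None):
    """One component-wise tamed Langevin step (sketch, not the paper's exact rule).

    theta : current parameter vector
    grad  : stochastic gradient estimate at theta
    lam   : step size (lambda)
    beta  : inverse temperature controlling the Gaussian noise
    eps   : small constant in the boosting factor for tiny gradient components
    """
    rng = rng or np.random.default_rng()
    sqrt_lam = np.sqrt(lam)
    # Taming: shrink each gradient coordinate so the drift stays bounded, which is
    # what gives tamed schemes their stability for superlinearly growing losses.
    tamed = grad / (1.0 + sqrt_lam * np.abs(grad))
    # Boosting: inflate very small gradient components to counter vanishing gradients
    # (assumed form; the paper's exact factor may differ).
    boost = 1.0 + sqrt_lam / (eps + np.abs(grad))
    noise = rng.standard_normal(theta.shape)
    return theta - lam * tamed * boost + np.sqrt(2.0 * lam / beta) * noise

# Example: a few steps on a toy quartic loss f(x) = 0.25 * ||x||^4 (superlinear gradient).
theta = np.array([3.0, -2.0])
for _ in range(100):
    grad = (theta @ theta) * theta          # gradient of 0.25 * ||x||^4
    theta = theopoula_like_step(theta, grad)
print(theta)                                 # moves towards the minimizer at the origin
```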
Related papers
- Unfolded proximal neural networks for robust image Gaussian denoising [7.018591019975253]
We propose a unified framework to build proximal neural networks (PNNs) for the Gaussian denoising task, based on both the dual-FB and the primal-dual Chambolle-Pock algorithms.
We also show that accelerated versions of these algorithms enable skip connections in the associated NN layers.
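As a rough illustration of the unrolling idea behind PNNs, the sketch below treats each network layer as one forward-backward (proximal gradient) iteration for a simple l1-regularized Gaussian denoising model; the operator, step sizes, and use of soft-thresholding are assumptions for illustration, not the dual-FB or Chambolle-Pock constructions used in the paper.
```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def unfolded_fb_denoiser(y, n_layers=10, tau=0.5, mu=0.1):
    """Unroll n_layers forward-backward iterations for
    min_x 0.5*||x - y||^2 + mu*||x||_1 (a simple stand-in for the denoising
    models in the paper). In a learned PNN, such quantities would typically
    become trainable, layer-dependent parameters.
    """
    x = y.copy()
    for _ in range(n_layers):
        grad = x - y                                     # gradient of the data-fidelity term
        x = soft_threshold(x - tau * grad, tau * mu)     # proximal (backward) step
    return x

# Example: denoise a sparse signal corrupted by Gaussian noise.
rng = np.random.default_rng(0)
clean = np.zeros(50)
clean[[5, 17, 33]] = [2.0, -1.5, 3.0]
noisy = clean + 0.2 * rng.standard_normal(50)
print(np.round(unfolded_fb_denoiser(noisy), 2))
```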
arXiv Detail & Related papers (2023-08-06T15:32:16Z) - Online Learning Under A Separable Stochastic Approximation Framework [20.26530917721778]
We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework.
We show that the proposed algorithm produces more robust and better test performance than other popular learning algorithms.
arXiv Detail & Related papers (2023-05-12T13:53:03Z) - AskewSGD : An Annealed interval-constrained Optimisation method to train
Quantized Neural Networks [12.229154524476405]
We develop a new algorithm, Annealed Skewed SGD - AskewSGD - for training deep neural networks (DNNs) with quantized weights.
Unlike algorithms with active sets and feasible directions, AskewSGD avoids projections or optimization under the entire feasible set.
Experimental results show that the AskewSGD algorithm performs better than or on par with state-of-the-art methods on classical benchmarks.
arXiv Detail & Related papers (2022-11-07T18:13:44Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural
Network with Group Sparsity [13.27709100571336]
Leaky ReLU networks with a group regularization term have been widely used in recent years.
However, there is a lack of approaches to compute a stationary point of such models deterministically.
We propose an inexact augmented Lagrangian algorithm for solving the new model.
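For background, here is a minimal sketch of a generic inexact augmented Lagrangian loop for an equality-constrained toy problem; it shows the overall mechanism (approximate primal minimization, multiplier update, penalty increase) rather than the paper's specific model with leaky ReLU activations and group sparsity, and every function and constant below is an illustrative assumption.
```python
import numpy as np

def inexact_augmented_lagrangian(f_grad, c, c_jac, x0, rho=1.0,
                                 n_outer=20, inner_steps=200):
    """Generic inexact augmented Lagrangian method for  min f(x)  s.t.  c(x) = 0.

    The augmented Lagrangian is L(x, lam) = f(x) + lam*c(x) + 0.5*rho*c(x)**2.
    Each outer iteration only approximately minimizes L(., lam) with a few
    gradient steps, then updates the multiplier and enlarges the penalty.
    """
    x, lam = x0.astype(float), 0.0
    for _ in range(n_outer):
        for _ in range(inner_steps):                # inexact primal minimization
            j = c_jac(x)
            g = f_grad(x) + (lam + rho * c(x)) * j
            step = 1.0 / (2.0 + rho * (j @ j))      # crude step size tied to the penalty
            x = x - step * g
        lam = lam + rho * c(x)                      # multiplier (dual) update
        rho = min(2.0 * rho, 1e4)                   # gradually tighten the penalty
    return x, lam

# Toy example: project a = (2, 0, 1) onto the hyperplane sum(x) = 1.
a = np.array([2.0, 0.0, 1.0])
x, lam = inexact_augmented_lagrangian(
    f_grad=lambda x: 2.0 * (x - a),                 # gradient of ||x - a||^2
    c=lambda x: x.sum() - 1.0,                      # equality constraint
    c_jac=lambda x: np.ones_like(x),                # its gradient
    x0=np.zeros(3),
)
print(np.round(x, 3), round(lam, 3))                # expect x close to (1.333, -0.667, 0.333)
```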
arXiv Detail & Related papers (2022-05-11T11:53:15Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics (SGLD) is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely Variance Reduced Langevin Dynamics and Recursive Gradient Langevin Dynamics.
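To show how variance reduction plugs into a Langevin scheme, here is a minimal sketch combining an SVRG-style gradient estimator with the standard Langevin update; the estimators, step sizes, and temperature below are assumptions and do not reproduce the exact algorithms analyzed in the paper.
```python
import numpy as np

def svrg_langevin(grad_i, n_data, theta0, n_epochs=10, inner=50,
                  step=0.05, beta=1e8, seed=0):
    """Variance-reduced (SVRG-style) stochastic gradient Langevin dynamics sketch.

    grad_i(theta, i) returns the gradient of the i-th data term at theta.
    Each epoch computes a full "snapshot" gradient; the inner loop then uses
    grad_i(theta, i) - grad_i(snapshot, i) + full_grad as a low-variance drift,
    plus the usual Gaussian Langevin noise scaled by the inverse temperature beta.
    """
    rng = np.random.default_rng(seed)
    theta = theta0.astype(float)
    for _ in range(n_epochs):
        snapshot = theta.copy()
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n_data)], axis=0)
        for _ in range(inner):
            i = rng.integers(n_data)
            v = grad_i(theta, i) - grad_i(snapshot, i) + full_grad   # variance-reduced drift
            noise = rng.standard_normal(theta.shape)
            theta = theta - step * v + np.sqrt(2.0 * step / beta) * noise
    return theta

# Toy example: f(x) = (1/n) * sum_i 0.5*||x - a_i||^2, whose minimizer is mean(a).
rng = np.random.default_rng(1)
a = rng.standard_normal((100, 2)) + np.array([3.0, -1.0])
theta = svrg_langevin(lambda th, i: th - a[i], n_data=100, theta0=np.zeros(2))
print(np.round(theta, 2), np.round(a.mean(axis=0), 2))   # the two should nearly coincide
```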
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm
for First-order and Zeroth-order Optimization [111.24899593052851]
Conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community.
ARCS is the first zeroth-order conditional gradient sliding type algorithm for solving convex problems in zeroth-order optimization.
In first-order optimization, the convergence results of ARCS substantially outperform previous algorithms in terms of the number of gradient oracle queries.
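For background, the sketch below shows the classical conditional gradient (Frank-Wolfe) iteration over an l1-ball, whose defining feature is a linear minimization oracle instead of a projection; the accelerated, variance-reduced sliding machinery of ARCS is not reproduced here, so treat this as a baseline illustration only.
```python
import numpy as np

def frank_wolfe_l1(grad, x0, radius=1.0, n_iters=200):
    """Classical Frank-Wolfe over the l1-ball {x : ||x||_1 <= radius} (sketch).

    Each iteration calls a linear minimization oracle (LMO): the vertex of the
    l1-ball most aligned with the negative gradient, i.e. -radius*sign(g_j)*e_j
    for the coordinate j with the largest |g_j|. No projection is ever needed.
    """
    x = x0.astype(float)
    for k in range(n_iters):
        g = grad(x)
        j = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[j] = -radius * np.sign(g[j])          # LMO solution (a vertex of the ball)
        gamma = 2.0 / (k + 2.0)                 # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s       # stay feasible by convex combination
    return x

# Toy example: minimize 0.5*||x - b||^2 over the unit l1-ball.
b = np.array([0.9, -0.3, 0.05])
x = frank_wolfe_l1(lambda x: x - b, x0=np.zeros(3))
print(np.round(x, 3))   # a sparse point of the l1-ball close to b
```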
arXiv Detail & Related papers (2021-09-18T07:08:11Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Taming neural networks with TUSLA: Non-convex learning via adaptive
stochastic gradient Langevin algorithms [0.0]
We offer a new learning algorithm based on an appropriately constructed stochastic gradient Langevin dynamics (SGLD), called TUSLA.
We also provide a nonasymptotic analysis of the new algorithm's convergence properties.
The roots of the TUSLA algorithm are based on the taming technology for diffusion processes with superlinear coefficients developed in \citet{tamed-euler}.
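To give a rough sense of the taming idea behind TUSLA, here is a minimal sketch in which the stochastic gradient is damped by a factor growing with the parameter norm before Langevin noise is added; the exact taming function, regularization term, and exponent are assumptions, not the published update rule.
```python
import numpy as np

def tusla_like_step(theta, grad, lam=1e-2, beta=1e10, r=1.0, eta=1e-4, rng=None):
    """One tamed unadjusted Langevin step in the spirit of TUSLA (sketch only).

    The drift (stochastic gradient plus a small regularizer eta*theta) is divided
    by 1 + sqrt(lam)*||theta||^(2r), which keeps the update bounded even when the
    gradient grows superlinearly in theta; the exponent r and the exact taming
    factor are illustrative assumptions.
    """
    rng = rng or np.random.default_rng()
    drift = grad + eta * theta
    tamed = drift / (1.0 + np.sqrt(lam) * np.linalg.norm(theta) ** (2 * r))
    noise = rng.standard_normal(theta.shape)
    return theta - lam * tamed + np.sqrt(2.0 * lam / beta) * noise

# Example: the same toy quartic loss as above, f(x) = 0.25 * ||x||^4.
theta = np.array([3.0, -2.0])
for _ in range(500):
    theta = tusla_like_step(theta, grad=(theta @ theta) * theta)
print(np.round(theta, 3))   # drifts towards the minimizer at the origin
```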
arXiv Detail & Related papers (2020-06-25T16:06:22Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds while still achieving a linear speedup in theory.
Experiments on several benchmark datasets show the effectiveness of our algorithm and also confirm our theory.
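One standard way to reduce communication rounds is to let each worker take several local gradient steps before synchronizing; the sketch below shows this generic local-update-with-periodic-averaging pattern on a plain minimization problem, which is only a stand-in for the distributed AUC maximization algorithm of the paper.
```python
import numpy as np

def local_sgd_with_averaging(grads, theta0, n_rounds=20, local_steps=10, lr=0.1):
    """Generic communication-efficient training sketch (not the paper's algorithm).

    grads is a list of per-worker gradient functions. Each worker performs
    local_steps gradient updates on its own copy of the parameters; only then
    are the copies averaged, so one averaging step stands in for one
    communication round.
    """
    workers = [theta0.astype(float).copy() for _ in grads]
    for _ in range(n_rounds):                       # one round = one communication
        for w, grad in enumerate(grads):
            for _ in range(local_steps):            # cheap local computation
                workers[w] = workers[w] - lr * grad(workers[w])
        avg = np.mean(workers, axis=0)              # synchronize (average) the models
        workers = [avg.copy() for _ in grads]
    return workers[0]

# Toy example: two workers with quadratic objectives centred at different points.
centres = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
theta = local_sgd_with_averaging(
    grads=[lambda x, c=c: x - c for c in centres],  # gradient of 0.5*||x - c||^2
    theta0=np.zeros(2),
)
print(np.round(theta, 3))   # close to the average of the centres, (0.5, 1.0)
```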
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for non-concave min-max problems.
Our experiments on GAN training show that the advantage of adaptive gradient algorithms over their non-adaptive counterparts can be observed empirically.
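To illustrate the optimistic update underlying such analyses, the sketch below applies the classical optimistic gradient descent-ascent step to a toy bilinear min-max game; the Adagrad-style adaptive scaling studied in the paper is omitted, so only the extrapolation idea is shown.
```python
def ogda_bilinear(x0, y0, lr=0.1, n_iters=300):
    """Optimistic gradient descent-ascent on the bilinear game f(x, y) = x * y.

    The 'optimistic' step uses 2*g_t - g_{t-1} (current gradient plus a
    correction from the previous one), which stabilizes min-max training where
    plain simultaneous gradient descent-ascent spirals outward.
    """
    x, y = float(x0), float(y0)
    gx_prev, gy_prev = y, x                  # gradients at the initial point
    for _ in range(n_iters):
        gx, gy = y, x                        # grad_x f = y, grad_y f = x
        x, y = x - lr * (2 * gx - gx_prev), y + lr * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y

print(ogda_bilinear(1.0, 1.0))               # approaches the equilibrium (0, 0)
```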
arXiv Detail & Related papers (2019-12-26T22:10:10Z)