Related papers: Stochastic batch size for adaptive regularization in deep network optimization

Stochastic batch size for adaptive regularization in deep network optimization

URL: http://arxiv.org/abs/2004.06341v1
Date: Tue, 14 Apr 2020 07:54:53 GMT
Title: Stochastic batch size for adaptive regularization in deep network optimization
Authors: Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong
Abstract summary: We propose a first-order optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
Score: 63.68104397173262
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. The adaptive regularization is imposed by stochastic process in determining batch size for each model parameter at each optimization iteration. The stochastic batch size is determined by the update probability of each parameter following a distribution of gradient norms in consideration of their local and global properties in the neural network architecture where the range of gradient norms may vary within and across layers. We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets. The quantitative evaluation indicates that our algorithm outperforms the state-of-the-art optimization algorithms in generalization while providing less sensitivity to the selection of batch size which often plays a critical role in optimization, thus achieving more robustness to the selection of regularity.

Related papers

Federated Conditional Stochastic Optimization [110.513884892319]
Conditional optimization has found in a wide range of machine learning tasks, such as in-variant learning tasks, AUPRC, andAML. This paper proposes algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation [84.0621253654014]
We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest. We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO.
arXiv Detail & Related papers (2023-07-25T09:45:47Z)
Faster Margin Maximization Rates for Generic and Adversarially Robust Optimization Methods [20.118513136686452]
First-order optimization methods tend to inherently favor certain solutions over others when minimizing an underdetermined training objective. We present a series of state-of-the-art implicit bias rates for mirror descent and steepest descent algorithms. Our accelerated rates are derived by leveraging the regret bounds of online learning algorithms within this game framework.
arXiv Detail & Related papers (2023-05-27T18:16:56Z)
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches [7.390918770007728]
Motivated by practical instances involving a handful of reallocations in which outcomes are measured in batches, we develop an adaptive-driven experimentation framework. Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of adaptive algorithms.
arXiv Detail & Related papers (2023-03-21T04:17:03Z)
Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability [107.65337427333064]
optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning. In this work, we present the first trial in the single-dependent generalization of AUPRC optimization. Experiments on three image retrieval datasets on speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z)
Evolutionary Variational Optimization of Generative Models [0.0]
We combine two popular optimization approaches to derive learning algorithms for generative models: variational optimization and evolutionary algorithms. We show that evolutionary algorithms can effectively and efficiently optimize the variational bound. In the category of "zero-shot" learning, we observed the evolutionary variational algorithm to significantly improve the state-of-the-art in many benchmark settings.
arXiv Detail & Related papers (2020-12-22T19:06:33Z)
On the implementation of a global optimization method for mixed-variable problems [0.30458514384586394]
The algorithm is based on the radial basis function of Gutmann and the metric response surface method of Regis and Shoemaker. We propose several modifications aimed at generalizing and improving these two algorithms.
arXiv Detail & Related papers (2020-09-04T13:36:56Z)
Obtaining Adjustable Regularization for Free via Iterate Averaging [43.75491612671571]
Regularization for optimization is a crucial technique to avoid overfitting in machine learning. We establish an averaging scheme that converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart. Our approaches can be used for accelerated and preconditioned optimization methods as well.
arXiv Detail & Related papers (2020-08-15T15:28:05Z)
Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope. Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory. Our algorithm is proved to achieve the best-available convergence for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.