The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous
Neural Networks
- URL: http://arxiv.org/abs/2012.06244v1
- Date: Fri, 11 Dec 2020 11:15:32 GMT
- Title: The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous
Neural Networks
- Authors: Bohan Wang, Qi Meng, Wei Chen
- Abstract summary: We study the implicit bias of adaptive optimization algorithms on homogeneous neural networks.
It is the first work to study the convergent direction of adaptive optimization algorithms on non-linear deep neural networks.
- Score: 21.63353575405414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their overwhelming capacity to overfit, deep neural networks trained
by specific optimization algorithms tend to generalize relatively well to
unseen data. Recently, researchers explained it by investigating the implicit
bias of optimization algorithms. A remarkable advance is the work [18], which
proves that gradient descent (GD) maximizes the margin of homogeneous deep neural
networks. Beyond first-order optimization algorithms like GD, adaptive
algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their rapid
training process. Meanwhile, numerous works have provided empirical evidence
that adaptive methods may suffer from poor generalization performance. However,
a theoretical explanation for the generalization of adaptive optimization
algorithms is still lacking. In this paper, we study the implicit bias of
adaptive optimization algorithms on homogeneous neural networks. In particular,
we study the convergent direction of the parameters when optimizing the
logistic loss. We prove that the convergent direction of RMSProp is the same as
that of GD, while for AdaGrad, the convergent direction depends on the adaptive
conditioner. Technically, we provide a unified framework for analyzing the
convergent direction of adaptive optimization algorithms by constructing a novel
and nontrivial adaptive gradient flow and surrogate margin. The theoretical
findings explain the superior generalization of the exponential moving average
strategy adopted by RMSProp and Adam. To the best of our knowledge, this is the
first work to study the convergent direction of adaptive optimization algorithms
on non-linear deep neural networks.
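To make the central quantity concrete: for an L-homogeneous network f(theta; x), the normalized margin is gamma(theta) = min_i y_i f(theta; x_i) / ||theta||^L, and work [18] shows that GD drives this quantity toward its maximum on homogeneous networks. The sketch below is not the paper's construction (which relies on an adaptive gradient flow and a surrogate margin); it is only a minimal, illustrative PyTorch experiment, with hypothetical helper names such as normalized_margin, that tracks this margin for a small bias-free ReLU network (homogeneous of degree L = 2) trained on the logistic loss with RMSProp, so the claimed margin behavior can be observed empirically.

```python
# Illustrative sketch (not the paper's method): track the normalized margin of a
# bias-free two-layer ReLU network (positively homogeneous of degree L = 2)
# trained on the logistic loss.
import torch

torch.manual_seed(0)

model = torch.nn.Sequential(
    torch.nn.Linear(2, 32, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1, bias=False),
)

# Toy linearly separable data with labels in {-1, +1}.
X = torch.randn(128, 2)
y = ((X[:, :1] + 0.3 * X[:, 1:]) > 0).float() * 2 - 1

def normalized_margin(model, L=2):
    """gamma(theta) = min_i y_i f(theta; x_i) / ||theta||^L for an L-homogeneous f."""
    with torch.no_grad():
        norm_sq = sum(p.pow(2).sum() for p in model.parameters())
        return ((y * model(X)).min() / norm_sq ** (L / 2)).item()

# Swap in torch.optim.SGD(model.parameters(), lr=0.1) to compare with (full-batch) GD.
opt = torch.optim.RMSprop(model.parameters(), lr=1e-3)
for step in range(20001):
    opt.zero_grad()
    loss = torch.nn.functional.softplus(-y * model(X)).mean()  # logistic loss
    loss.backward()
    opt.step()
    if step % 5000 == 0:
        print(f"step {step:6d}  loss {loss.item():.2e}  margin {normalized_margin(model):.4f}")
```

Under these assumptions, the printed margin is expected to grow after the loss becomes small; repeating the run with SGD gives a simple way to compare the two optimizers' limiting directions.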
Related papers
- Understanding Optimization in Deep Learning with Central Flows [53.66160508990508]
We show that an optimizer's implicit behavior can be explicitly captured by a "central flow": a differential equation.
We show that these flows can empirically predict long-term optimization trajectories of generic neural networks.
arXiv Detail & Related papers (2024-10-31T17:58:13Z)
- Faster Margin Maximization Rates for Generic and Adversarially Robust Optimization Methods [20.118513136686452]
First-order optimization methods tend to inherently favor certain solutions over others when minimizing an underdetermined training objective.
We present a series of state-of-the-art implicit bias rates for mirror descent and steepest descent algorithms.
Our accelerated rates are derived by leveraging the regret bounds of online learning algorithms within this game framework.
arXiv Detail & Related papers (2023-05-27T18:16:56Z)
- Genetically Modified Wolf Optimization with Stochastic Gradient Descent
for Optimising Deep Neural Networks [0.0]
This research aims to analyze an alternative approach to optimizing neural network (NN) weights, with the use of population-based metaheuristic algorithms.
A hybrid between the Grey Wolf Optimizer (GWO) and Genetic Algorithms (GA) is explored, in conjunction with Stochastic Gradient Descent (SGD).
This algorithm allows for a combination of exploitation and exploration, whilst also tackling the issue of high dimensionality.
arXiv Detail & Related papers (2023-01-21T13:22:09Z)
- How Does Adaptive Optimization Impact Local Neural Network Geometry? [32.32593743852949]
We argue that in the context of neural network optimization, this traditional viewpoint is insufficient.
We show that adaptive methods such as Adam bias the trajectories towards regions where one might expect faster convergence.
arXiv Detail & Related papers (2022-11-04T04:05:57Z)
- Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasingly large momentum parameter for the first-order moment is sufficient for adaptive scaling.
We also give insights into increasing the momentum in a stagewise manner, in accordance with a stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
- A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can leverage the convergence behavior of neural networks.
We believe our approach can also be extended to other optimization algorithms and network theory.
arXiv Detail & Related papers (2020-10-25T17:10:22Z)
- Iterative Surrogate Model Optimization (ISMO): An active learning
algorithm for PDE constrained optimization with deep neural networks [14.380314061763508]
We present a novel active learning algorithm, termed iterative surrogate model optimization (ISMO).
This algorithm is based on deep neural networks, and its key feature is the iterative selection of training data through a feedback loop between the deep neural networks and an underlying standard optimization algorithm (a schematic sketch of such a loop is given after this list).
arXiv Detail & Related papers (2020-08-13T07:31:07Z)
- Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
- Stochastic batch size for adaptive regularization in deep network
optimization [63.68104397173262]
We propose a first-order optimization algorithm incorporating adaptive regularization, applicable to machine learning problems in a deep learning framework.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z)
- Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates a Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested on four types of problems, including compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
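The ISMO entry above describes an active-learning feedback loop between a deep neural network surrogate and a standard optimizer. The following is only a schematic sketch of such a loop under simplified assumptions (a cheap scalar toy objective in place of a PDE-constrained solve, Adam standing in for the standard optimizer, and hypothetical names such as expensive_objective); it is not the authors' algorithm.

```python
# Schematic sketch of an ISMO-style loop: refit a DNN surrogate on all points
# evaluated so far, minimize the surrogate with a standard optimizer, then query
# the true objective at the proposed point and add it to the training set.
import torch

def expensive_objective(x):            # stand-in for a costly simulation/solve
    return torch.sin(3.0 * x) + 0.5 * x ** 2

surrogate = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
opt_sur = torch.optim.Adam(surrogate.parameters(), lr=1e-2)

# Initial training set from a coarse sample of the design space.
X = torch.linspace(-2.0, 2.0, 8).unsqueeze(1)
Y = expensive_objective(X)

for round_ in range(5):
    # 1) Fit the surrogate to all evaluations collected so far.
    for _ in range(500):
        opt_sur.zero_grad()
        loss = torch.nn.functional.mse_loss(surrogate(X), Y)
        loss.backward()
        opt_sur.step()

    # 2) Minimize the cheap surrogate with a standard optimizer (here: Adam on x).
    x = torch.zeros(1, 1, requires_grad=True)
    opt_x = torch.optim.Adam([x], lr=5e-2)
    for _ in range(200):
        opt_x.zero_grad()
        surrogate(x).sum().backward()
        opt_x.step()

    # 3) Evaluate the true objective at the proposed point and grow the data set.
    with torch.no_grad():
        X = torch.cat([X, x.detach()])
        Y = torch.cat([Y, expensive_objective(x.detach())])
    print(f"round {round_}: proposed x = {x.item():.3f}, f(x) = {Y[-1].item():.3f}")
```

The design choice being illustrated is the feedback loop itself: each round the surrogate is only trusted enough to propose the next query point, and the true objective is re-evaluated there before the surrogate is refit.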