A framework for overparameterized learning
- URL: http://arxiv.org/abs/2205.13507v1
- Date: Thu, 26 May 2022 17:17:46 GMT
- Title: A framework for overparameterized learning
- Authors: Dávid Terjék, Diego González-Sánchez
- Abstract summary: An explanation for the success of deep neural networks is a central question in theoretical machine learning.
We propose a framework consisting of a prototype learning problem, which is general enough to cover many popular problems.
We then demonstrate that supervised learning, variational autoencoders and training with gradient penalty can be translated to the prototype problem.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An explanation for the success of deep neural networks is a central question
in theoretical machine learning. According to classical statistical learning,
the overparameterized nature of such models should imply a failure to
generalize. Many argue that good empirical performance is due to the implicit
regularization of first order optimization methods. In particular, the
Polyak-Łojasiewicz condition leads to gradient descent finding a global
optimum that is close to initialization. In this work, we propose a framework
consisting of a prototype learning problem, which is general enough to cover
many popular problems and even the cases of infinitely wide neural networks and
infinite data. We then perform an analysis from the perspective of the
Polyak-Łojasiewicz condition. We obtain theoretical results of independent
interest, concerning gradient descent on a composition $(f \circ F): G \to
\mathbb{R}$ of functions $F: G \to H$ and $f: H \to \mathbb{R}$ with $G, H$
being Hilbert spaces. Building on these results, we determine the properties
that have to be satisfied by the components of the prototype problem for
gradient descent to find a global optimum that is close to initialization. We
then demonstrate that supervised learning, variational autoencoders and
training with gradient penalty can be translated to the prototype problem.
Finally, we lay out a number of directions for future research.
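As a rough orientation (a standard sketch under illustrative assumptions, not the paper's exact statements): by the chain rule in Hilbert spaces, the gradient of the composition is $\nabla (f \circ F)(x) = DF(x)^{*}\, \nabla f(F(x))$, where $DF(x)^{*}$ is the adjoint of the Fréchet derivative of $F$ at $x$. If $h = f \circ F$ is $L$-smooth and satisfies the Polyak-Łojasiewicz inequality
$\frac{1}{2}\,\|\nabla h(x)\|^{2} \;\ge\; \mu\,\bigl(h(x) - \inf h\bigr) \quad \text{for all } x \in G,$
then gradient descent with step size $1/L$,
$x_{t+1} \;=\; x_{t} - \tfrac{1}{L}\,\nabla h(x_{t}),$
contracts the suboptimality geometrically,
$h(x_{t+1}) - \inf h \;\le\; \bigl(1 - \tfrac{\mu}{L}\bigr)\bigl(h(x_{t}) - \inf h\bigr),$
and the cumulative step lengths $\sum_{t} \|x_{t+1} - x_{t}\|$ can be bounded in terms of the initial suboptimality $h(x_{0}) - \inf h$; this is the sense in which the global optimum found by gradient descent is close to initialization. Here $\mu$, $L$ and the step size are illustrative constants, not quantities taken from the paper.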
Related papers
- Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z)
- Restricted Strong Convexity of Deep Learning Models with Smooth Activations [31.003601717265006]
We study the problem of optimization of deep learning models with smooth activation functions.
We introduce a new analysis of optimization based on Restricted Strong Convexity (RSC)
Ours is the first result establishing geometric convergence of GD based on RSC for deep learning models.
arXiv Detail & Related papers (2022-09-29T21:24:26Z)
- Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently [34.438887960077025]
We show that $p$-$\textsf{GD}$ can noticeably affect the structure and the generalization performance of the learned models.
We also show that $p$-$\textsf{GD}$ is fully parallelizable in the same manner as SGD and can be used to train deep neural networks.
arXiv Detail & Related papers (2022-05-25T14:33:13Z)
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z)
- A Geometric Analysis of Neural Collapse with Unconstrained Features [40.66585948844492]
We provide the first global optimization landscape analysis of Neural Collapse.
This phenomenon arises in the last-layer classifiers and features of neural networks during the terminal phase of training.
arXiv Detail & Related papers (2021-05-06T00:00:50Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Learning the Hypotheses Space from data: Learning Space and U-curve Property [0.0]
This paper presents an extension of the classical PAC learning model in which learning problems are modelled not only by a Hypothesis Space $\mathcal{H}$, but also by a Learning Space $\mathbb{L}(\mathcal{H})$.
Our main contribution is a data-driven general learning algorithm to perform regularized Model Selection on $\mathbb{L}(\mathcal{H})$.
arXiv Detail & Related papers (2020-01-26T22:29:33Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of an optimistic adaptive gradient algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
- Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization [6.852561400929072]
We present some theoretical work to explain why simple gradient descent methods are so successful in solving non-convex optimization problems.
We prove that the objective functions in learning NNs are convex in canonical model space.
arXiv Detail & Related papers (2019-03-06T02:21:37Z)