A framework for overparameterized learning
- URL: http://arxiv.org/abs/2205.13507v1
- Date: Thu, 26 May 2022 17:17:46 GMT
- Title: A framework for overparameterized learning
- Authors: Dávid Terjék, Diego González-Sánchez
- Abstract summary: An explanation for the success of deep neural networks is a central question in theoretical machine learning.
We propose a framework consisting of a prototype learning problem, which is general enough to cover many popular problems.
We then demonstrate that supervised learning, variational autoencoders and training with gradient penalty can be translated to the prototype problem.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An explanation for the success of deep neural networks is a central question
in theoretical machine learning. According to classical statistical learning,
the overparameterized nature of such models should imply a failure to
generalize. Many argue that good empirical performance is due to the implicit
regularization of first order optimization methods. In particular, the
Polyak-Łojasiewicz condition leads to gradient descent finding a global
optimum that is close to initialization. In this work, we propose a framework
consisting of a prototype learning problem, which is general enough to cover
many popular problems and even the cases of infinitely wide neural networks and
infinite data. We then perform an analysis from the perspective of the
Polyak-Łojasiewicz condition. We obtain theoretical results of independent
interest, concerning gradient descent on a composition $(f \circ F): G \to
\mathbb{R}$ of functions $F: G \to H$ and $f: H \to \mathbb{R}$ with $G, H$
being Hilbert spaces. Building on these results, we determine the properties
that have to be satisfied by the components of the prototype problem for
gradient descent to find a global optimum that is close to initialization. We
then demonstrate that supervised learning, variational autoencoders and
training with gradient penalty can be translated to the prototype problem.
Finally, we lay out a number of directions for future research.
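As a rough orientation (a standard sketch under illustrative assumptions, not the paper's exact statements): by the chain rule in Hilbert spaces, the gradient of the composition is $\nabla (f \circ F)(x) = DF(x)^{*}\, \nabla f(F(x))$, where $DF(x)^{*}$ is the adjoint of the Fréchet derivative of $F$ at $x$. If $h = f \circ F$ is $L$-smooth and satisfies the Polyak-Łojasiewicz inequality
$\frac{1}{2}\,\|\nabla h(x)\|^{2} \;\ge\; \mu\,\bigl(h(x) - \inf h\bigr) \quad \text{for all } x \in G,$
then gradient descent with step size $1/L$,
$x_{t+1} \;=\; x_{t} - \tfrac{1}{L}\,\nabla h(x_{t}),$
contracts the suboptimality geometrically,
$h(x_{t+1}) - \inf h \;\le\; \bigl(1 - \tfrac{\mu}{L}\bigr)\bigl(h(x_{t}) - \inf h\bigr),$
and the cumulative step lengths $\sum_{t} \|x_{t+1} - x_{t}\|$ can be bounded in terms of the initial suboptimality $h(x_{0}) - \inf h$; this is the sense in which the global optimum found by gradient descent is close to initialization. Here $\mu$, $L$ and the step size are illustrative constants, not quantities taken from the paper.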
Related papers
- Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z)
- Restricted Strong Convexity of Deep Learning Models with Smooth Activations [31.003601717265006]
We study the problem of optimization of deep learning models with smooth activation functions.
We introduce a new analysis of optimization based on Restricted Strong Convexity (RSC)
Ours is the first result establishing geometric convergence of GD based on RSC for deep learning models.
arXiv Detail & Related papers (2022-09-29T21:24:26Z)
- Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently [34.438887960077025]
We show that $p$-$\textsf{GD}$ can noticeably affect the structure and the generalization performance of the learned models.
We also show that $p$-$\textsf{GD}$ is fully parallelizable in the same manner as SGD and can be used to train deep neural networks.
arXiv Detail & Related papers (2022-05-25T14:33:13Z)
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z)
- A Geometric Analysis of Neural Collapse with Unconstrained Features [40.66585948844492]
We provide the first global optimization landscape analysis of Neural Collapse.
This phenomenon arises in the last-layer classifiers and features of neural networks during the terminal phase of training.
arXiv Detail & Related papers (2021-05-06T00:00:50Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Learning the Hypotheses Space from data: Learning Space and U-curve Property [0.0]
This paper presents an extension of the classical PAC learning model in which learning problems are modelled not only by a Hypothesis Space $\mathcal{H}$, but also by a Learning Space $\mathbb{L}(\mathcal{H})$.
Our main contribution is a data-driven general learning algorithm to perform regularized Model Selection on $\mathbb{L}(\mathcal{H})$.
arXiv Detail & Related papers (2020-01-26T22:29:33Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of an optimistic adaptive gradient algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
- Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization [6.852561400929072]
We present some theoretical work to explain why simple gradient descent methods are so successful in solving non-convex optimization problems.
We prove that the objective functions in learning NNs are convex in canonical model space.
arXiv Detail & Related papers (2019-03-06T02:21:37Z)