The Generalization Error of Stochastic Mirror Descent on
Over-Parametrized Linear Models
- URL: http://arxiv.org/abs/2302.09433v1
- Date: Sat, 18 Feb 2023 22:23:42 GMT
- Title: The Generalization Error of Stochastic Mirror Descent on
Over-Parametrized Linear Models
- Authors: Danil Akhtiamov, Babak Hassibi
- Abstract summary: Deep networks are known to generalize well to unseen data.
Implicit regularization properties of the training algorithms ensure that interpolating solutions with "good" properties are found.
We present simulation results that validate the theory and introduce two data models.
- Score: 37.6314945221565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite being highly over-parametrized, and having the ability to fully
interpolate the training data, deep networks are known to generalize well to
unseen data. It is now understood that part of the reason for this is that the
training algorithms used have certain implicit regularization properties that
ensure interpolating solutions with "good" properties are found. This is best
understood in linear over-parametrized models where it has been shown that the
celebrated stochastic gradient descent (SGD) algorithm finds an interpolating
solution that is closest in Euclidean distance to the initial weight vector.
Different regularizers, replacing Euclidean distance with Bregman divergence,
can be obtained if we replace SGD with stochastic mirror descent (SMD).
Empirical observations have shown that in the deep network setting, SMD
achieves a generalization performance that is different from that of SGD (and
which depends on the choice of SMD's potential function). In an attempt to begin
to understand this behavior, we obtain the generalization error of SMD for
over-parametrized linear models for a binary classification problem where the
two classes are drawn from a Gaussian mixture model. We present simulation
results that validate the theory and, in particular, introduce two data models,
one for which SMD with an $\ell_2$ regularizer (i.e., SGD) outperforms SMD with
an $\ell_1$ regularizer, and one for which the reverse happens.
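For a strictly convex potential $\psi$, the SMD update can be written as $\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta\,\nabla \mathcal{L}_{i_t}(w_t)$, which reduces to SGD when $\psi(w) = \tfrac{1}{2}\|w\|_2^2$. From a zero initialization, SGD therefore converges to the interpolant of minimum $\ell_2$ norm, while SMD with an $\ell_1$-like potential (e.g., a $q$-norm potential with $q$ close to $1$) converges to an interpolant that approximately minimizes the $\ell_1$ norm. The sketch below compares these two limiting interpolants on two toy Gaussian-mixture data models (a sparse class mean and a dense one of equal norm). The data models, the `min_l1_interpolant`/`min_l2_interpolant` helpers, and the use of `scipy.optimize.linprog` are illustrative assumptions, not the constructions or experiments of the paper.

```python
# A minimal sketch (assumptions noted above, not the paper's own construction):
# compare the interpolants that SGD and an l1-like SMD are known to converge to
# from w = 0, i.e. the minimum-l2-norm and (approximately) minimum-l1-norm
# solutions of Xw = y, on two toy Gaussian-mixture data models.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def min_l2_interpolant(X, y):
    # For an under-determined consistent system, lstsq returns the
    # minimum-Euclidean-norm interpolant (the implicit bias of SGD from w = 0).
    return np.linalg.lstsq(X, y, rcond=None)[0]

def min_l1_interpolant(X, y):
    # Basis-pursuit LP: min ||w||_1 s.t. Xw = y, written as w = u - v, u, v >= 0.
    n, d = X.shape
    res = linprog(c=np.ones(2 * d), A_eq=np.hstack([X, -X]), b_eq=y,
                  bounds=(0, None), method="highs")
    assert res.success
    return res.x[:d] - res.x[d:]

def gaussian_mixture(mu, n):
    # Two balanced classes: x = y * mu + standard Gaussian noise, y in {-1, +1}.
    y = rng.choice([-1.0, 1.0], size=n)
    return y[:, None] * mu + rng.normal(size=(n, mu.size)), y

d, n_train, n_test = 500, 50, 5000
sparse_mu = np.zeros(d)
sparse_mu[:10] = 1.0                           # few informative features
dense_mu = np.full(d, np.sqrt(10.0 / d))       # same norm, spread over all features

for name, mu in [("sparse mean", sparse_mu), ("dense mean", dense_mu)]:
    Xtr, ytr = gaussian_mixture(mu, n_train)
    Xte, yte = gaussian_mixture(mu, n_test)
    for label, w in [("l2 / SGD", min_l2_interpolant(Xtr, ytr)),
                     ("l1 / SMD", min_l1_interpolant(Xtr, ytr))]:
        err = np.mean(np.sign(Xte @ w) != yte)
        print(f"{name:12s} {label}: test error {err:.3f}")
```

With such toy models one would typically expect the $\ell_1$-type interpolant to do better when the discriminative direction is sparse and the $\ell_2$ (SGD) interpolant to do better when it is dense, mirroring the abstract's point that either regularizer can dominate depending on the data model.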
Related papers
- Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of the fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z) - Stochastic Mirror Descent in Average Ensemble Models [38.38572705720122]
Stochastic mirror descent (SMD) is a general class of training algorithms, which includes the celebrated stochastic gradient descent (SGD) as a special case.
In this paper we explore the performance of the SMD algorithm on mean-field ensemble models.
arXiv Detail & Related papers (2022-10-27T11:04:00Z) - uGLAD: Sparse graph recovery by optimizing deep unrolled networks [11.48281545083889]
We present a novel technique to perform sparse graph recovery by optimizing deep unrolled networks.
Our model, uGLAD, builds upon and extends the state-of-the-art model GLAD to the unsupervised setting.
We evaluate model results on synthetic Gaussian data, non-Gaussian data generated from Gene Regulatory Networks, and present a case study in anaerobic digestion.
arXiv Detail & Related papers (2022-05-23T20:20:27Z) - Implicit Regularization Properties of Variance Reduced Stochastic Mirror
Descent [7.00422423634143]
We prove that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in the linear regression setting.
We derive a model estimation accuracy result in the setting when the true model is sparse.
arXiv Detail & Related papers (2022-04-29T19:37:24Z) - Explicit Regularization via Regularizer Mirror Descent [32.0512015286512]
We propose a new method for training deep neural networks (DNNs) with regularization, called regularizer mirror descent (RMD).
RMD simultaneously interpolates the training data and minimizes a certain potential function of the weights.
Our results suggest that the performance of RMD is remarkably robust and significantly better than both stochastic gradient descent (SGD) and weight decay.
arXiv Detail & Related papers (2022-02-22T10:21:44Z) - Inverting brain grey matter models with likelihood-free inference: a
tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI.
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that afforded by ordinary least squares; a minimal constant-stepsize SGD sketch appears after this list.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
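As a companion to the constant-stepsize SGD entry above, the following sketch (an illustrative toy setup, not the experiments of that paper) runs constant-stepsize SGD on an over-parametrized least-squares problem from a zero initialization and checks that the iterate approaches the minimum-$\ell_2$-norm interpolant, the implicit bias discussed in the main abstract above.

```python
# Constant-stepsize SGD on an over-parametrized least-squares problem (d >> n).
# Started from w = 0, the iterate stays in the row space of X, so as the
# training residual vanishes it approaches the minimum-l2-norm interpolant.
# Problem sizes, step size, and epoch count are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 400
X = rng.normal(size=(n, d))
y = rng.normal(size=n)              # any labels work: Xw = y is consistent for d > n

w = np.zeros(d)
step = 1.0 / np.max(np.sum(X**2, axis=1))   # small enough for per-sample stability
for _ in range(500):                        # constant step size throughout
    for i in rng.permutation(n):
        w -= step * (X[i] @ w - y[i]) * X[i]

w_min_norm = np.linalg.lstsq(X, y, rcond=None)[0]   # minimum-l2-norm interpolant
print("training residual norm:", np.linalg.norm(X @ w - y))
print("distance to min-norm interpolant:", np.linalg.norm(w - w_min_norm))
```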