The Generalization Error of Stochastic Mirror Descent on
Over-Parametrized Linear Models
- URL: http://arxiv.org/abs/2302.09433v1
- Date: Sat, 18 Feb 2023 22:23:42 GMT
- Title: The Generalization Error of Stochastic Mirror Descent on
Over-Parametrized Linear Models
- Authors: Danil Akhtiamov, Babak Hassibi
- Abstract summary: Deep networks are known to generalize well to unseen data.
Implicit regularization properties of the training algorithms ensure that interpolating solutions with "good" properties are found.
We present simulation results that validate the theory and introduce two data models.
- Score: 37.6314945221565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite being highly over-parametrized, and having the ability to fully
interpolate the training data, deep networks are known to generalize well to
unseen data. It is now understood that part of the reason for this is that the
training algorithms used have certain implicit regularization properties that
ensure interpolating solutions with "good" properties are found. This is best
understood in linear over-parametrized models where it has been shown that the
celebrated stochastic gradient descent (SGD) algorithm finds an interpolating
solution that is closest in Euclidean distance to the initial weight vector.
Different regularizers, replacing Euclidean distance with Bregman divergence,
can be obtained if we replace SGD with stochastic mirror descent (SMD).
Empirical observations have shown that in the deep network setting, SMD
achieves a generalization performance that is different from that of SGD (and
which depends on the choice of SMD's potential function). In an attempt to begin
to understand this behavior, we obtain the generalization error of SMD for
over-parametrized linear models for a binary classification problem where the
two classes are drawn from a Gaussian mixture model. We present simulation
results that validate the theory and, in particular, introduce two data models,
one for which SMD with an $\ell_2$ regularizer (i.e., SGD) outperforms SMD with
an $\ell_1$ regularizer, and one for which the reverse happens.
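For a strictly convex potential $\psi$, the SMD update can be written as $\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta\,\nabla \mathcal{L}_{i_t}(w_t)$, which reduces to SGD when $\psi(w) = \tfrac{1}{2}\|w\|_2^2$. From a zero initialization, SGD therefore converges to the interpolant of minimum $\ell_2$ norm, while SMD with an $\ell_1$-like potential (e.g., a $q$-norm potential with $q$ close to $1$) converges to an interpolant that approximately minimizes the $\ell_1$ norm. The sketch below compares these two limiting interpolants on two toy Gaussian-mixture data models (a sparse class mean and a dense one of equal norm). The data models, the `min_l1_interpolant`/`min_l2_interpolant` helpers, and the use of `scipy.optimize.linprog` are illustrative assumptions, not the constructions or experiments of the paper.

```python
# A minimal sketch (assumptions noted above, not the paper's own construction):
# compare the interpolants that SGD and an l1-like SMD are known to converge to
# from w = 0, i.e. the minimum-l2-norm and (approximately) minimum-l1-norm
# solutions of Xw = y, on two toy Gaussian-mixture data models.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def min_l2_interpolant(X, y):
    # For an under-determined consistent system, lstsq returns the
    # minimum-Euclidean-norm interpolant (the implicit bias of SGD from w = 0).
    return np.linalg.lstsq(X, y, rcond=None)[0]

def min_l1_interpolant(X, y):
    # Basis-pursuit LP: min ||w||_1 s.t. Xw = y, written as w = u - v, u, v >= 0.
    n, d = X.shape
    res = linprog(c=np.ones(2 * d), A_eq=np.hstack([X, -X]), b_eq=y,
                  bounds=(0, None), method="highs")
    assert res.success
    return res.x[:d] - res.x[d:]

def gaussian_mixture(mu, n):
    # Two balanced classes: x = y * mu + standard Gaussian noise, y in {-1, +1}.
    y = rng.choice([-1.0, 1.0], size=n)
    return y[:, None] * mu + rng.normal(size=(n, mu.size)), y

d, n_train, n_test = 500, 50, 5000
sparse_mu = np.zeros(d)
sparse_mu[:10] = 1.0                           # few informative features
dense_mu = np.full(d, np.sqrt(10.0 / d))       # same norm, spread over all features

for name, mu in [("sparse mean", sparse_mu), ("dense mean", dense_mu)]:
    Xtr, ytr = gaussian_mixture(mu, n_train)
    Xte, yte = gaussian_mixture(mu, n_test)
    for label, w in [("l2 / SGD", min_l2_interpolant(Xtr, ytr)),
                     ("l1 / SMD", min_l1_interpolant(Xtr, ytr))]:
        err = np.mean(np.sign(Xte @ w) != yte)
        print(f"{name:12s} {label}: test error {err:.3f}")
```

With such toy models one would typically expect the $\ell_1$-type interpolant to do better when the discriminative direction is sparse and the $\ell_2$ (SGD) interpolant to do better when it is dense, mirroring the abstract's point that either regularizer can dominate depending on the data model.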
Related papers
- Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of the fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z) - Stochastic Mirror Descent in Average Ensemble Models [38.38572705720122]
Stochastic mirror descent (SMD) is a general class of training algorithms, which includes the celebrated stochastic gradient descent (SGD) as a special case.
In this paper we explore the performance of the SMD algorithm on mean-field ensemble models.
arXiv Detail & Related papers (2022-10-27T11:04:00Z) - uGLAD: Sparse graph recovery by optimizing deep unrolled networks [11.48281545083889]
We present a novel technique to perform sparse graph recovery by optimizing deep unrolled networks.
Our model, uGLAD, builds upon and extends the state-of-the-art model GLAD to the unsupervised setting.
We evaluate model results on synthetic Gaussian data, non-Gaussian data generated from Gene Regulatory Networks, and present a case study in anaerobic digestion.
arXiv Detail & Related papers (2022-05-23T20:20:27Z) - Implicit Regularization Properties of Variance Reduced Stochastic Mirror
Descent [7.00422423634143]
We prove that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in the linear regression setting.
We derive a model estimation accuracy result in the setting when the true model is sparse.
arXiv Detail & Related papers (2022-04-29T19:37:24Z) - Explicit Regularization via Regularizer Mirror Descent [32.0512015286512]
We propose a new method for training deep neural networks (DNNs) with regularization, called regularizer mirror descent (RMD).
RMD simultaneously interpolates the training data and minimizes a certain potential function of the weights.
Our results suggest that the performance of RMD is remarkably robust and significantly better than both stochastic gradient descent (SGD) and weight decay.
arXiv Detail & Related papers (2022-02-22T10:21:44Z) - Inverting brain grey matter models with likelihood-free inference: a
tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI.
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that afforded by ordinary least squares; a minimal constant-stepsize SGD sketch appears after this list.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
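As a companion to the constant-stepsize SGD entry above, the following sketch (an illustrative toy setup, not the experiments of that paper) runs constant-stepsize SGD on an over-parametrized least-squares problem from a zero initialization and checks that the iterate approaches the minimum-$\ell_2$-norm interpolant, the implicit bias discussed in the main abstract above.

```python
# Constant-stepsize SGD on an over-parametrized least-squares problem (d >> n).
# Started from w = 0, the iterate stays in the row space of X, so as the
# training residual vanishes it approaches the minimum-l2-norm interpolant.
# Problem sizes, step size, and epoch count are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 400
X = rng.normal(size=(n, d))
y = rng.normal(size=n)              # any labels work: Xw = y is consistent for d > n

w = np.zeros(d)
step = 1.0 / np.max(np.sum(X**2, axis=1))   # small enough for per-sample stability
for _ in range(500):                        # constant step size throughout
    for i in rng.permutation(n):
        w -= step * (X[i] @ w - y[i]) * X[i]

w_min_norm = np.linalg.lstsq(X, y, rcond=None)[0]   # minimum-l2-norm interpolant
print("training residual norm:", np.linalg.norm(X @ w - y))
print("distance to min-norm interpolant:", np.linalg.norm(w - w_min_norm))
```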