Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent
- URL: http://arxiv.org/abs/2205.00058v1
- Date: Fri, 29 Apr 2022 19:37:24 GMT
- Title: Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent
- Authors: Yiling Luo, Xiaoming Huo, Yajun Mei
- Abstract summary: We prove that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in the linear regression.
We derive a model estimation accuracy result in the setting when the true model is sparse.
- Score: 7.00422423634143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In machine learning and statistical data analysis, we often run into an objective function that is a summation: the number of terms in the summation can be as large as the sample size, which may be enormous. In such a setting, the stochastic mirror descent (SMD) algorithm is a numerically efficient method: each iteration involves only a very small subset of the data. The variance
reduction version of SMD (VRSMD) can further improve SMD by inducing faster
convergence. On the other hand, algorithms such as gradient descent and
stochastic gradient descent have an implicit regularization property that leads to better generalization performance. Little is known about whether such a property holds for VRSMD. We prove here that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in linear regression. This establishes the implicit regularization property for VRSMD. As an application of this result, we derive a model estimation accuracy result in the setting where the true model is sparse. We use numerical
examples to illustrate the empirical power of VRSMD.
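The abstract describes SMD and its variance-reduced version only at a high level. The sketch below is a minimal, hypothetical illustration (not the authors' implementation) of one common way to combine an SVRG-style variance-reduced gradient with a mirror descent step on noiseless, over-parameterized linear regression; the mirror potential psi(x) = (1/q) * sum_j |x_j|^q with q = 1.5, the problem sizes, and all hyperparameters are assumptions chosen for illustration only.

```python
import numpy as np

# Illustrative over-parameterized, noiseless linear regression (all values hypothetical)
rng = np.random.default_rng(0)
n, d = 40, 100                              # fewer samples than features
A = rng.standard_normal((n, d)) / np.sqrt(d)
x_true = np.zeros(d)
x_true[:5] = 1.0                            # sparse ground truth
b = A @ x_true                              # noiseless labels, so exact interpolation is possible

# Mirror map psi(x) = (1/q) * sum_j |x_j|^q with q > 1 (q near 1 tends to favor sparser interpolants)
q = 1.5
def grad_psi(x):                            # nabla psi
    return np.sign(x) * np.abs(x) ** (q - 1)
def grad_psi_inv(y):                        # inverse of nabla psi
    return np.sign(y) * np.abs(y) ** (1.0 / (q - 1))

def full_grad(x):                           # gradient of the average squared loss
    return A.T @ (A @ x - b) / n

# SVRG-style variance-reduced stochastic mirror descent (VRSMD) loop
eta, epochs = 0.1, 50
x = np.full(d, 1e-3)                        # small nonzero start so the mirror image is finite
for _ in range(epochs):
    x_snap = x.copy()
    mu = full_grad(x_snap)                  # full gradient at the snapshot (control variate)
    for _ in range(n):
        i = rng.integers(n)
        g_i      = A[i] * (A[i] @ x      - b[i])   # stochastic gradient at the current iterate
        g_i_snap = A[i] * (A[i] @ x_snap - b[i])   # stochastic gradient at the snapshot
        g = g_i - g_i_snap + mu                    # variance-reduced gradient estimate
        x = grad_psi_inv(grad_psi(x) - eta * g)    # mirror descent step

print("training residual   :", np.linalg.norm(A @ x - b))
print("mass on true support:", np.abs(x[:5]).sum(), "of total", np.abs(x).sum())
```

In this sketch the chosen potential shapes which interpolant the iterates approach: with q close to 1, most of the recovered mass tends to sit on the few true-support coordinates, which is the flavor of implicit regularization the abstract refers to.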
Related papers
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC can perform both parameter estimation and particle proposal adaptation efficiently and entirely on the fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
- The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models [37.6314945221565]
Deep networks are known to generalize well to unseen data.
Implicit regularization properties ensure that interpolating solutions with "good" properties are found.
We present simulation results that validate the theory and introduce two data models.
arXiv Detail & Related papers (2023-02-18T22:23:42Z)
- Stochastic Mirror Descent in Average Ensemble Models [38.38572705720122]
Stochastic mirror descent (SMD) is a general class of training algorithms, which includes the celebrated stochastic gradient descent (SGD) as a special case.
In this paper we explore the performance of SMD on mean-field ensemble models.
arXiv Detail & Related papers (2022-10-27T11:04:00Z)
- Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that provided by ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Variance Reduction on Adaptive Stochastic Mirror Descent [23.451652399157002]
We prove that variance reduction reduces the stochastic first-order oracle (SFO) complexity of most adaptive mirror descent algorithms and accelerates their convergence.
We check the validity of our claims using experiments in deep learning.
arXiv Detail & Related papers (2020-12-26T15:15:51Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
- Direct loss minimization algorithms for sparse Gaussian processes [9.041035455989181]
The paper provides a thorough investigation of direct loss minimization (DLM), which optimizes the posterior to minimize predictive loss in sparse Gaussian processes.
The application of DLM in non-conjugate cases is more complex because the minimization of expectation in the log-loss DLM objective is often intractable.
arXiv Detail & Related papers (2020-04-07T02:31:00Z)
- Online stochastic gradient descent on non-convex losses from high-dimensional inference [2.2344764434954256]
Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks.
In this paper we study when SGD produces an estimator that has non-trivial correlation with the underlying parameters.
We illustrate our approach by applying it to a set of tasks such as phase retrieval and parameter estimation for generalized linear models.
arXiv Detail & Related papers (2020-03-23T17:34:06Z)