Covariance Estimators for the ROOT-SGD Algorithm in Online Learning
- URL: http://arxiv.org/abs/2212.01259v1
- Date: Fri, 2 Dec 2022 15:55:52 GMT
- Title: Covariance Estimators for the ROOT-SGD Algorithm in Online Learning
- Authors: Yiling Luo, Xiaoming Huo, Yajun Mei
- Abstract summary: We develop two estimators for the algorithm covariance of ROOT-SGD.
Our first estimator adopts the idea of plug-in. For each unknown component in the formula of the covariance, we substitute it with its empirical counterpart.
Our second estimator is a Hessian-free estimator that overcomes the plug-in estimator's reliance on the Hessian of the loss.
- Score: 7.00422423634143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online learning naturally arises in many statistical and machine learning
problems. The most widely used methods in online learning are stochastic
first-order algorithms. Among this family of algorithms, there is a recently
developed algorithm, Recursive One-Over-T SGD (ROOT-SGD). ROOT-SGD is
advantageous in that it converges at a non-asymptotically fast rate, and its
estimator further converges to a normal distribution. However, this normal
distribution has an unknown asymptotic covariance and thus cannot be directly used
to measure the uncertainty. To fill this gap, we develop two estimators for the
asymptotic covariance of ROOT-SGD. Our covariance estimators are useful for
statistical inference in ROOT-SGD. Our first estimator adopts the idea of
plug-in. For each unknown component in the formula of the asymptotic
covariance, we substitute it with its empirical counterpart. The plug-in
estimator converges at the rate $\mathcal{O}(1/\sqrt{t})$, where $t$ is the
sample size. Despite its quick convergence, the plug-in estimator has the
limitation that it relies on the Hessian of the loss function, which might be
unavailable in some cases. Our second estimator is a Hessian-free estimator
that overcomes the aforementioned limitation. The Hessian-free estimator uses
the random-scaling technique, and we show that it is an asymptotically
consistent estimator of the true covariance.
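To make the plug-in idea concrete, here is a minimal sketch in Python. It assumes the asymptotic covariance takes the usual sandwich form $H^{-1} S H^{-1}$ (with $H$ the Hessian of the population loss at the optimum and $S$ the covariance of per-sample gradients), which is standard for first-order stochastic approximation but may differ from the paper's exact formula; the helper names (`plug_in_covariance`, `grad_fn`, `hess_fn`) and the linear-regression toy example are illustrative assumptions. The Hessian-free alternative described above would avoid `hess_fn` entirely by using random scaling instead.

```python
# A minimal sketch of a plug-in covariance estimator, assuming the asymptotic
# covariance has the sandwich form H^{-1} S H^{-1}.  The exact formula in the
# paper may differ; this only illustrates the "replace each unknown component
# by its empirical counterpart" idea.
import numpy as np

def plug_in_covariance(grad_fn, hess_fn, x_hat, samples):
    """Plug-in estimate of the sandwich covariance at the current iterate x_hat.

    grad_fn(x, z) -> per-sample gradient, shape (d,)
    hess_fn(x, z) -> per-sample Hessian, shape (d, d)
    samples       -> iterable of data points z_1, ..., z_t
    """
    d = x_hat.shape[0]
    H_bar = np.zeros((d, d))   # empirical Hessian
    S_bar = np.zeros((d, d))   # empirical second moment of gradients
    t = 0
    for z in samples:
        g = grad_fn(x_hat, z)
        H_bar += hess_fn(x_hat, z)
        S_bar += np.outer(g, g)
        t += 1
    H_bar /= t
    S_bar /= t
    H_inv = np.linalg.inv(H_bar)
    return H_inv @ S_bar @ H_inv   # plug-in sandwich estimator


# Toy usage on the linear regression loss f(x; (a, b)) = 0.5 * (a @ x - b) ** 2,
# whose per-sample gradient is a * (a @ x - b) and per-sample Hessian is a a^T.
rng = np.random.default_rng(0)
d, t = 3, 5000
A = rng.normal(size=(t, d))
x_star = np.array([1.0, -2.0, 0.5])
b = A @ x_star + rng.normal(size=t)

grad = lambda x, z: z[0] * (z[0] @ x - z[1])
hess = lambda x, z: np.outer(z[0], z[0])
Sigma_hat = plug_in_covariance(grad, hess, x_star, zip(A, b))
print(Sigma_hat)  # close to sigma^2 * (E[a a^T])^{-1}, i.e. roughly the identity here
```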
Related papers
- Markov Chain Variance Estimation: A Stochastic Approximation Approach [14.883782513177094]
We consider the problem of estimating the variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean.
We design a novel recursive estimator that requires $O(1)$ computation at each step, does not require any historical samples or any prior knowledge of the run length, and has an optimal $O(\frac{1}{n})$ rate of convergence for the mean-squared error (MSE), with provable finite-sample guarantees.
arXiv Detail & Related papers (2024-09-09T15:42:28Z) - TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study, among other questions, whether the predicted covariance truly captures the randomness of the predicted mean.
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z) - A Specialized Semismooth Newton Method for Kernel-Based Optimal
Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z) - Online Bootstrap Inference with Nonconvex Stochastic Gradient Descent
Estimator [0.0]
In this paper, we investigate the theoretical properties of stochastic gradient descent (SGD) for statistical inference in the context of nonconvex problems.
We propose two inferential procedures for loss landscapes that may contain multiple local minima.
arXiv Detail & Related papers (2023-06-03T22:08:10Z) - Statistical Inference with Stochastic Gradient Methods under
$\phi$-mixing Data [9.77185962310918]
We propose a mini-batch SGD estimator for statistical inference when the data is $\phi$-mixing.
The confidence intervals are constructed using an associated mini-batch SGD procedure.
The proposed method is memory-efficient and easy to implement in practice.
arXiv Detail & Related papers (2023-02-24T16:16:43Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients (a sketch of this recursion appears after this list).
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Statistical Inference for Model Parameters in Stochastic Gradient
Descent [45.29532403359099]
Stochastic gradient descent (SGD) has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency.
We investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain conditions.
arXiv Detail & Related papers (2016-10-27T07:04:21Z)
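The ROOT-SGD entry above describes the algorithm as a recursive averaging of past stochastic gradients. Below is a minimal sketch of that recursion, assuming the commonly cited update $v_t = \nabla f(x_{t-1}; \xi_t) + (1 - 1/t)(v_{t-1} - \nabla f(x_{t-2}; \xi_t))$, $x_t = x_{t-1} - \eta v_t$; the step size, the `sample_fn` helper, and the least-squares toy problem are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch of the ROOT-SGD recursion, assuming the commonly cited form
#   v_t = grad(x_{t-1}; xi_t) + (1 - 1/t) * (v_{t-1} - grad(x_{t-2}; xi_t)),
#   x_t = x_{t-1} - eta * v_t.
# Illustration only; consult the ROOT-SGD paper for the exact update and step size.
import numpy as np

def root_sgd(grad_fn, sample_fn, x0, eta=0.05, num_steps=10_000):
    """grad_fn(x, z): stochastic gradient at x for data point z.
    sample_fn(): draws one fresh data point per step (online setting)."""
    x_prev, x = x0.copy(), x0.copy()
    v = np.zeros_like(x0)
    for t in range(1, num_steps + 1):
        z = sample_fn()
        g_new = grad_fn(x, z)        # gradient at the current iterate
        g_old = grad_fn(x_prev, z)   # gradient at the previous iterate, same sample
        v = g_new + (1.0 - 1.0 / t) * (v - g_old)   # recursive gradient average
        x_prev, x = x, x - eta * v
    return x


# Toy usage: online least squares with the same per-sample loss as above.
rng = np.random.default_rng(1)
x_star = np.array([1.0, -2.0, 0.5])

def sample():
    a = rng.normal(size=3)
    return a, a @ x_star + rng.normal()

grad = lambda x, z: z[0] * (z[0] @ x - z[1])
print(root_sgd(grad, sample, x0=np.zeros(3)))  # approaches x_star
```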