Online Bootstrap Inference with Nonconvex Stochastic Gradient Descent
Estimator
- URL: http://arxiv.org/abs/2306.02205v1
- Date: Sat, 3 Jun 2023 22:08:10 GMT
- Title: Online Bootstrap Inference with Nonconvex Stochastic Gradient Descent
Estimator
- Authors: Yanjie Zhong, Todd Kuffner and Soumendra Lahiri
- Abstract summary: In this paper, we investigate the theoretical properties of gradient descent (SGD) for statistical inference in the context of convex problems.
We propose two coferential procedures which may contain multiple error minima.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the theoretical properties of stochastic
gradient descent (SGD) for statistical inference in the context of nonconvex
optimization problems, which have been relatively unexplored compared to convex
settings. Our study is the first to establish provable inferential procedures
using the SGD estimator for general nonconvex objective functions, which may
contain multiple local minima.
We propose two novel online inferential procedures that combine SGD and the
multiplier bootstrap technique. The first procedure employs a consistent
covariance matrix estimator, and we establish its error convergence rate. The
second procedure approximates the limit distribution using bootstrap SGD
estimators, yielding asymptotically valid bootstrap confidence intervals. We
validate the effectiveness of both approaches through numerical experiments.
Furthermore, our analysis yields an intermediate result: the in-expectation
error convergence rate for the original SGD estimator in nonconvex settings,
which is comparable to existing results for convex problems. We believe this
novel finding holds independent interest and enriches the literature on
optimization and statistical inference.
Related papers
- Scalable method for Bayesian experimental design without integrating
over posterior distribution [0.0]
We address the computational efficiency in solving the A-optimal Bayesian design of experiments problems.
A-optimality is a widely used and easy-to-interpret criterion for Bayesian experimental design.
This study presents a novel likelihood-free approach to the A-optimal experimental design.
arXiv Detail & Related papers (2023-06-30T12:40:43Z) - Statistical Inference with Stochastic Gradient Methods under
$\phi$-mixing Data [9.77185962310918]
We propose a mini-batch SGD estimator for statistical inference when the data is $phi$-mixing.
The confidence intervals are constructed using an associated mini-batch SGD procedure.
The proposed method is memory-efficient and easy to implement in practice.
arXiv Detail & Related papers (2023-02-24T16:16:43Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measures the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributional robust optimization AUC.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z) - Computationally Efficient and Statistically Optimal Robust Low-rank
Matrix and Tensor Estimation [15.389011827844572]
Low-rank linear shrinkage estimation undertailed noise is challenging, both computationally statistically.
We introduce a novel sub-ient (RsGrad) which is not only computationally efficient but also statistically optimal.
arXiv Detail & Related papers (2022-03-02T09:05:15Z) - Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Non-Asymptotic Performance Guarantees for Neural Estimation of
$\mathsf{f}$-Divergences [22.496696555768846]
Statistical distances quantify the dissimilarity between probability distributions.
A modern method for estimating such distances from data relies on parametrizing a variational form by a neural network (NN) and optimizing it.
This paper explores this tradeoff by means of non-asymptotic error bounds, focusing on three popular choices of SDs.
arXiv Detail & Related papers (2021-03-11T19:47:30Z) - On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds on general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.