Statistical Inference with Stochastic Gradient Methods under
$\phi$-mixing Data
- URL: http://arxiv.org/abs/2302.12717v2
- Date: Tue, 28 Mar 2023 14:35:04 GMT
- Title: Statistical Inference with Stochastic Gradient Methods under
$\phi$-mixing Data
- Authors: Ruiqi Liu, Xi Chen, Zuofeng Shang
- Abstract summary: We propose a mini-batch SGD estimator for statistical inference when the data is $\phi$-mixing.
The confidence intervals are constructed using an associated mini-batch SGD procedure.
The proposed method is memory-efficient and easy to implement in practice.
- Score: 9.77185962310918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent (SGD) is a scalable and memory-efficient
optimization algorithm for large datasets and stream data, which has drawn a
great deal of attention and popularity. The applications of SGD-based
estimators to statistical inference such as interval estimation have also
achieved great success. However, most of the related works are based on i.i.d.
observations or Markov chains. When the observations come from a mixing time
series, how to conduct valid statistical inference remains unexplored. As a
matter of fact, the general correlation among observations imposes a challenge
on interval estimation. Most existing methods may ignore this correlation and
lead to invalid confidence intervals. In this paper, we propose a mini-batch
SGD estimator for statistical inference when the data is $\phi$-mixing. The
confidence intervals are constructed using an associated mini-batch bootstrap
SGD procedure. Using the ``independent block'' trick from \cite{yu1994rates}, we
show that the proposed estimator is asymptotically normal, and its limiting
distribution can be effectively approximated by the bootstrap procedure. The
proposed method is memory-efficient and easy to implement in practice.
Simulation studies on synthetic data and an application to a real-world dataset
confirm our theory.
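As a rough illustration of the kind of procedure the abstract describes, the sketch below runs mini-batch SGD for the point estimate alongside randomly perturbed companion runs whose quantiles yield confidence intervals. This is a minimal sketch under assumed choices (a linear-regression loss, mean-one exponential multiplier weights, a decaying step size), not the authors' algorithm; in particular, the paper's block-based treatment of $\phi$-mixing dependence is not reproduced here.

```python
import numpy as np

def minibatch_sgd_ci(X, y, batch_size=32, lr0=0.5, n_boot=200, seed=0):
    """Mini-batch SGD for linear regression, with perturbed companion runs
    whose quantiles give basic-bootstrap confidence intervals."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)                      # point-estimate iterate
    boot = np.zeros((n_boot, d))             # companion-run iterates
    theta_bar = np.zeros(d)                  # Polyak-Ruppert averages
    boot_bar = np.zeros((n_boot, d))
    n_steps = n // batch_size
    for t in range(n_steps):
        sl = slice(t * batch_size, (t + 1) * batch_size)   # stream order
        Xb, yb = X[sl], y[sl]
        lr = lr0 / np.sqrt(t + 1.0)          # decaying step size (assumption)
        theta -= lr * Xb.T @ (Xb @ theta - yb) / batch_size
        # companion runs: rescale each batch gradient by a mean-one multiplier
        w = rng.exponential(1.0, size=n_boot)
        resid = Xb @ boot.T - yb[:, None]    # residuals, (batch_size, n_boot)
        boot -= lr * w[:, None] * (Xb.T @ resid).T / batch_size
        theta_bar += theta / n_steps
        boot_bar += boot / n_steps
    lo = np.quantile(boot_bar, 0.025, axis=0)
    hi = np.quantile(boot_bar, 0.975, axis=0)
    return theta_bar, np.column_stack([2 * theta_bar - hi, 2 * theta_bar - lo])

# usage on synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=5000)
est, ci = minibatch_sgd_ci(X, y)
```

Note that the batches are taken in stream order rather than shuffled, since shuffling would destroy the serial dependence that the method is designed to accommodate.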
Related papers
- On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
We show that Empirical Risk Minimization (ERM) with smoothed data is able to achieve sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z)
- A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z)
- Overlapping Batch Confidence Intervals on Statistical Functionals Constructed from Time Series: Application to Quantiles, Optimization, and Estimation [5.068678962285631]
We propose a confidence interval procedure for statistical functionals constructed using data from a stationary time series.
The OBx limits, certain functionals of the Wiener process parameterized by the size of the batches and the extent of their overlap, form the essential machinery for characterizing dependence.
arXiv Detail & Related papers (2023-07-17T16:21:48Z)
- Online Bootstrap Inference with Nonconvex Stochastic Gradient Descent Estimator [0.0]
In this paper, we investigate the theoretical properties of stochastic gradient descent (SGD) for statistical inference in the context of nonconvex problems, which may contain multiple local minima.
We propose two bootstrap-based online inference procedures.
arXiv Detail & Related papers (2023-06-03T22:08:10Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the infinite-sample and the finite-sample regimes.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost (a minimal sketch of the weighting scheme appears after this list).
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Distributed Learning of Finite Gaussian Mixtures [21.652015112462]
We study split-and-conquer approaches for the distributed learning of finite Gaussian mixtures.
The new estimator is shown to be consistent and retains root-$n$ consistency under some general conditions.
Experiments based on simulated and real-world data show that the proposed split-and-conquer approach has comparable statistical performance with the global estimator.
arXiv Detail & Related papers (2020-10-20T16:17:47Z)
- Minimax Quasi-Bayesian estimation in sparse canonical correlation analysis via a Rayleigh quotient function [1.0878040851638]
Existing rate-optimal estimators for sparse canonical vectors have high computational cost.
We propose a quasi-Bayesian estimation procedure that achieves the minimax estimation rate.
We use the proposed methodology to maximally correlate clinical variables and proteomic data, with the aim of better understanding the Covid-19 disease.
arXiv Detail & Related papers (2020-10-16T21:00:57Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay (a popular reinforcement learning technique) that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
- Statistical Inference for Model Parameters in Stochastic Gradient Descent [45.29532403359099]
Stochastic gradient descent (SGD) has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency.
We investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain conditions (a minimal plug-in sketch appears after this list).
arXiv Detail & Related papers (2016-10-27T07:04:21Z)
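For contrast with the $\phi$-mixing setting of the main paper, the last entry above studies the classical i.i.d. recipe: Polyak-Ruppert averaged SGD with a plug-in sandwich covariance of the form $A^{-1} S A^{-1}$. The sketch below is a minimal illustration under assumed choices (a linear-model loss, a decaying step size, normal quantiles), not the authors' implementation; under dependent data this plug-in covariance is generally biased, which is precisely the gap the main paper targets.

```python
import numpy as np

def averaged_sgd_plugin_ci(X, y, lr0=0.5):
    """Averaged SGD for linear regression with plug-in sandwich inference,
    valid under independent observations (illustrative sketch)."""
    n, d = X.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for t in range(n):                           # one pass, one sample per step
        xt, yt = X[t], y[t]
        theta -= lr0 / np.sqrt(t + 1.0) * (xt @ theta - yt) * xt
        theta_bar += (theta - theta_bar) / (t + 1)   # running average
    A = X.T @ X / n                              # plug-in Hessian estimate
    resid = X @ theta_bar - y
    S = (X * (resid ** 2)[:, None]).T @ X / n    # plug-in gradient covariance
    Ainv = np.linalg.inv(A)
    cov = Ainv @ S @ Ainv / n                    # sandwich covariance of average
    se = np.sqrt(np.diag(cov))
    return np.column_stack([theta_bar - 1.96 * se, theta_bar + 1.96 * se])
```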
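Similarly, as referenced in the Attentional-Biased Stochastic Gradient Descent entry, here is a minimal sketch of per-sample importance weighting inside momentum SGD. The exponential-of-loss weights and the temperature parameter lam are illustrative assumptions rather than the authors' exact scheme.

```python
import numpy as np

def absgd_step(theta, Xb, yb, velocity, lr=0.1, momentum=0.9, lam=1.0):
    """One ABSGD-style step on a squared-loss mini-batch (illustrative)."""
    resid = Xb @ theta - yb
    losses = 0.5 * resid ** 2                    # per-sample loss
    w = np.exp((losses - losses.max()) / lam)    # up-weight high-loss samples
    w /= w.sum()                                 # weights sum to one
    grad = Xb.T @ (w * resid)                    # importance-weighted gradient
    velocity = momentum * velocity + grad
    return theta - lr * velocity, velocity
```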