Bootstrapping the error of Oja's Algorithm
- URL: http://arxiv.org/abs/2106.14857v1
- Date: Mon, 28 Jun 2021 17:27:26 GMT
- Title: Bootstrapping the error of Oja's Algorithm
- Authors: Robert Lunde, Purnamrita Sarkar, Rachel Ward
- Abstract summary: We consider the problem of quantifying uncertainty for the estimation error of the leading eigenvector from Oja's algorithm for streaming principal component analysis.
We propose a multiplier bootstrap algorithm that may be updated in an online manner.
- Score: 16.017328736786922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of quantifying uncertainty for the estimation error
of the leading eigenvector from Oja's algorithm for streaming principal
component analysis, where the data are generated IID from some unknown
distribution. By combining classical tools from the U-statistics literature
with recent results on high-dimensional central limit theorems for quadratic
forms of random vectors and concentration of matrix products, we establish a
$\chi^2$ approximation result for the $\sin^2$ error between the population
eigenvector and the output of Oja's algorithm. Since estimating the covariance
matrix associated with the approximating distribution requires knowledge of
unknown model parameters, we propose a multiplier bootstrap algorithm that may
be updated in an online manner. We establish conditions under which the
bootstrap distribution is close to the corresponding sampling distribution with
high probability, thereby establishing the bootstrap as a consistent
inferential method in an appropriate asymptotic regime.
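To make the abstract's two ingredients concrete, here is a minimal Python sketch (not the authors' exact algorithm) of Oja's update run alongside multiplier-bootstrap replicates, each of which reweights every incoming sample by an independent random multiplier. The constant step size `eta`, the Gaussian multipliers, and all function names are illustrative assumptions; the paper defines the actual algorithm, its step-size schedule, and the conditions behind the $\chi^2$ approximation and bootstrap consistency.

```python
import numpy as np

def sin2_error(u, v):
    """sin^2 distance between unit vectors: 1 - (u^T v)^2."""
    return 1.0 - np.dot(u, v) ** 2

def oja_with_multiplier_bootstrap(stream, dim, n_boot=200, eta=0.01, rng=None):
    """Oja's algorithm on a stream of i.i.d. vectors, with n_boot
    multiplier-bootstrap replicates updated from the same samples."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    W = np.tile(w, (n_boot, 1))              # replicates share the initial vector
    for x in stream:
        # Oja update for the point estimate: w <- normalize(w + eta * x x^T w)
        w = w + eta * np.dot(x, w) * x
        w /= np.linalg.norm(w)
        # Perturbed updates: replicate b reweights the sample by (1 + eps_b)
        eps = rng.standard_normal(n_boot)    # i.i.d. N(0, 1) multipliers (an assumption)
        W = W + eta * (1.0 + eps)[:, None] * (W @ x)[:, None] * x[None, :]
        W /= np.linalg.norm(W, axis=1, keepdims=True)
    # Bootstrap draws of the sin^2 error, measured against the point estimate
    return w, 1.0 - (W @ w) ** 2
```

A toy usage with a spiked covariance, so the population eigenvector is known to be the first coordinate axis:

```python
rng = np.random.default_rng(0)
d, n = 20, 5000
X = rng.standard_normal((n, d))
X[:, 0] *= 3.0                               # spike along e_1
w_hat, boot_errs = oja_with_multiplier_bootstrap(iter(X), d, rng=rng)
print(sin2_error(w_hat, np.eye(d)[0]))       # point-estimate sin^2 error
print(np.quantile(boot_errs, 0.95))          # bootstrap 95% error bound
```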
Related papers
- Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space [8.971989179518216]
Given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries.
We utilize tools from optimal transport to generalize the nearest neighbors method to the distributional setting.
arXiv Detail & Related papers (2024-10-17T00:50:17Z)
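For the entry above, a minimal sketch of the one-dimensional primitives such a nearest-neighbors method can be built from: the Wasserstein-2 distance between empirical distributions and their barycenter, both of which reduce to sorted samples (empirical quantiles) in 1-D. The equal-sample-size assumption and function names are illustrative, not the paper's implementation.

```python
import numpy as np

def w2_1d(x, y):
    """Wasserstein-2 distance between two 1-D empirical distributions
    with the same number of samples: L2 distance between sorted samples."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

def w2_barycenter_1d(samples):
    """Wasserstein barycenter of several 1-D empirical distributions:
    the pointwise average of their quantile functions (sorted samples)."""
    return np.mean([np.sort(s) for s in samples], axis=0)
```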
- Dynamical System Identification, Model Selection and Model Uncertainty Quantification by Bayesian Inference [0.8388591755871735]
This study presents a Bayesian maximum a posteriori (MAP) framework for dynamical system identification from time-series data.
arXiv Detail & Related papers (2024-01-30T12:16:52Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Robust computation of optimal transport by $\beta$-potential regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z)
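The entry above regularizes OT with a $\beta$-potential term; the $\beta$-divergence family contains the KL divergence, which underlies standard entropic regularization, as a limiting case. As a baseline for comparison only, here is a minimal sketch of plain entropy-regularized OT via Sinkhorn iterations. It is not the paper's $\beta$-potential algorithm, and `reg` and `n_iter` are illustrative parameters.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iter=500):
    """Entropy-regularized optimal transport between histograms a and b
    (nonnegative, summing to 1) under cost matrix C, via Sinkhorn iterations."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # scale columns to match marginal b
        u = a / (K @ v)                  # scale rows to match marginal a
    return u[:, None] * K * v[None, :]   # transport plan P = diag(u) K diag(v)
```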
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- Trimmed Sampling Algorithm for the Noisy Generalized Eigenvalue Problem [0.0]
Solving the generalized eigenvalue problem is a useful method for finding energy eigenstates of large quantum systems.
The problem is especially bad when matrix elements are evaluated using Monte Carlo methods and have significant error bars.
We introduce the trimmed sampling algorithm in order to solve this problem.
arXiv Detail & Related papers (2022-09-05T18:10:12Z)
- Estimating Gaussian Copulas with Missing Data [0.0]
We show how to circumvent a priori assumptions on the marginals with semiparametric modelling.
The joint distribution learned through this algorithm is considerably closer to the underlying distribution than existing methods.
arXiv Detail & Related papers (2022-01-14T17:20:44Z)
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
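A minimal sketch of the kind of embedding the entry above studies: the rows of the matrix of leading eigenvectors of the random walk matrix $D^{-1}A$, computed here through the similar symmetric matrix $D^{-1/2} A D^{-1/2}$ for numerical stability. It assumes a dense symmetric adjacency matrix with no isolated nodes, and is a generic illustration rather than the paper's exact procedure.

```python
import numpy as np

def random_walk_embedding(A, k):
    """Embed nodes using the k leading eigenvectors of the random walk
    matrix D^{-1} A. Assumes A is symmetric with strictly positive degrees."""
    deg = A.sum(axis=1)
    # D^{-1}A is similar to the symmetric S = D^{-1/2} A D^{-1/2},
    # so a stable symmetric eigendecomposition can be used.
    S = A / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    idx = np.argsort(-np.abs(vals))[:k]      # k eigenpairs largest in magnitude
    # Map eigenvectors of S back to eigenvectors of D^{-1} A.
    return vecs[:, idx] / np.sqrt(deg)[:, None]
```

Running k-means on the rows of this embedding (with k = K) is the standard next step that the entry's block-model concentration result speaks to.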
- Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z)
- Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We exploit the structure of the problem to obtain a convex-concave saddle-point reformulation.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z)
- Upper Bounds on the Generalization Error of Private Algorithms for Discrete Data [31.122671977370416]
We study the generalization capability of algorithms from an information-theoretic perspective.
In particular, we show the bounds obtained with this strategy for the case of $\epsilon$-DP and $\mu$-GDP algorithms.
arXiv Detail & Related papers (2020-05-12T16:05:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.