Distributed Estimation for Principal Component Analysis: an Enlarged
Eigenspace Analysis
- URL: http://arxiv.org/abs/2004.02336v3
- Date: Wed, 3 Feb 2021 02:24:02 GMT
- Title: Distributed Estimation for Principal Component Analysis: an Enlarged
Eigenspace Analysis
- Authors: Xi Chen and Jason D. Lee and He Li and Yun Yang
- Abstract summary: This paper studies distributed estimation for a fundamental statistical machine learning problem, principal component analysis (PCA).
We propose a novel multi-round algorithm for constructing top-$L$-dim eigenspace for distributed data.
Our algorithm takes advantage of shift-and-invert preconditioning and convex optimization.
- Score: 45.829683377074524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing size of modern data sets brings many challenges to the existing
statistical estimation approaches, which calls for new distributed
methodologies. This paper studies distributed estimation for a fundamental
statistical machine learning problem, principal component analysis (PCA).
Despite the massive literature on top eigenvector estimation, much less is
presented for the top-$L$-dim ($L>1$) eigenspace estimation, especially in a
distributed manner. We propose a novel multi-round algorithm for constructing
top-$L$-dim eigenspace for distributed data. Our algorithm takes advantage of
shift-and-invert preconditioning and convex optimization. Our estimator is
communication-efficient and achieves a fast convergence rate. In contrast to
the existing divide-and-conquer algorithm, our approach has no restriction on
the number of machines. Theoretically, the traditional Davis-Kahan theorem
requires the explicit eigengap assumption to estimate the top-$L$-dim
eigenspace. To avoid this eigengap assumption, we consider a new route in our
analysis: instead of exactly identifying the top-$L$-dim eigenspace, we show
that our estimator is able to cover the targeted top-$L$-dim population
eigenspace. Our distributed algorithm can be applied to a wide range of
statistical problems based on PCA, such as principal component regression and
single index model. Finally, we provide simulation studies to demonstrate the
performance of the proposed distributed estimator.
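As a rough, single-process illustration (not the paper's implementation), the sketch below runs shift-and-invert orthogonal iteration, solving each inner convex quadratic by distributed gradient steps that exchange only $d \times L$ matrices; the data layout, function names, and step sizes are assumptions for illustration.

```python
import numpy as np

def local_cov_mult(X_k, V):
    # One machine's covariance-matrix product (1/n_k) X_k^T (X_k V),
    # computed without forming the d x d covariance explicitly.
    return X_k.T @ (X_k @ V) / X_k.shape[0]

def distributed_eigenspace(machines, L, shift, rounds=30, inner_steps=100):
    """Shift-and-invert orthogonal iteration for the top-L eigenspace.

    Each round applies (shift*I - Sigma_hat)^{-1} to the current basis by
    minimizing the convex quadratic 0.5*tr(V'(shift*I - Sigma_hat)V) - tr(V'W)
    with plain gradient descent. Assumes equal-sized local samples (so the
    average of local products equals the pooled covariance product) and
    shift > lambda_1(Sigma_hat), which makes the quadratic convex and
    lr = 1/shift a stable step size.
    """
    d = machines[0].shape[1]
    lr = 1.0 / shift
    U = np.linalg.qr(np.random.randn(d, L))[0]    # random orthonormal start
    for _ in range(rounds):
        W, V = U, U.copy()
        for _ in range(inner_steps):              # inner convex solve
            sigma_V = np.mean([local_cov_mult(X_k, V) for X_k in machines], axis=0)
            V -= lr * (shift * V - sigma_V - W)   # gradient of the quadratic
        U = np.linalg.qr(V)[0]                    # re-orthonormalize the basis
    return U
```

In practice the shift would be set just above a crude estimate of the top eigenvalue, for example from one machine's local PCA.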
Related papers
- Near-Optimal differentially private low-rank trace regression with guaranteed private initialization [0.0]
We study differentially private (DP) estimation of a rank-$r$ matrix $M \in \mathbb{R}^{d_1 \times d_2}$ under the trace regression model.
We also propose a differentially private algorithm for estimating $M$ based on Riemannian optimization (DP-RGrad).
It is shown that the estimator given by DP-RGrad attains the optimal convergence rate in a weaker notion of differential privacy.
arXiv Detail & Related papers (2024-03-24T03:57:21Z)
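The summary above only names the ingredients; as a generic illustration of the differential-privacy side (not the paper's actual DP-RGrad update, which also performs the Riemannian step and retraction on the rank-$r$ manifold), a clipped-and-noised gradient step looks like this:

```python
import numpy as np

def dp_gradient_step(param, grad, lr, clip_norm, noise_mult, rng):
    """One Gaussian-mechanism gradient update: clipping bounds the per-step
    sensitivity to clip_norm, and the noise scale noise_mult * clip_norm is
    what a privacy accountant would calibrate to a target (epsilon, delta).
    Generic sketch only, not DP-RGrad itself."""
    g = grad * min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))
    return param - lr * (g + noise_mult * clip_norm * rng.standard_normal(grad.shape))
```

- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]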
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z)
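For context, a minimal sketch of the standard semidefinite (Fantope) relaxation that such estimator families build on, solved here by projected subgradient ascent with an $\ell_1$ penalty; the paper's novel regularizations are not reproduced:

```python
import numpy as np

def fantope_projection(M, L, tol=1e-8):
    """Project a symmetric matrix onto the Fantope {X : 0 <= X <= I, tr X = L}
    by bisecting on a spectral shift theta and capping eigenvalues at [0, 1]."""
    lam, V = np.linalg.eigh(M)
    lo, hi = lam.min() - 1.0, lam.max()
    while hi - lo > tol:
        theta = (lo + hi) / 2.0
        if np.clip(lam - theta, 0.0, 1.0).sum() > L:
            lo = theta                      # trace still too large: shift up
        else:
            hi = theta
    gamma = np.clip(lam - (lo + hi) / 2.0, 0.0, 1.0)
    return (V * gamma) @ V.T

def sparse_pca_sdp(S, L, rho, steps=200, lr=0.05):
    # Projected subgradient ascent on <S, X> - rho * ||X||_1 over the Fantope;
    # the top-L eigenvectors of X estimate the sparse principal subspace.
    X = np.zeros_like(S)
    for _ in range(steps):
        X = fantope_projection(X + lr * (S - rho * np.sign(X)), L)
    return X
```

- Distributed Learning of Mixtures of Experts [0.0]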
We deal with datasets that are either distributed by nature or so large that distributing the computations is a standard way to proceed.
We propose a distributed learning approach for mixtures of experts (MoE) models with an aggregation strategy to construct a reduction estimator from local estimators fitted in parallel to distributed subsets of the data.
arXiv Detail & Related papers (2023-12-15T15:26:13Z)
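The divide-fit-aggregate skeleton is easy to sketch; note that the local fit below is a least-squares placeholder and the plain average ignores the expert-label alignment an actual MoE reduction estimator must handle:

```python
import numpy as np

def fit_local(X, y):
    # Placeholder local estimator; a real implementation would run EM
    # for a mixture-of-experts model on this machine's subset.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def aggregate(chunks):
    # Fit every subset (in practice in parallel, one per machine),
    # then combine with a simple average as the aggregation step.
    return np.mean([fit_local(X, y) for X, y in chunks], axis=0)
```

- Distributed Semi-Supervised Sparse Statistical Inference [6.685997976921953]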
A debiased estimator is a crucial tool in statistical inference for high-dimensional model parameters.
Traditional methods require computing a debiased estimator on every machine.
An efficient multi-round distributed debiased estimator, which integrates both labeled and unlabeled data, is developed.
arXiv Detail & Related papers (2023-06-17T17:30:43Z)
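A minimal sketch of the one-step debiasing such methods iterate, with a ridge-regularized inverse standing in for the node-wise-lasso precision estimate and unlabeled covariates optionally sharpening the covariance; the multi-round distributed aggregation is omitted:

```python
import numpy as np

def debias(X, y, theta_init, X_unlabeled=None, ridge=1e-2):
    """One-step debiasing of an initial (e.g. lasso) estimator:
    theta_init + Theta_hat X'(y - X theta_init)/n. The ridge inverse is a
    crude stand-in for the precision-matrix estimate used in the
    high-dimensional literature."""
    n, d = X.shape
    Z = X if X_unlabeled is None else np.vstack([X, X_unlabeled])
    Theta_hat = np.linalg.inv(Z.T @ Z / Z.shape[0] + ridge * np.eye(d))
    return theta_init + Theta_hat @ X.T @ (y - X @ theta_init) / n
```

- Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap [24.073221004661427]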
A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model.
We provide an exact characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity.
arXiv Detail & Related papers (2022-05-26T17:47:35Z)
- Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z)
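As a generic illustration of the equilibrium computation (on a finite zero-sum game, not the paper's statistical game between estimator and nature), running exponential weights for both players and averaging the iterates approximates a mixed-strategy Nash equilibrium:

```python
import numpy as np

def hedge_nash(A, T=2000, eta=0.05):
    """Approximate a mixed-strategy Nash equilibrium of a finite zero-sum game
    (row player minimizes x' A y, column player maximizes) via the
    exponential-weights (Hedge) update for both players."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x_avg += x; y_avg += y
        x_new = x * np.exp(-eta * (A @ y))      # row player: low loss -> more weight
        y_new = y * np.exp(eta * (A.T @ x))     # column player: high payoff -> more weight
        x, y = x_new / x_new.sum(), y_new / y_new.sum()
    return x_avg / T, y_avg / T                 # averaged iterates approach the equilibrium
```

- Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]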
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
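A plain model-based planner of the kind analyzed: call the simulator a fixed number of times per state-action pair, build the empirical MDP, and run value iteration (the `sample_next_state` interface and all sizes are illustrative assumptions):

```python
import numpy as np

def plan_from_generative_model(sample_next_state, rewards, S, A, gamma,
                               n_samples=100, iters=500):
    """Estimate transitions with n_samples generative-model calls per (s, a),
    then run value iteration on the empirical MDP. `sample_next_state(s, a)`
    returns a next state in range(S); `rewards` has shape (S, A)."""
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n_samples):
                P_hat[s, a, sample_next_state(s, a)] += 1.0 / n_samples
    V = np.zeros(S)
    for _ in range(iters):
        V = (rewards + gamma * P_hat @ V).max(axis=1)   # Bellman optimality update
    return V
```

- Computationally efficient sparse clustering [67.95910835079825]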
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
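As a baseline illustration of PCA-based clustering in a two-component symmetric mixture (the paper's algorithm adds a sparsity step that is not reproduced here):

```python
import numpy as np

def pca_cluster(X):
    """Cluster two symmetric Gaussian-mixture components by the sign of the
    projection onto the top principal component; a generic baseline in the
    spirit of, not identical to, the paper's algorithm."""
    Xc = X - X.mean(axis=0)
    # Top eigenvector of the sample covariance via SVD of the centered data.
    v = np.linalg.svd(Xc, full_matrices=False)[2][0]
    return np.sign(Xc @ v).astype(int)   # labels in {-1, 0, +1}; 0 is borderline
```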