Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications
- URL: http://arxiv.org/abs/2403.03905v3
- Date: Tue, 11 Jun 2024 05:43:50 GMT
- Title: Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications
- Authors: Arun Jambulapati, Syamantak Kumar, Jerry Li, Shourya Pandey, Ankit Pensia, Kevin Tian,
- Abstract summary: We analyze black-box deflation methods as a framework for designing $k$-PCA algorithms.
Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for $k$-PCA.
We apply our framework to obtain state-of-the-art $k$-PCA algorithms robust to dataset contamination.
- Score: 19.714951004096996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The $k$-principal component analysis ($k$-PCA) problem is a fundamental algorithmic primitive that is widely-used in data analysis and dimensionality reduction applications. In statistical settings, the goal of $k$-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box deflation methods as a framework for designing $k$-PCA algorithms, where we model access to the unknown target matrix via a black-box $1$-PCA oracle which returns an approximate top eigenvector, under two popular notions of approximation. Despite being arguably the most natural reduction-based approach to $k$-PCA algorithm design, such black-box methods, which recursively call a $1$-PCA oracle $k$ times, were previously poorly-understood. Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for $k$-PCA. For a quadratic form notion of approximation we term ePCA (energy PCA), we show deflation methods suffer no parameter loss. For an alternative well-studied approximation notion we term cPCA (correlation PCA), we tightly characterize the parameter regimes where deflation methods are feasible. Moreover, we show that in all feasible regimes, $k$-cPCA deflation algorithms suffer no asymptotic parameter loss for any constant $k$. We apply our framework to obtain state-of-the-art $k$-PCA algorithms robust to dataset contamination, improving prior work in sample complexity by a $\mathsf{poly}(k)$ factor.
Related papers
- Combinatorial Sparse PCA Beyond the Spiked Identity Model [25.073817053937802]
We present a method for sparse PCA that provably succeeds for general $$ using $s2 cdot mathrmpolylog(d)$ samples and $d2 cdot mathrmpoly(s)$ time.<n>We also evaluate our method on synthetic and real-world datasets.
arXiv Detail & Related papers (2026-03-03T05:19:55Z) - Provably Efficient Algorithms for S- and Non-Rectangular Robust MDPs with General Parameterization [85.91302339486673]
We study robust Markov decision processes (RMDPs) with general policy parameterization under s-rectangular and non-rectangular uncertainty sets.<n>We prove novel Lipschitz and Lipschitz-smoothness properties for general policy parameterizations that extends to infinite state spaces.<n>We design a projected gradient descent algorithm for s-rectangular uncertainty and a Frank-Wolfe algorithm for non-rectangular uncertainty.
arXiv Detail & Related papers (2026-02-11T21:44:20Z) - Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs [56.246783503873225]
This paper revisits the weighted strategy for non-stationary parametric bandits.<n>We propose a simpler weight-based algorithm that is as efficient as window/restart-based algorithms.<n>Our framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2026-01-03T04:50:21Z) - Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations [121.39938773554523]
The Area Under the ROC Curve (AUC) is a pivotal evaluation metric in real-world scenarios with both class imbalance and decision constraints.<n>We present two simple instance-wise minimax reformulations to close the approximation gap of PAUC optimization.<n>The resulting algorithms enjoy a linear per-iteration computational complexity w.r.t. the sample size and a convergence rate of $O(-2/3)$ for typical one-way and two-way PAUCs.
arXiv Detail & Related papers (2025-12-01T02:52:33Z) - The $φ$-PCA Framework: A Unified and Efficiency-Preserving Approach with Robust Variants [0.0]
We introduce the $phi$-PCA framework which provides a unified formulation of robust and distributed PCA.<n>We show that the partition-aggregation principle underlying $phi$-PCA offers a general strategy for developing robust and efficiency-preserving methodologies.
arXiv Detail & Related papers (2025-10-15T05:21:11Z) - Solve sparse PCA problem by employing Hamiltonian system and leapfrog method [0.0]
We propose a novel sparse PCA algorithm that imposes sparsity through a smooth L1 penalty.
Experimental evaluations on a face recognition dataset-using both k-nearest neighbor and kernel ridge regressions-demonstrate that the proposed sparse PCA methods consistently achieve higher classification accuracy than conventional PCA.
arXiv Detail & Related papers (2025-03-30T06:39:11Z) - Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models [56.92178753201331]
We tackle average-reward infinite-horizon POMDPs with an unknown transition model.
We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z) - Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP)
We show that LOOP achieves a sublinear $tildemathcalO(mathrmpoly(d, mathrmsp(V*)) sqrtTbeta )$ regret, where $d$ and $beta$ correspond to AGEC and log-covering number of the hypothesis class respectively
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that, another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z) - A Theoretical Analysis of Optimistic Proximal Policy Optimization in
Linear Markov Decision Processes [13.466249082564213]
We propose an optimistic variant of PPO for episodic adversarial linear MDPs with full-information feedback.
Compared with existing policy-based algorithms, we achieve the state-of-the-art regret bound in both linear MDPs and adversarial linear MDPs with full information.
arXiv Detail & Related papers (2023-05-15T17:55:24Z) - Fair principal component analysis (PCA): minorization-maximization
algorithms for Fair PCA, Fair Robust PCA and Fair Sparse PCA [6.974999794070285]
We propose a new iterative algorithm to solve the fair PCA (FPCA) problem.
The proposed algorithm relies on the relaxation of a semi-orthogonality constraint which is proved to be tight at every iteration of the algorithm.
We numerically compare the performance of the proposed methods with two of the state-of-the-art approaches on synthetic data sets and a real-life data set.
arXiv Detail & Related papers (2023-05-10T08:14:32Z) - Human-in-the-loop: Provably Efficient Preference-based Reinforcement
Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z) - AgFlow: Fast Model Selection of Penalized PCA via Implicit
Regularization Effects of Gradient Flow [64.81110234990888]
Principal component analysis (PCA) has been widely used as an effective technique for feature extraction and dimension reduction.
In the High Dimension Low Sample Size (HDLSS) setting, one may prefer modified principal components, with penalized loadings.
We propose Approximated Gradient Flow (AgFlow) as a fast model selection method for penalized PCA.
arXiv Detail & Related papers (2021-10-07T08:57:46Z) - An Online Riemannian PCA for Stochastic Canonical Correlation Analysis [37.8212762083567]
We present an efficient algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices.
While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising.
arXiv Detail & Related papers (2021-06-08T23:38:29Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for bound non-dimensional optimization.
We demonstrate two popular deep learning methods on the empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, time, approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z) - Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality [3.179831861897336]
Existing approaches cannot supply certifiably optimal principal components with more than $p=100s$ of variables.
By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality.
We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of $1-2%$ in practice within minutes for $p=100$s or hours for $p=1,000$s.
arXiv Detail & Related papers (2020-05-11T15:39:23Z) - Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the peak Constrained Markov Decision Process (PCMDP), where the agent chooses the policy to maximize total reward in the finite horizon as well as satisfy constraints at each epoch with probability 1.
We propose a model-free algorithm that converts PCMDP problem to an unconstrained problem and a Q-learning based approach is applied.
arXiv Detail & Related papers (2020-03-11T23:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.