Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
- URL: http://arxiv.org/abs/2005.05195v4
- Date: Wed, 25 Aug 2021 15:42:09 GMT
- Title: Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
- Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet
- Abstract summary: Existing approaches cannot supply certifiably optimal principal components with more than $p=100$s of variables.
By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality.
We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of $1-2\%$ in practice within minutes for $p=100$s or hours for $p=1,000$s.
- Score: 3.179831861897336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse principal component analysis (PCA) is a popular dimensionality
reduction technique for obtaining principal components which are linear
combinations of a small subset of the original features. Existing approaches
cannot supply certifiably optimal principal components with more than $p=100$s
of variables. By reformulating sparse PCA as a convex mixed-integer
semidefinite optimization problem, we design a cutting-plane method which
solves the problem to certifiable optimality at the scale of selecting $k=5$
covariates from $p=300$ variables, and provides small bound gaps at a larger
scale. We also propose a convex relaxation and greedy rounding scheme that
provides bound gaps of $1-2\%$ in practice within minutes for $p=100$s or hours
for $p=1,000$s and is therefore a viable alternative to the exact method at
scale. Using real-world financial and medical datasets, we illustrate our
approach's ability to derive interpretable principal components tractably at
scale.
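To make the greedy component of such a pipeline concrete, the sketch below selects a support of size $k$ by repeatedly adding the variable that most increases the leading eigenvalue of the corresponding principal submatrix of a sample covariance matrix Sigma, then returns the leading eigenvector restricted to that support. It is a minimal NumPy illustration under the assumption of a dense covariance input; it is not the paper's cutting-plane method and it omits the convex relaxation that certifies the bound gap.

```python
import numpy as np

def greedy_sparse_pc(Sigma, k):
    """Greedy support selection for a single sparse principal component.

    Sigma : (p, p) symmetric PSD sample covariance matrix (assumed dense).
    k     : target sparsity (number of nonzero loadings).

    Returns (support, x) where x is the leading eigenvector of the
    k x k principal submatrix of Sigma, embedded in R^p.
    """
    p = Sigma.shape[0]
    support = []
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in range(p):
            if j in support:
                continue
            idx = support + [j]
            # Leading eigenvalue of the candidate principal submatrix.
            val = np.linalg.eigvalsh(Sigma[np.ix_(idx, idx)])[-1]
            if val > best_val:
                best_j, best_val = j, val
        support.append(best_j)

    idx = sorted(support)
    w, V = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
    x = np.zeros(p)
    x[idx] = V[:, -1]  # leading eigenvector on the selected support
    return idx, x

# Illustrative usage on a random covariance matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
Sigma = np.cov(A, rowvar=False)
support, x = greedy_sparse_pc(Sigma, k=5)
print(support, x[support])
```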
Related papers
- The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis [6.996002801232415]
We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes.
Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed.
arXiv Detail & Related papers (2024-10-10T05:08:14Z) - Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications [19.714951004096996]
We analyze black-box deflation methods as a framework for designing $k$-PCA algorithms.
Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for $k$-PCA.
We apply our framework to obtain state-of-the-art $k$-PCA algorithms robust to dataset contamination.
arXiv Detail & Related papers (2024-03-06T18:07:20Z) - Sparse Gaussian Graphical Models with Discrete Optimization:
Computational and Statistical Perspectives [8.403841349300103]
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model.
We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function.
Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$.
arXiv Detail & Related papers (2023-07-18T15:49:02Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Scalable Differentially Private Clustering via Hierarchically Separated
Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state of the art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Sharper Rates and Flexible Framework for Nonconvex SGD with Client and
Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the method so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of optimal bounds in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax
Optimization [104.96004056928474]
We propose a class of faster adaptive gradient descent ascent methods for nonconvex-strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with the mini-batch size $O(\kappa)$.
arXiv Detail & Related papers (2021-06-30T14:47:09Z) - Sparse online variational Bayesian regression [0.0]
We consider variational Bayesian inference as an inexpensive and scalable alternative to a fully Bayesian approach.
For linear models the method requires only the iterative solution of deterministic least squares problems.
For large $p$, an approximation achieves promising results at a cost of $O(p)$ in both computation and memory.
arXiv Detail & Related papers (2021-02-24T12:49:42Z) - A New Basis for Sparse Principal Component Analysis [5.258928416164104]
Previous versions of sparse principal component analysis presumed that the eigen-basis (a $p \times k$ matrix) is approximately sparse.
We propose a method that presumes the $p \times k$ matrix becomes approximately sparse after a $k \times k$ rotation.
We show that for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance compared to alternative methods. (A minimal rotation-based sketch appears after this list.)
arXiv Detail & Related papers (2020-07-01T16:32:22Z) - FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems.
We test our methods on problems with $p$ as large as $500,000$.
arXiv Detail & Related papers (2020-06-15T21:55:34Z)
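The rotation-based idea in "A New Basis for Sparse Principal Component Analysis" above can be illustrated, very roughly, by rotating the top-$k$ PCA loadings toward sparsity. The sketch below uses a standard varimax rotation followed by hard thresholding; varimax and the 0.1 threshold are illustrative choices here, not necessarily the construction proposed in that paper.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a p x k loading matrix (standard SVD-based algorithm)."""
    p, k = Phi.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        L = Phi @ R
        # Update the rotation via the SVD of the varimax criterion gradient.
        U, s, Vt = np.linalg.svd(
            Phi.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        )
        R = U @ Vt
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
        d_old = d
    return Phi @ R

# Rotate the top-k PCA loadings so they become (approximately) sparse,
# then hard-threshold small entries to obtain sparse components.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 30))
X -= X.mean(axis=0)
k = 3
_, _, Vt = np.linalg.svd(X, full_matrices=False)
loadings = Vt[:k].T                       # p x k eigen-basis
rotated = varimax(loadings)               # same column space, sparser entries
sparse_loadings = np.where(np.abs(rotated) > 0.1, rotated, 0.0)
print(np.count_nonzero(sparse_loadings, axis=0))
```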
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.