Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
- URL: http://arxiv.org/abs/2005.05195v4
- Date: Wed, 25 Aug 2021 15:42:09 GMT
- Title: Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
- Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet
- Abstract summary: Existing approaches cannot supply certifiably optimal principal components with more than $p=100$s of variables.
By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality.
We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of $1-2\%$ in practice within minutes for $p=100$s or hours for $p=1,000$s.
- Score: 3.179831861897336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse principal component analysis (PCA) is a popular dimensionality
reduction technique for obtaining principal components which are linear
combinations of a small subset of the original features. Existing approaches
cannot supply certifiably optimal principal components with more than $p=100$s
of variables. By reformulating sparse PCA as a convex mixed-integer
semidefinite optimization problem, we design a cutting-plane method which
solves the problem to certifiable optimality at the scale of selecting $k=5$
covariates from $p=300$ variables, and provides small bound gaps at a larger
scale. We also propose a convex relaxation and greedy rounding scheme that
provides bound gaps of $1-2\%$ in practice within minutes for $p=100$s or hours
for $p=1,000$s and is therefore a viable alternative to the exact method at
scale. Using real-world financial and medical datasets, we illustrate our
approach's ability to derive interpretable principal components tractably at
scale.
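To make the greedy component of such a pipeline concrete, the sketch below selects a support of size $k$ by repeatedly adding the variable that most increases the leading eigenvalue of the corresponding principal submatrix of a sample covariance matrix Sigma, then returns the leading eigenvector restricted to that support. It is a minimal NumPy illustration under the assumption of a dense covariance input; it is not the paper's cutting-plane method and it omits the convex relaxation that certifies the bound gap.

```python
import numpy as np

def greedy_sparse_pc(Sigma, k):
    """Greedy support selection for a single sparse principal component.

    Sigma : (p, p) symmetric PSD sample covariance matrix (assumed dense).
    k     : target sparsity (number of nonzero loadings).

    Returns (support, x) where x is the leading eigenvector of the
    k x k principal submatrix of Sigma, embedded in R^p.
    """
    p = Sigma.shape[0]
    support = []
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in range(p):
            if j in support:
                continue
            idx = support + [j]
            # Leading eigenvalue of the candidate principal submatrix.
            val = np.linalg.eigvalsh(Sigma[np.ix_(idx, idx)])[-1]
            if val > best_val:
                best_j, best_val = j, val
        support.append(best_j)

    idx = sorted(support)
    w, V = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
    x = np.zeros(p)
    x[idx] = V[:, -1]  # leading eigenvector on the selected support
    return idx, x

# Illustrative usage on a random covariance matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
Sigma = np.cov(A, rowvar=False)
support, x = greedy_sparse_pc(Sigma, k=5)
print(support, x[support])
```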
Related papers
- The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis [6.996002801232415]
We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes.
Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed.
arXiv Detail & Related papers (2024-10-10T05:08:14Z) - Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications [19.714951004096996]
We analyze black-box deflation methods as a framework for designing $k$-PCA algorithms.
Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for $k$-PCA.
We apply our framework to obtain state-of-the-art $k$-PCA algorithms robust to dataset contamination.
arXiv Detail & Related papers (2024-03-06T18:07:20Z) - Sparse Gaussian Graphical Models with Discrete Optimization:
Computational and Statistical Perspectives [8.403841349300103]
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model.
We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function.
Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$.
arXiv Detail & Related papers (2023-07-18T15:49:02Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Scalable Differentially Private Clustering via Hierarchically Separated
Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state of the art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Sharper Rates and Flexible Framework for Nonconvex SGD with Client and
Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the method so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of optimal bounds in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax
Optimization [104.96004056928474]
We propose a class of faster adaptive gradient descent ascent methods for nonconvex-strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with the mini-batch size $O(\kappa)$.
arXiv Detail & Related papers (2021-06-30T14:47:09Z) - Sparse online variational Bayesian regression [0.0]
We consider variational Bayesian inference as an inexpensive and scalable alternative to a fully Bayesian approach.
For linear models the method requires only the iterative solution of deterministic least squares problems.
For large $p$, an approximation achieves promising results at a cost of $O(p)$ in both computation and memory.
arXiv Detail & Related papers (2021-02-24T12:49:42Z) - A New Basis for Sparse Principal Component Analysis [5.258928416164104]
Previous versions of sparse principal component analysis presumed that the eigen-basis (a $p \times k$ matrix) is approximately sparse.
We propose a method that presumes the $p \times k$ matrix becomes approximately sparse after a $k \times k$ rotation.
We show that for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance compared to alternative methods. (A minimal rotation-based sketch appears after this list.)
arXiv Detail & Related papers (2020-07-01T16:32:22Z) - FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems.
We test our methods on problems with $p$ as large as $500,000$.
arXiv Detail & Related papers (2020-06-15T21:55:34Z)
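The rotation-based idea in "A New Basis for Sparse Principal Component Analysis" above can be illustrated, very roughly, by rotating the top-$k$ PCA loadings toward sparsity. The sketch below uses a standard varimax rotation followed by hard thresholding; varimax and the 0.1 threshold are illustrative choices here, not necessarily the construction proposed in that paper.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a p x k loading matrix (standard SVD-based algorithm)."""
    p, k = Phi.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        L = Phi @ R
        # Update the rotation via the SVD of the varimax criterion gradient.
        U, s, Vt = np.linalg.svd(
            Phi.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        )
        R = U @ Vt
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
        d_old = d
    return Phi @ R

# Rotate the top-k PCA loadings so they become (approximately) sparse,
# then hard-threshold small entries to obtain sparse components.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 30))
X -= X.mean(axis=0)
k = 3
_, _, Vt = np.linalg.svd(X, full_matrices=False)
loadings = Vt[:k].T                       # p x k eigen-basis
rotated = varimax(loadings)               # same column space, sparser entries
sparse_loadings = np.where(np.abs(rotated) > 0.1, rotated, 0.0)
print(np.count_nonzero(sparse_loadings, axis=0))
```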
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.