On Subspace Approximation and Subset Selection in Fewer Passes by MCMC Sampling
- URL: http://arxiv.org/abs/2103.11107v1
- Date: Sat, 20 Mar 2021 06:07:30 GMT
- Title: On Subspace Approximation and Subset Selection in Fewer Passes by MCMC Sampling
- Authors: Amit Deshpande and Rameshwar Pratap
- Abstract summary: Sampling-based subset selection techniques require adaptive sampling iterations with multiple passes over the data.
We propose an MCMC algorithm to reduce the number of passes required by previous subset selection algorithms.
Our algorithm picks a subset of size $\mathrm{poly}(k/\epsilon)$ that gives a $(1+\epsilon)$ approximation to the optimal subspace.
- Score: 3.1822873305165618
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of subset selection for $\ell_{p}$ subspace
approximation, i.e., given $n$ points in $d$ dimensions, we need to pick a
small, representative subset of the given points such that its span gives
$(1+\epsilon)$ approximation to the best $k$-dimensional subspace that
minimizes the sum of $p$-th powers of distances of all the points to this
subspace. Sampling-based subset selection techniques require adaptive sampling
iterations with multiple passes over the data. Matrix sketching techniques give
a single-pass $(1+\epsilon)$ approximation for $\ell_{p}$ subspace
approximation but require additional passes for subset selection.
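Stated explicitly, subset selection here asks for a small set $S$ of the input points $a_1, \ldots, a_n \in \mathbb{R}^d$ such that
$$\min_{V \subseteq \mathrm{span}(S),\ \dim(V)=k} \ \sum_{i=1}^{n} \mathrm{dist}(a_i, V)^p \ \leq\ (1+\epsilon) \cdot \min_{V \subseteq \mathbb{R}^d,\ \dim(V)=k} \ \sum_{i=1}^{n} \mathrm{dist}(a_i, V)^p,$$
where $\mathrm{dist}(a_i, V)$ denotes the Euclidean distance of $a_i$ to the subspace $V$.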
In this work, we propose an MCMC algorithm to reduce the number of passes
required by previous subset selection algorithms based on adaptive sampling.
For $p=2$, our algorithm gives subset selection of nearly optimal size in only
$2$ passes, whereas the number of passes required in previous work depends on
$k$. Our algorithm picks a subset of size $\mathrm{poly}(k/\epsilon)$ that
gives $(1+\epsilon)$ approximation to the optimal subspace. The running time of
the algorithm is $nd + d~\mathrm{poly}(k/\epsilon)$. We extend our results to
the case when outliers are present in the dataset, and suggest a two-pass
algorithm for the same. Our ideas also extend to give a reduction in the number
of passes required by adaptive sampling algorithms for $\ell_{p}$ subspace
approximation and subset selection, for $p \geq 2$.
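As a concrete illustration of the two-pass, $p=2$ regime described above, here is a minimal NumPy sketch of the residual-sampling template that adaptive-sampling subset selection builds on: pass 1 samples rows by squared norm, pass 2 samples by squared residual distance to the span of the first sample. This is a hypothetical sketch of the sampling flavor the paper accelerates, not the authors' MCMC procedure; the function names and the SVD-based quality check are illustrative.

```python
import numpy as np

def two_pass_subset(A, m, seed=None):
    """Illustrative two-pass subset selection for l2 subspace approximation.

    Pass 1: sample m rows with probability proportional to squared norms.
    Pass 2: sample m rows with probability proportional to squared residual
    distance to the span of the pass-1 sample (one adaptive round).
    This mimics the adaptive-sampling template the paper compresses via
    MCMC; it is a sketch, not the authors' algorithm.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]

    # Pass 1: length-squared (squared-norm) sampling.
    sq = np.einsum('ij,ij->i', A, A)
    seed_idx = rng.choice(n, size=m, replace=True, p=sq / sq.sum())

    # Orthonormal basis for the span of the seed rows.
    Q, _ = np.linalg.qr(A[seed_idx].T)

    # Pass 2: sample proportional to squared residuals ||a_i - QQ^T a_i||^2.
    R = A - (A @ Q) @ Q.T
    res = np.einsum('ij,ij->i', R, R)
    if res.sum() == 0.0:                  # seed span already covers all rows
        return np.unique(seed_idx)
    extra_idx = rng.choice(n, size=m, replace=True, p=res / res.sum())
    return np.unique(np.concatenate([seed_idx, extra_idx]))

def best_k_in_span(A, idx, k):
    """Squared Frobenius error of the best k-dim subspace inside span(A[idx])."""
    Q, _ = np.linalg.qr(A[idx].T)         # basis of the subset's span
    _, _, Vt = np.linalg.svd(A @ Q, full_matrices=False)
    W = Q @ Vt[:k].T                      # best k directions within the span
    return np.linalg.norm(A - (A @ W) @ W.T, 'fro') ** 2

# Usage: noisy low-rank data; compare against the optimal rank-k error.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 10)) @ rng.standard_normal((10, 100))
A += 0.05 * rng.standard_normal(A.shape)
k = 10
idx = two_pass_subset(A, m=40, seed=1)
opt = (np.linalg.svd(A, compute_uv=False)[k:] ** 2).sum()
print(f"|S| = {idx.size}, error ratio = {best_k_in_span(A, idx, k) / opt:.3f}")
```

The final check reflects the form of the guarantee: the span of the selected rows should contain a $k$-dimensional subspace whose cost is within a $(1+\epsilon)$ factor of the optimum, which `best_k_in_span` evaluates directly via an SVD restricted to that span.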
Related papers
- Guessing Efficiently for Constrained Subspace Approximation [49.83981776254246]
We introduce a general framework for constrained subspace approximation.
We show it provides new algorithms for partition-constrained subspace approximation with applications to $k$-means clustering, and projected non-negative matrix factorization.
arXiv Detail & Related papers (2025-04-29T15:56:48Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Linear Query Approximation Algorithms for Non-monotone Submodular Maximization under Knapsack Constraint [16.02833173359407]
This work introduces two constant-factor approximation algorithms with linear query complexity for non-monotone submodular maximization over a ground set of size $n$ subject to a knapsack constraint.
$\mathsf{DLA}$ is a deterministic algorithm that provides an approximation factor of $6+\epsilon$, while $\mathsf{RLA}$ is a randomized algorithm with an approximation factor of $4+\epsilon$.
arXiv Detail & Related papers (2023-05-17T15:27:33Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - One-pass additive-error subset selection for $\ell_{p}$ subspace approximation [6.186553186139257]
We consider the problem of subset selection for $\ell_p$ subspace approximation.
We give a one-pass subset selection algorithm with an additive approximation guarantee for $\ell_p$ subspace approximation.
arXiv Detail & Related papers (2022-04-26T04:51:36Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression [72.84177488527398]
We consider sparse regression from the view of correlation, and propose the formula of conditional uncorrelation.
By the proposed method, the computational complexity is reduced from $O(\frac{1}{6}k^3+mk^2+mkd)$ to $O(\frac{1}{6}k^3+\frac{1}{2}mk^2)$ for each candidate subset in sparse regression.
arXiv Detail & Related papers (2020-09-08T20:32:26Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Subspace approximation with outliers [6.186553186139257]
We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers.
Our results hold even when the fraction of outliers $\alpha$ is large, as long as the obvious condition $0 < \delta \leq 1 - \alpha$ is satisfied.
arXiv Detail & Related papers (2020-06-30T07:22:33Z)