On Subspace Approximation and Subset Selection in Fewer Passes by MCMC Sampling
- URL: http://arxiv.org/abs/2103.11107v1
- Date: Sat, 20 Mar 2021 06:07:30 GMT
- Title: On Subspace Approximation and Subset Selection in Fewer Passes by MCMC Sampling
- Authors: Amit Deshpande and Rameshwar Pratap
- Abstract summary: Sampling-based subset selection techniques require adaptive sampling iterations with multiple passes over the data.
We propose an MCMC algorithm to reduce the number of passes required by previous subset selection algorithms.
Our algorithm picks a subset of size $\mathrm{poly}(k/\epsilon)$ that gives a $(1+\epsilon)$ approximation to the optimal subspace.
- Score: 3.1822873305165618
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of subset selection for $\ell_{p}$ subspace
approximation, i.e., given $n$ points in $d$ dimensions, we need to pick a
small, representative subset of the given points such that its span gives
$(1+\epsilon)$ approximation to the best $k$-dimensional subspace that
minimizes the sum of $p$-th powers of distances of all the points to this
subspace. Sampling-based subset selection techniques require adaptive sampling
iterations with multiple passes over the data. Matrix sketching techniques give
a single-pass $(1+\epsilon)$ approximation for $\ell_{p}$ subspace
approximation but require additional passes for subset selection.
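Stated explicitly, subset selection here asks for a small set $S$ of the input points $a_1, \ldots, a_n \in \mathbb{R}^d$ such that
$$\min_{V \subseteq \mathrm{span}(S),\ \dim(V)=k} \ \sum_{i=1}^{n} \mathrm{dist}(a_i, V)^p \ \leq\ (1+\epsilon) \cdot \min_{V \subseteq \mathbb{R}^d,\ \dim(V)=k} \ \sum_{i=1}^{n} \mathrm{dist}(a_i, V)^p,$$
where $\mathrm{dist}(a_i, V)$ denotes the Euclidean distance of $a_i$ to the subspace $V$.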
In this work, we propose an MCMC algorithm to reduce the number of passes
required by previous subset selection algorithms based on adaptive sampling.
For $p=2$, our algorithm gives subset selection of nearly optimal size in only
$2$ passes, whereas the number of passes required in previous work depends on
$k$. Our algorithm picks a subset of size $\mathrm{poly}(k/\epsilon)$ that
gives $(1+\epsilon)$ approximation to the optimal subspace. The running time of
the algorithm is $nd + d~\mathrm{poly}(k/\epsilon)$. We extend our results to
the case when outliers are present in the dataset, and suggest a two-pass
algorithm for the same. Our ideas also extend to give a reduction in the number
of passes required by adaptive sampling algorithms for $\ell_{p}$ subspace
approximation and subset selection, for $p \geq 2$.
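As a concrete illustration of the two-pass, $p=2$ regime described above, here is a minimal NumPy sketch of the residual-sampling template that adaptive-sampling subset selection builds on: pass 1 samples rows by squared norm, pass 2 samples by squared residual distance to the span of the first sample. This is a hypothetical sketch of the sampling flavor the paper accelerates, not the authors' MCMC procedure; the function names and the SVD-based quality check are illustrative.

```python
import numpy as np

def two_pass_subset(A, m, seed=None):
    """Illustrative two-pass subset selection for l2 subspace approximation.

    Pass 1: sample m rows with probability proportional to squared norms.
    Pass 2: sample m rows with probability proportional to squared residual
    distance to the span of the pass-1 sample (one adaptive round).
    This mimics the adaptive-sampling template the paper compresses via
    MCMC; it is a sketch, not the authors' algorithm.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]

    # Pass 1: length-squared (squared-norm) sampling.
    sq = np.einsum('ij,ij->i', A, A)
    seed_idx = rng.choice(n, size=m, replace=True, p=sq / sq.sum())

    # Orthonormal basis for the span of the seed rows.
    Q, _ = np.linalg.qr(A[seed_idx].T)

    # Pass 2: sample proportional to squared residuals ||a_i - QQ^T a_i||^2.
    R = A - (A @ Q) @ Q.T
    res = np.einsum('ij,ij->i', R, R)
    if res.sum() == 0.0:                  # seed span already covers all rows
        return np.unique(seed_idx)
    extra_idx = rng.choice(n, size=m, replace=True, p=res / res.sum())
    return np.unique(np.concatenate([seed_idx, extra_idx]))

def best_k_in_span(A, idx, k):
    """Squared Frobenius error of the best k-dim subspace inside span(A[idx])."""
    Q, _ = np.linalg.qr(A[idx].T)         # basis of the subset's span
    _, _, Vt = np.linalg.svd(A @ Q, full_matrices=False)
    W = Q @ Vt[:k].T                      # best k directions within the span
    return np.linalg.norm(A - (A @ W) @ W.T, 'fro') ** 2

# Usage: noisy low-rank data; compare against the optimal rank-k error.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 10)) @ rng.standard_normal((10, 100))
A += 0.05 * rng.standard_normal(A.shape)
k = 10
idx = two_pass_subset(A, m=40, seed=1)
opt = (np.linalg.svd(A, compute_uv=False)[k:] ** 2).sum()
print(f"|S| = {idx.size}, error ratio = {best_k_in_span(A, idx, k) / opt:.3f}")
```

The final check reflects the form of the guarantee: the span of the selected rows should contain a $k$-dimensional subspace whose cost is within a $(1+\epsilon)$ factor of the optimum, which `best_k_in_span` evaluates directly via an SVD restricted to that span.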
Related papers
- Guessing Efficiently for Constrained Subspace Approximation [49.83981776254246]
We introduce a general framework for constrained subspace approximation.
We show it provides new algorithms for partition-constrained subspace approximation with applications to $k$-means clustering, and projected non-negative matrix factorization.
arXiv Detail & Related papers (2025-04-29T15:56:48Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Linear Query Approximation Algorithms for Non-monotone Submodular Maximization under Knapsack Constraint [16.02833173359407]
This work introduces two constant-factor approximation algorithms with linear query complexity for non-monotone submodular maximization over a ground set of size $n$ subject to a knapsack constraint.
$\mathsf{DLA}$ is a deterministic algorithm that provides an approximation factor of $6+\epsilon$, while $\mathsf{RLA}$ is a randomized algorithm with an approximation factor of $4+\epsilon$.
arXiv Detail & Related papers (2023-05-17T15:27:33Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - One-pass additive-error subset selection for $\ell_{p}$ subspace approximation [6.186553186139257]
We consider the problem of subset selection for $\ell_p$ subspace approximation.
We give a one-pass subset selection algorithm with an additive approximation guarantee for $\ell_p$ subspace approximation.
arXiv Detail & Related papers (2022-04-26T04:51:36Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression [72.84177488527398]
We consider sparse regression from the view of correlation, and propose the formula of conditional uncorrelation.
By the proposed method, the computational complexity is reduced from $O(\frac{1}{6}k^3+mk^2+mkd)$ to $O(\frac{1}{6}k^3+\frac{1}{2}mk^2)$ for each candidate subset in sparse regression.
arXiv Detail & Related papers (2020-09-08T20:32:26Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Subspace approximation with outliers [6.186553186139257]
We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers.
Our results hold even when the fraction of outliers $\alpha$ is large, as long as the obvious condition $0 < \delta \leq 1 - \alpha$ is satisfied.
arXiv Detail & Related papers (2020-06-30T07:22:33Z)