Related papers: Guessing Efficiently for Constrained Subspace Approximation

URL: http://arxiv.org/abs/2504.20883v1
Date: Tue, 29 Apr 2025 15:56:48 GMT
Title: Guessing Efficiently for Constrained Subspace Approximation
Authors: Aditya Bhaskara, Sepideh Mahabadi, Madhusudhan Reddy Pittu, Ali Vakilian, David P. Woodruff
Abstract summary: We introduce a general framework for constrained subspace approximation. We show it provides new algorithms for partition-constrained subspace approximation with applications to $k$-means clustering and projected non-negative matrix factorization.
Score: 49.83981776254246
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper we study the constrained subspace approximation problem. Given a set of $n$ points $\{a_1,\ldots,a_n\}$ in $\mathbb{R}^d$, the goal of the subspace approximation problem is to find a $k$-dimensional subspace that best approximates the input points. More precisely, for a given $p\geq 1$, we aim to minimize the $p$th power of the $\ell_p$ norm of the error vector $(\|a_1-\bm{P}a_1\|,\ldots,\|a_n-\bm{P}a_n\|)$, where $\bm{P}$ denotes the projection matrix onto the subspace and the norms are Euclidean. In constrained subspace approximation (CSA), we additionally have constraints on the projection matrix $\bm{P}$. In its most general form, we require $\bm{P}$ to belong to a given subset $\mathcal{S}$ that is described explicitly or implicitly. We introduce a general framework for constrained subspace approximation. Our approach, which we term coreset-guess-solve, yields either $(1+\varepsilon)$-multiplicative or $\varepsilon$-additive approximations for a variety of constraints. We show that it provides new algorithms for partition-constrained subspace approximation with applications to fair subspace approximation, $k$-means clustering, and projected non-negative matrix factorization, among others. Specifically, while we reconstruct the best known bounds for $k$-means clustering in Euclidean spaces, we improve the known results for the remainder of the problems.
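
As a rough illustration of the objective defined in the abstract, the sketch below evaluates $\sum_i \|a_i-\bm{P}a_i\|^p$ for a subspace given by an orthonormal basis and, for the unconstrained $p=2$ case only, finds the optimal subspace with a truncated SVD. This is not the paper's coreset-guess-solve framework and handles no constraint set $\mathcal{S}$; the function names `subspace_cost` and `best_subspace_p2` and the random test data are assumptions made for the example.

```python
import numpy as np

def subspace_cost(A, V, p=2.0):
    """Return sum_i ||a_i - P a_i||_2^p with P = V V^T.

    A: (n, d) array whose rows are the points a_i.
    V: (d, k) array with orthonormal columns spanning the subspace.
    """
    residual = A - (A @ V) @ V.T            # a_i - P a_i for every row
    dists = np.linalg.norm(residual, axis=1)
    return float(np.sum(dists ** p))

def best_subspace_p2(A, k):
    """Unconstrained p = 2 case: span of the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T

# Hypothetical data, used only to exercise the functions.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
V_opt = best_subspace_p2(A, k=2)
# A random 2-dimensional subspace for comparison (orthonormalized via QR).
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))

cost_opt = subspace_cost(A, V_opt, p=2)
cost_rand = subspace_cost(A, Q, p=2)
```

For $p=2$ the SVD subspace is never worse than any other $k$-dimensional subspace, so `cost_opt <= cost_rand` here; for $p\neq 2$ or with constraints on $\bm{P}$ no such closed form applies, which is the setting the paper addresses.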

Related papers

$k$-PCA for (non-squared) Euclidean Distances: Polynomial Time Approximation [16.942733472657622]
Given an integer $k\geq 1$ and a set $P$ of $n$ points in $\mathbb{R}^d$, the classic approximation $k$-PCA approximates affinemph$fty distances. Open code and experimental results on real-world datasets are also provided.
arXiv Detail & Related papers (2025-07-19T14:00:50Z)
Optimal Oblivious Subspace Embeddings with Near-optimal Sparsity [3.9657575162895196]
An oblivious subspace embedding is a random $m\times n$ matrix $\Pi$ that preserves the norms of all vectors in that subspace within a $1\pm\epsilon$ factor. We give an oblivious subspace embedding with the optimal dimension $m=\Theta(d/\epsilon^2)$ that has a near-optimal sparsity of $\tilde{O}(1/\epsilon)$ non-zero entries per column of $\Pi$.
arXiv Detail & Related papers (2024-11-13T16:58:51Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$. As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. We also show an $\Omega(k^{2/p}n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
Ridge Leverage Score Sampling for $\ell_p$ Subspace Approximation [47.790126028106734]
A popular approach to cope with the NP-hardness is to compute a strong coreset. We obtain an algorithm for constructing a strong coreset for $\ell_p$ subspace approximation of size $\tilde{O}(k\epsilon^{-4/p})$ for $p<2$ and $\tilde{O}(k^{p/2}\epsilon^{-p})$ for $p>2$.
arXiv Detail & Related papers (2024-07-03T16:49:28Z)
Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces [2.687607197645453]
We show that even the special case of $k$-Center in dimension $\Theta(\log n)$ is $(\sqrt{3/2}-o(1))$-hard to approximate for FPT algorithms.
arXiv Detail & Related papers (2023-05-12T08:43:28Z)
One-pass additive-error subset selection for $\ell_{p}$ subspace approximation [6.186553186139257]
We consider the problem of subset selection for $\ell_p$ subspace approximation. We give a one-pass subset selection with an additive approximation guarantee for $\ell_p$ subspace approximation.
arXiv Detail & Related papers (2022-04-26T04:51:36Z)
Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z)
Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z)
Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals. Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z)
Approximate Maximum Halfspace Discrepancy [6.35821487778241]
We consider the range space $(X, \mathcal{H}_d)$ where $X \subset \mathbb{R}^d$ and $\mathcal{H}_d$ is the set of ranges defined by $d$ halfspaces. For each halfspace $h \in \mathcal{H}_d$ define a function $\Phi(h)$ that measures the "difference" between the fraction of red and fraction of blue points which fall in the range $h$.
arXiv Detail & Related papers (2021-06-25T19:14:45Z)
Subspace approximation with outliers [6.186553186139257]
We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers. Our results hold even when the fraction of outliers $\alpha$ is large, as long as the obvious condition $0 < \delta \leq 1 - \alpha$ is satisfied.
arXiv Detail & Related papers (2020-06-30T07:22:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.