New Coresets for Projective Clustering and Applications
- URL: http://arxiv.org/abs/2203.04370v1
- Date: Tue, 8 Mar 2022 19:50:27 GMT
- Title: New Coresets for Projective Clustering and Applications
- Authors: Murad Tukan and Xuan Wu and Samson Zhou and Vladimir Braverman and Dan Feldman
- Abstract summary: Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure.
We show that our construction provides efficient coreset constructions for Cauchy, Welsch, Huber, Geman-McClure, Tukey, $L_1-L_2$, and Fair regression.
- Score: 34.82221047030618
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: $(j,k)$-projective clustering is the natural generalization of the family of
$k$-clustering and $j$-subspace clustering problems. Given a set of points $P$
in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine
subspaces, that best fit $P$ under a given distance measure. In this paper, we
propose the first algorithm that returns an $L_\infty$ coreset of size
polynomial in $d$. Moreover, we give the first strong coreset construction for
general $M$-estimator regression. Specifically, we show that our construction
provides efficient coreset constructions for Cauchy, Welsch, Huber,
Geman-McClure, Tukey, $L_1-L_2$, and Fair regression, as well as general
concave and power-bounded loss functions. Finally, we provide experimental
results based on real-world datasets, showing the efficacy of our approach.
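The objective described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's coreset construction: it only evaluates the cost of a candidate set of $k$ flats of dimension $j$ on a point set $P$. The function names (`dist_to_flat`, `projective_cost`, `huber`), the representation of a flat as an (origin, orthonormal basis) pair, and the Huber threshold `delta` are assumptions made for this example. Passing `aggregate=np.max` gives the worst-case ($L_\infty$-style) fit, and passing a loss such as `huber` shows how a robust loss can be applied to the distances.

```python
import numpy as np

def dist_to_flat(P, origin, basis):
    """Euclidean distance from each row of P to the flat origin + span(basis).

    basis is a (j, d) array whose rows are assumed to be orthonormal.
    """
    X = P - origin
    projection = X @ basis.T @ basis  # component of X lying inside the flat
    return np.linalg.norm(X - projection, axis=1)

def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def projective_cost(P, flats, loss=lambda r: r, aggregate=np.sum):
    """Cost of fitting P with a list of flats.

    Each point is charged loss(distance to its nearest flat); the per-point
    charges are then combined with `aggregate` (np.sum or np.max).
    """
    dists = np.stack([dist_to_flat(P, o, B) for o, B in flats], axis=1)
    nearest = dists.min(axis=1)
    return aggregate(loss(nearest))

if __name__ == "__main__":
    # Two horizontal lines in the plane (k = 2 flats of dimension j = 1).
    direction = np.array([[1.0, 0.0]])
    flats = [(np.array([0.0, 0.0]), direction),
             (np.array([0.0, 3.0]), direction)]
    P = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0], [5.0, 1.0]])
    print("sum of distances:", projective_cost(P, flats))
    print("max distance:", projective_cost(P, flats, aggregate=np.max))
    print("Huber cost:", projective_cost(P, flats, loss=huber))
```

Under this representation, a basis of shape `(0, d)` reduces each flat to a single point, so the same function also covers $k$-clustering with point centers, and a single flat ($k = 1$) covers $j$-subspace fitting.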
Related papers
- Clustering to Minimize Cluster-Aware Norm Objectives [0.3481985817302898]
We seek to partition a given set $P$ of data points into $k$ clusters by finding a set $X$ of $k$ centers.
The cost of a cluster, represented by a center $x \in X$, is a monotone, symmetric norm $f$ (inner norm) of the vector of distances of points assigned to $x$.
The goal is to minimize a norm $g$ (outer norm) of the vector of cluster costs.
arXiv Detail & Related papers (2024-10-31T16:33:40Z) - Multilayer Correlation Clustering [12.492037397168579]
We establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering (Bansal et al., FOCS '02) to the multilayer setting.
In this paper, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$.
The goal is then to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p \geq 1$) of the disagreements vector.
arXiv Detail & Related papers (2024-04-25T15:25:30Z) - A Unified Framework for Gradient-based Clustering of Distributed Data [51.904327888475606]
We develop a family of distributed clustering algorithms that work over networks of users.
DGC-$\mathcal{F}_\rho$ is specialized to popular clustering losses like $K$-means and Huber loss.
We show that consensus fixed points of DGC-$\mathcal{F}_\rho$ are equivalent to fixed points of gradient clustering over the full data.
arXiv Detail & Related papers (2024-02-02T10:44:42Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^2}{\varepsilon^2}(\sqrt{k}d + \log(Nd))\big)$ time.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Universal Weak Coreset [3.1509756165776635]
A weak coreset is a pair $(J,S)$ of subsets of points, where $S$ acts as a summary of the point set and $J$ as a set of potential centers.
We develop this framework, which we call universal weak coresets, for constrained clustering settings.
arXiv Detail & Related papers (2023-05-26T12:51:16Z) - Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces [2.687607197645453]
We show that even the special case of $k$-Center in dimension $\Theta(\log n)$ is $(\sqrt{3/2} - o(1))$-hard to approximate for FPT algorithms.
arXiv Detail & Related papers (2023-05-12T08:43:28Z) - Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering [13.208510864854894]
Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems.
We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem.
arXiv Detail & Related papers (2021-07-05T05:55:26Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Deep Learning Meets Projective Clustering [66.726500395069]
A common approach for compressing NLP networks is to encode the embedding layer as a matrix $A \in \mathbb{R}^{n \times d}$.
Inspired by projective clustering from computational geometry, we suggest replacing this subspace by a set of $k$ subspaces.
arXiv Detail & Related papers (2020-10-08T22:47:48Z)