Exact Recovery of Mangled Clusters with Same-Cluster Queries
- URL: http://arxiv.org/abs/2006.04675v3
- Date: Fri, 30 Oct 2020 13:07:19 GMT
- Title: Exact Recovery of Mangled Clusters with Same-Cluster Queries
- Authors: Marco Bressan, Nicol\`o Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice
- Abstract summary: We study the cluster recovery problem in the semi-supervised active clustering framework.
We design an algorithm that, given $n$ points to be partitioned into $k$ clusters, uses $O(k3 ln k ln n)$ oracle queries and $tildeO(kn + k3)$ time to recover the clustering with zero misclassification error.
- Score: 20.03712152278538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the cluster recovery problem in the semi-supervised active
clustering framework. Given a finite set of input points, and an oracle
revealing whether any two points lie in the same cluster, our goal is to
recover all clusters exactly using as few queries as possible. To this end, we
relax the spherical $k$-means cluster assumption of Ashtiani et al.\ to allow
for arbitrary ellipsoidal clusters with margin. This removes the assumption
that the clustering is center-based (i.e., defined through an optimization
problem), and includes all those cases where spherical clusters are
individually transformed by any combination of rotations, axis scalings, and
point deletions. We show that, even in this much more general setting, it is
still possible to recover the latent clustering exactly using a number of
queries that scales only logarithmically with the number of input points. More
precisely, we design an algorithm that, given $n$ points to be partitioned into
$k$ clusters, uses $O(k^3 \ln k \ln n)$ oracle queries and $\tilde{O}(kn +
k^3)$ time to recover the clustering with zero misclassification error. The
$O(\cdot)$ notation hides an exponential dependence on the dimensionality of
the clusters, which we show to be necessary thus characterizing the query
complexity of the problem. Our algorithm is simple, easy to implement, and can
also learn the clusters using low-stretch separators, a class of ellipsoids
with additional theoretical guarantees. Experiments on large synthetic datasets
confirm that we can reconstruct clusterings exactly and efficiently.
Related papers
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy"
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - How to Find a Good Explanation for Clustering? [7.951746797489421]
Moshkovitz, Dasgupta, Rashtchian, and Frost [ICML 2020] proposed an elegant model of explainable $k$-means and $k$-median clustering.
We study two natural algorithmic questions about explainable clustering.
Our rigorous algorithmic analysis sheds some light on the influence of parameters like the input size, dimension of the data, the number of outliers, the number of clusters, and the approximation ratio, on the computational complexity of explainable clustering.
arXiv Detail & Related papers (2021-12-13T11:48:38Z) - Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC)
In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, are designed for node representation learning.
For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z) - On Margin-Based Cluster Recovery with Oracle Queries [22.672233769934845]
We study an active cluster recovery problem where, given a set of $n$ points oracle and an answering queries like "are these two points in the same cluster?"
We give an algorithm that recovers arbitrary convex clusters in exactly time, and with a number of queries that is lower than the best existing algorithm by $Theta(mm)$ factors.
For general pseudometric spaces, where clusters might not be convex or might not have any notion of shape, we give an algorithm that achieves the $O(log n)$ query bound, and is provably near optimal.
arXiv Detail & Related papers (2021-06-09T08:48:23Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - K-expectiles clustering [0.0]
We propose a novel partitioning clustering algorithm based on expectiles.
We suggest two schemes: fixed $tau$ clustering, and adaptive $tau$ clustering.
arXiv Detail & Related papers (2021-03-16T21:14:56Z) - Exact Recovery of Clusters in Finite Metric Spaces Using Oracle Queries [22.672233769934845]
We investigate the problem of exact cluster dependence using oracle queries.
We show that if we are allowed to make $(k2 + k )$, an additional queries can recover different different and arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary arbitrary
arXiv Detail & Related papers (2021-01-31T18:00:29Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.