Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs
- URL: http://arxiv.org/abs/2112.09055v1
- Date: Thu, 16 Dec 2021 17:52:04 GMT
- Title: Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs
- Authors: Bogdan-Adrian Manghiuc and He Sun
- Abstract summary: We present two-time approximation algorithms for hierarchical clustering.
The significance of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets.
- Score: 3.2901541059183432
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical clustering studies a recursive partition of a data set into
clusters of successively smaller size, and is a fundamental problem in data
analysis. In this work we study the cost function for hierarchical clustering
introduced by Dasgupta, and present two polynomial-time approximation
algorithms: Our first result is an $O(1)$-approximation algorithm for graphs of
high conductance. Our simple construction bypasses complicated recursive
routines of finding sparse cuts known in the literature. Our second and main
result is an $O(1)$-approximation algorithm for a wide family of graphs that
exhibit a well-defined structure of clusters. This result generalises the
previous state-of-the-art, which holds only for graphs generated from
stochastic models. The significance of our work is demonstrated by the
empirical analysis on both synthetic and real-world data sets, on which our
presented algorithm outperforms the previously proposed algorithm for graphs
with a well-defined cluster structure.
Related papers
- Dynamic Spectral Clustering with Provable Approximation Guarantee [7.6676757797831225]
The paper proves that, under some mild condition on the cluster-structure, the clusters of the final graph $G_T$ can be well approximated by a dynamic variant of the spectral clustering algorithm.
Experimental studies on both synthetic and real-world datasets further confirm the practicality of our designed algorithm.
arXiv Detail & Related papers (2024-06-05T11:16:55Z) - A Differentially Private Clustering Algorithm for Well-Clustered Graphs [6.523602840064548]
We provide an efficient ($epsilon,$delta$)-DP algorithm tailored specifically for such graphs.
Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters.
arXiv Detail & Related papers (2024-03-21T11:57:16Z) - Fast Approximation of Similarity Graphs with Kernel Density Estimation [12.321755440062732]
We present a new algorithm for constructing a similarity graph from a set $X$ of data points in $mathbbRd$.
Our presented algorithm is based on the kernel density estimation problem, and is applicable for arbitrary kernel functions.
arXiv Detail & Related papers (2023-10-21T00:32:47Z) - Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification.
By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to a tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z) - Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - Differentially-Private Hierarchical Clustering with Provable
Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: that any $epsilon$-DP algorithm must exhibit $O(|V|2/ epsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - AnchorGAE: General Data Clustering via $O(n)$ Bipartite Graph
Convolution [79.44066256794187]
We show how to convert a non-graph dataset into a graph by introducing the generative graph model, which is used to build graph convolution networks (GCNs)
A bipartite graph constructed by anchors is updated dynamically to exploit the high-level information behind data.
We theoretically prove that the simple update will lead to degeneration and a specific strategy is accordingly designed.
arXiv Detail & Related papers (2021-11-12T07:08:13Z) - Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the view-similar between graphs of different views by the minimization of tensor Schatten p-norm.
Our proposed algorithm is time-economical and obtains the stable results and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z) - Structured Graph Learning for Clustering and Semi-supervised
Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z) - Data Structures & Algorithms for Exact Inference in Hierarchical
Clustering [41.24805506595378]
We present novel dynamic-programming algorithms for emphexact inference in hierarchical clustering based on a novel trellis data structure.
Our algorithms scale in time and space proportional to the powerset of $N$ elements which is super-exponentially more efficient than explicitly considering each of the (2N-3)!! possible hierarchies.
arXiv Detail & Related papers (2020-02-26T17:43:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.