Related papers: Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs

URL: http://arxiv.org/abs/2112.09055v1
Date: Thu, 16 Dec 2021 17:52:04 GMT
Title: Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs
Authors: Bogdan-Adrian Manghiuc and He Sun
Abstract summary: We present two polynomial-time approximation algorithms for hierarchical clustering. The significance of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets.
Score: 3.2901541059183432
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hierarchical clustering studies a recursive partition of a data set into clusters of successively smaller size, and is a fundamental problem in data analysis. In this work we study the cost function for hierarchical clustering introduced by Dasgupta, and present two polynomial-time approximation algorithms: Our first result is an $O(1)$-approximation algorithm for graphs of high conductance. Our simple construction bypasses complicated recursive routines of finding sparse cuts known in the literature. Our second and main result is an $O(1)$-approximation algorithm for a wide family of graphs that exhibit a well-defined structure of clusters. This result generalises the previous state-of-the-art, which holds only for graphs generated from stochastic models. The significance of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets, on which our presented algorithm outperforms the previously proposed algorithm for graphs with a well-defined cluster structure.
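The cost function by Dasgupta that the abstract refers to charges each edge $(u, v)$ its weight times the number of leaves under the lowest common ancestor of $u$ and $v$ in the hierarchy, so a good tree separates heavy edges as late as possible. The following is a minimal sketch of evaluating that cost, not the paper's algorithm. The nested-tuple tree encoding, the dictionary of edge weights, and the names `leaves` and `dasgupta_cost` are assumptions made for illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaves under a node; internal nodes are tuples (assumed encoding)."""
    if not isinstance(tree, tuple):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def dasgupta_cost(tree, weights):
    """Sum over edges of w(u, v) times the leaf count under the lowest common ancestor."""
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(s) for s in child_leaves)
    cost = 0.0
    # An edge whose endpoints fall in different children is split at this node,
    # so this node is its lowest common ancestor and it pays w * size.
    for a, b in combinations(child_leaves, 2):
        for u in a:
            for v in b:
                w = weights.get((u, v), weights.get((v, u), 0.0))
                cost += w * size
    return cost + sum(dasgupta_cost(child, weights) for child in tree)


# Hypothetical toy graph: two tight pairs {0, 1} and {2, 3} joined by a weak edge.
weights = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 0.5}
good_tree = ((0, 1), (2, 3))  # splits the weak edge first
bad_tree = ((0, 2), (1, 3))   # splits both heavy edges at the root
print(dasgupta_cost(good_tree, weights))  # 6.0
print(dasgupta_cost(bad_tree, weights))   # 10.0
```

On this toy input, the tree that respects the two clusters has the lower cost, which is the behaviour the cost function is meant to reward.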

Related papers

A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC. It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters. GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z)
PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms [14.601622103700516]
Clustering nodes of a graph is a cornerstone of graph analysis. Some popular methods are not suitable for very large graphs. This work introduces PASCO, an overlay that accelerates clustering algorithms.
arXiv Detail & Related papers (2024-12-18T08:15:55Z)
Dynamic Spectral Clustering with Provable Approximation Guarantee [7.6676757797831225]
The paper proves that, under some mild condition on the cluster-structure, the clusters of the final graph $G_T$ can be well approximated by a dynamic variant of the spectral clustering algorithm. Experimental studies on both synthetic and real-world datasets further confirm the practicality of our designed algorithm.
arXiv Detail & Related papers (2024-06-05T11:16:55Z)
A Differentially Private Clustering Algorithm for Well-Clustered Graphs [6.523602840064548]
We provide an efficient $(\epsilon, \delta)$-DP algorithm tailored specifically for such graphs. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters.
arXiv Detail & Related papers (2024-03-21T11:57:16Z)
Fast Approximation of Similarity Graphs with Kernel Density Estimation [12.321755440062732]
We present a new algorithm for constructing a similarity graph from a set $X$ of data points in $\mathbb{R}^d$. Our presented algorithm is based on the kernel density estimation problem, and is applicable for arbitrary kernel functions.
arXiv Detail & Related papers (2023-10-21T00:32:47Z)
Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification. By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z)
Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure. We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance. We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
One-step Bipartite Graph Cut: A Normalized Formulation and Its Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity. In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem. We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z)
Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering. We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset. We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
AnchorGAE: General Data Clustering via $O(n)$ Bipartite Graph Convolution [79.44066256794187]
We show how to convert a non-graph dataset into a graph by introducing the generative graph model, which is used to build graph convolution networks (GCNs). A bipartite graph constructed by anchors is updated dynamically to exploit the high-level information behind data. We theoretically prove that the simple update will lead to degeneration and a specific strategy is accordingly designed.
arXiv Detail & Related papers (2021-11-12T07:08:13Z)
Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering. Our method exploits the view similarity between graphs of different views by minimizing the tensor Schatten p-norm. Our proposed algorithm is time-economical, obtains stable results, and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z)
Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data. Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure. Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
Data Structures & Algorithms for Exact Inference in Hierarchical Clustering [41.24805506595378]
We present novel dynamic-programming algorithms for exact inference in hierarchical clustering based on a novel trellis data structure. Our algorithms scale in time and space proportional to the powerset of $N$ elements, which is super-exponentially more efficient than explicitly considering each of the $(2N-3)!!$ possible hierarchies.
arXiv Detail & Related papers (2020-02-26T17:43:53Z)
