Hierarchical Clustering via Local Search
- URL: http://arxiv.org/abs/2405.15983v1
- Date: Fri, 24 May 2024 23:46:24 GMT
- Title: Hierarchical Clustering via Local Search
- Authors: Hossein Jowhari
- Abstract summary: We introduce a local search algorithm for hierarchical clustering.
We show that any locally optimal tree guarantees a revenue of at least $\frac{n-2}{3}\sum_{i < j}w(i,j)$, where $n$ is the number of objects and $w: [n] \times [n] \rightarrow \mathbb{R}^+$ is the associated similarity function.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a local search algorithm for hierarchical clustering. For the local step, we consider a tree re-arrangement operation, known as the {\em interchange}, which involves swapping two closely positioned sub-trees within a tree hierarchy. The interchange operation has been previously used in the context of phylogenetic trees. As the objective function for evaluating the resulting hierarchies, we utilize the revenue function proposed by Moseley and Wang (NIPS 2017). In our main result, we show that any locally optimal tree guarantees a revenue of at least $\frac{n-2}{3}\sum_{i < j}w(i,j)$, where $n$ is the number of objects and $w: [n] \times [n] \rightarrow \mathbb{R}^+$ is the associated similarity function. This finding echoes the previously established bound for the average link algorithm as analyzed by Moseley and Wang. We demonstrate that this alignment is not coincidental, as the average link trees enjoy the property of being locally optimal with respect to the interchange operation. Consequently, our study provides an alternative insight into the average link algorithm and reveals the existence of a broader range of hierarchies with relatively high revenue achievable through a straightforward local search algorithm. Furthermore, we present an implementation of the local search framework, where each local step requires $O(n)$ computation time. Our empirical results indicate that the proposed method, used as a post-processing step, can effectively generate a hierarchical clustering with substantial revenue.
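The Moseley-Wang revenue objective referenced in the abstract can be computed directly from a hierarchy. Below is a minimal sketch (not the authors' implementation): a hierarchy is represented as a nested tuple of leaf indices, and each pair $(i,j)$ contributes its similarity weighted by the number of leaves outside the subtree rooted at the pair's lowest common ancestor.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of leaf indices under a (sub)tree."""
    if isinstance(tree, int):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)

def revenue(tree, w, n):
    """Moseley-Wang revenue of a rooted binary hierarchy.

    Each pair (i, j) whose lowest common ancestor is an internal node v
    contributes w(i, j) * (n - |leaves under v|), i.e. the similarity
    weighted by the number of leaves *outside* v's subtree.
    w maps frozenset({i, j}) to a nonnegative similarity.
    """
    if isinstance(tree, int):
        return 0.0
    left, right = tree
    total = revenue(left, w, n) + revenue(right, w, n)
    subtree_size = len(leaves(tree))
    # Pairs split between the two children meet (have their LCA) here.
    for i in leaves(left):
        for j in leaves(right):
            total += w.get(frozenset((i, j)), 0.0) * (n - subtree_size)
    return total
```

For example, with $n = 4$ objects and unit similarity on every pair, the tree `((0, 1), (2, 3))` earns revenue $4$: the two bottom pairs each contribute $1 \cdot (4-2) = 2$ and the four pairs meeting at the root contribute $1 \cdot (4-4) = 0$. This matches the paper's lower bound $\frac{n-2}{3}\sum_{i<j} w(i,j) = \frac{2}{3} \cdot 6 = 4$ exactly.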
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - When Does Bottom-up Beat Top-down in Hierarchical Community Detection? [8.097200145973389]
Hierarchical clustering of networks consists in finding a tree of communities, such that lower levels of the hierarchy reveal finer-grained community structures.
Agglomerative (bottom-up) algorithms first identify the smallest community structure and then repeatedly merge the communities using a linkage method.
We establish theoretical guarantees for the recovery of the hierarchical tree and community structure of a Hierarchical Block Model by a bottom-up algorithm.
arXiv Detail & Related papers (2023-06-01T15:55:46Z) - Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z) - A Metaheuristic Algorithm for Large Maximum Weight Independent Set Problems [58.348679046591265]
Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum.
Some of the graphs arising in this application are large, having hundreds of thousands of nodes and hundreds of millions of edges.
We develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search framework.
arXiv Detail & Related papers (2022-03-28T21:34:16Z) - An Information-theoretic Perspective of Hierarchical Clustering [30.896561720088954]
A cost function for hierarchical clustering was introduced by Dasgupta (2016).
In this paper, we investigate hierarchical clustering from the information-theoretic perspective and formulate a new objective function.
arXiv Detail & Related papers (2021-08-13T03:03:56Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - Combinatorial and computational investigations of Neighbor-Joining bias [0.0]
The Neighbor-Joining algorithm computes a tree metric from a dissimilarity map arising from biological data.
A full description of these regions has not been found yet.
Different sequences of Neighbor-Joining agglomeration events can produce the same tree, therefore associating multiple geometric regions to the same output.
arXiv Detail & Related papers (2020-07-18T06:48:45Z) - DC-NAS: Divide-and-Conquer Neural Architecture Search [108.57785531758076]
We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures.
We achieve a 75.1% top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
arXiv Detail & Related papers (2020-05-29T09:02:16Z) - Data Structures & Algorithms for Exact Inference in Hierarchical Clustering [41.24805506595378]
We present novel dynamic-programming algorithms for exact inference in hierarchical clustering, based on a trellis data structure.
Our algorithms scale in time and space with the size of the powerset of the $N$ elements, which is super-exponentially more efficient than explicitly considering each of the $(2N-3)!!$ possible hierarchies.
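The $(2N-3)!!$ figure above is the standard count of rooted binary trees over $N$ labeled leaves. A short double-factorial computation (a sketch, not code from any of the listed papers) makes the super-exponential growth concrete:

```python
def num_hierarchies(n):
    """Number of distinct rooted binary hierarchies over n labeled
    leaves: the double factorial (2n-3)!! = 1 * 3 * 5 * ... * (2n-3)."""
    count = 1
    for k in range(1, 2 * n - 2, 2):  # odd factors up to 2n-3
        count *= k
    return count
```

For instance, 3 leaves admit 3 hierarchies and 4 leaves admit 15, while by 20 leaves the count already exceeds $10^{21}$, which is why exhaustive enumeration is infeasible.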
arXiv Detail & Related papers (2020-02-26T17:43:53Z) - Scalable Hierarchical Clustering with Tree Grafting [66.68869706310208]
Grinch is a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions.
Grinch is motivated by a new notion of separability for clustering with linkage functions.
arXiv Detail & Related papers (2019-12-31T20:56:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.