Hierarchical clustering of bipartite data sets based on the statistical
significance of coincidences
- URL: http://arxiv.org/abs/2004.14764v2
- Date: Fri, 24 Jul 2020 09:04:05 GMT
- Title: Hierarchical clustering of bipartite data sets based on the statistical
significance of coincidences
- Authors: Ignacio Tamarit, María Pereda, and José A. Cuesta
- Abstract summary: We provide a hierarchical clustering algorithm based on a dissimilarity between entities that quantifies the probability that the features shared by two entities are due to mere chance.
The algorithm's performance is $O(n^2)$ when applied to a set of $n$ entities, and its outcome is a dendrogram exhibiting the connections of those entities.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When some 'entities' are related by the 'features' they share they are
amenable to a bipartite network representation. Plant-pollinator ecological
communities, co-authorship of scientific papers, customers and purchases, or
answers in a poll, are but a few examples. Analyzing clustering of such
entities in the network is a useful tool with applications in many fields, like
internet technology, recommender systems, or detection of diseases. The
algorithms most widely applied to find clusters in bipartite networks are
variants of modularity optimization. Here we provide a hierarchical clustering
algorithm based on a dissimilarity between entities that quantifies the
probability that the features shared by two entities are due to mere chance. The
algorithm's performance is $O(n^2)$ when applied to a set of $n$ entities, and its
outcome is a dendrogram exhibiting the connections of those entities. Through
the introduction of a 'susceptibility' measure we can provide an 'optimal'
choice for the clustering as well as quantify its quality. The dendrogram
also reveals further useful structural information, such as the existence of
sub-clusters within clusters or of nodes that do not fit in any cluster. We
illustrate the algorithm by applying it first to a set of synthetic networks,
and then to a selection of examples. We also illustrate how to transform our
algorithm into a valid alternative for one-mode networks as well, and show that
it performs at least as well as the standard, modularity-based algorithms --
with a higher numerical performance. We provide an implementation of the
algorithm in Python freely accessible from GitHub.
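The core idea above can be sketched in a few lines of Python. In this minimal, self-contained sketch, the dissimilarity between two entities is the probability of sharing at least the observed number of features by chance, modeled here with a hypergeometric null. Note that this null model, the toy data, and the naive single-linkage agglomeration are illustrative assumptions, not the paper's exact statistic or implementation (the authors' code is on GitHub):

```python
from itertools import combinations
from math import comb

def chance_dissimilarity(a, b, n_features):
    """P-value that entities with feature sets a and b share at least
    |a & b| features by chance, under a hypergeometric null model
    (an illustrative choice; the paper's exact statistic may differ)."""
    k = len(a & b)
    total = comb(n_features, len(b))
    # P(X >= k) if b's features were drawn uniformly at random
    p = sum(comb(len(a), i) * comb(n_features - len(a), len(b) - i)
            for i in range(k, min(len(a), len(b)) + 1))
    return p / total

# Toy bipartite data set: entities mapped to sets of feature indices
entities = {
    "e1": {0, 1, 2, 3},
    "e2": {0, 1, 2, 4},
    "e3": {5, 6, 7},
    "e4": {5, 6, 8, 9},
}
n_features = 10

# O(n^2) pairwise dissimilarities, matching the paper's complexity claim
dis = {
    (u, v): chance_dissimilarity(entities[u], entities[v], n_features)
    for u, v in combinations(entities, 2)
}

# Naive single-linkage agglomeration: repeatedly merge the two closest
# clusters, recording merge heights to mimic a dendrogram
clusters = [{name} for name in entities]
merges = []
while len(clusters) > 1:
    (i, j), h = min(
        (((i, j), min(dis[tuple(sorted((u, v)))]
                      for u in clusters[i] for v in clusters[j]))
         for i, j in combinations(range(len(clusters)), 2)),
        key=lambda t: t[1])
    merges.append((clusters[i] | clusters[j], h))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
    clusters.append(merges[-1][0])
print(merges)
```

On this toy data the entities pairing e1 with e2 and e3 with e4 merge at small heights (their overlaps are unlikely by chance), while the two groups only join at height 1, so cutting the dendrogram at any intermediate level recovers the two planted clusters.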
Related papers
- FLASC: A Flare-Sensitive Clustering Algorithm [0.0]
We present FLASC, an algorithm that detects branches within clusters to identify subpopulations.
Two variants of the algorithm are presented, which trade computational cost for noise robustness.
We show that both variants scale similarly to HDBSCAN* in terms of computational cost and provide stable outputs.
arXiv Detail & Related papers (2023-11-27T14:55:16Z) - Exact Recovery and Bregman Hard Clustering of Node-Attributed Stochastic
Block Model [0.16385815610837165]
This paper presents an information-theoretic criterion for the exact recovery of community labels.
It shows how network and attribute information can be exchanged in order to have exact recovery.
It also presents an iterative clustering algorithm that maximizes the joint likelihood.
arXiv Detail & Related papers (2023-10-30T16:46:05Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.
By discriminating nodes, whether being corrupted by augmentations, representations are learned in a self-supervised manner.
The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss.
Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z) - Community detection in complex networks via node similarity, graph
representation learning, and hierarchical clustering [4.264842058017711]
Community detection is a critical challenge in analysing real graphs.
This article proposes three new, general, hierarchical frameworks to deal with this task.
We compare over a hundred module combinations on the Block Model graphs and real-life datasets.
arXiv Detail & Related papers (2023-03-21T22:12:53Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - KCoreMotif: An Efficient Graph Clustering Algorithm for Large Networks
by Exploiting k-core Decomposition and Motifs [6.1734015475373765]
Spectral clustering is one of the most commonly used algorithms for graph-structured data (networks)
We propose an efficient graph clustering algorithm, KCoreMotif, specifically for large networks by exploiting k-core decomposition and motifs.
Comparative results demonstrate that the proposed graph clustering algorithm is accurate yet efficient for large networks.
arXiv Detail & Related papers (2020-08-21T12:21:05Z) - Towards Deeper Graph Neural Networks with Differentiable Group
Normalization [61.20639338417576]
Graph neural networks (GNNs) learn the representation of a node by aggregating its neighbors.
Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases.
We introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN)
arXiv Detail & Related papers (2020-06-12T07:18:02Z) - Graph Neural Networks with Composite Kernels [60.81504431653264]
We re-interpret node aggregation from the perspective of kernel weighting.
We present a framework to consider feature similarity in an aggregation scheme.
We propose feature aggregation as the composition of the original neighbor-based kernel and a learnable kernel to encode feature similarities in a feature space.
arXiv Detail & Related papers (2020-05-16T04:44:29Z) - Data Structures & Algorithms for Exact Inference in Hierarchical
Clustering [41.24805506595378]
We present novel dynamic-programming algorithms for exact inference in hierarchical clustering based on a novel trellis data structure.
Our algorithms scale in time and space proportional to the powerset of $N$ elements, which is super-exponentially more efficient than explicitly considering each of the $(2N-3)!!$ possible hierarchies.
arXiv Detail & Related papers (2020-02-26T17:43:53Z)
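The super-exponential gap claimed in the summary above is easy to check numerically: the number of binary hierarchies over $N$ elements grows as $(2N-3)!!$, while the powerset the trellis-based algorithms scale with has only $2^N$ elements. A small stdlib-only illustration:

```python
def double_factorial(n):
    """Compute n!! = n * (n-2) * (n-4) * ... down to 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

# Powerset size 2^N versus number of possible hierarchies (2N-3)!!
for N in (4, 8, 16):
    print(N, 2 ** N, double_factorial(2 * N - 3))
```

Already at N = 8 the powerset has 256 subsets while there are 135,135 hierarchies, and the ratio keeps exploding as N grows.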
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.