Related papers: Cluster-and-Conquer: When Randomness Meets Graph Locality

URL: http://arxiv.org/abs/2010.11497v1
Date: Thu, 22 Oct 2020 07:31:12 GMT
Title: Cluster-and-Conquer: When Randomness Meets Graph Locality
Authors: George Giakkoupis (WIDE), Anne-Marie Kermarrec (EPFL), Olivier Ruas (SPIRALS), François Taïani (WIDE, IRISA)
Abstract summary: Some of the most efficient KNN graph algorithms are incremental and local. Cluster-and-Conquer boosts the starting configuration of greedy algorithms.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: K-Nearest-Neighbors (KNN) graphs are central to many emblematic data mining and machine-learning applications. Some of the most efficient KNN graph algorithms are incremental and local: they start from a random graph, which they incrementally improve by traversing neighbors-of-neighbors links. Paradoxically, this random start is also one of the key weaknesses of these algorithms: nodes are initially connected to dissimilar neighbors that lie far away according to the similarity metric. As a result, incremental algorithms must first laboriously explore spurious potential neighbors before they can identify similar nodes and start converging. In this paper, we remove this drawback with Cluster-and-Conquer (C² for short). Cluster-and-Conquer boosts the starting configuration of greedy algorithms thanks to a novel lightweight clustering mechanism, dubbed FastRandomHash. FastRandomHash leverages randomness and recursion to pre-cluster similar nodes at a very low cost. Our extensive evaluation on real datasets shows that Cluster-and-Conquer significantly outperforms existing approaches, including LSH, yielding speed-ups of up to ×4.42 while incurring only a negligible loss in terms of KNN quality.
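Illustration: the abstract describes incremental, local KNN graph algorithms that start from a random graph and improve it by following neighbors-of-neighbors links. The sketch below shows that generic greedy baseline only. It is not the authors' code and does not implement Cluster-and-Conquer or FastRandomHash. All function names, the similarity metric (negative squared Euclidean distance), and the parameters are assumptions chosen for illustration.

```python
import random


def similarity(a, b):
    """Assumed metric: negative squared Euclidean distance (higher = more similar)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def random_knn_graph(points, k, rng):
    """Random start: each node gets k distinct random neighbors (never itself)."""
    n = len(points)
    return {u: set(rng.sample([v for v in range(n) if v != u], k)) for u in range(n)}


def refine(points, graph, k, max_rounds=20):
    """Greedy local refinement: each node keeps the k most similar nodes among
    its current neighbors and its neighbors' neighbors. Updates graph in place."""
    for _ in range(max_rounds):
        changed = False
        for u in range(len(points)):
            candidates = set(graph[u])
            for v in graph[u]:
                candidates |= graph[v]  # neighbors-of-neighbors links
            candidates.discard(u)
            ranked = sorted(candidates, key=lambda v: (-similarity(points[u], points[v]), v))
            best = set(ranked[:k])
            if best != graph[u]:
                graph[u] = best
                changed = True
        if not changed:  # converged: no neighborhood changed in this round
            break
    return graph


def brute_force_knn(points, k):
    """Exact KNN graph by exhaustive comparison, used only to measure quality."""
    n = len(points)
    exact = {}
    for u in range(n):
        others = [v for v in range(n) if v != u]
        others.sort(key=lambda v: (-similarity(points[u], points[v]), v))
        exact[u] = set(others[:k])
    return exact


def recall(graph, exact):
    """Fraction of exact KNN edges that the approximate graph recovered."""
    hits = sum(len(graph[u] & exact[u]) for u in graph)
    total = sum(len(exact[u]) for u in exact)
    return hits / total


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    k = 5
    g = random_knn_graph(pts, k, rng)
    exact = brute_force_knn(pts, k)
    print("recall of random start:", recall(g, exact))
    refine(pts, g, k)
    print("recall after refinement:", recall(g, exact))
```

In this sketch the random start typically has very low recall, so the early rounds are spent comparing mostly dissimilar candidates. This is the weakness the abstract attributes to the random start, which Cluster-and-Conquer aims to remove by pre-clustering similar nodes.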

Related papers

A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC. It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters. GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z)
Accelerating k-Means Clustering with Cover Trees [0.30693357740321775]
We propose a new k-means algorithm based on the cover tree index, that has relatively low overhead and performs well. We obtain a hybrid algorithm that combines the benefits of tree aggregation and bounds-based filtering.
arXiv Detail & Related papers (2024-10-19T14:02:42Z)
DenMune: Density peak based clustering using mutual nearest neighbors [0.0]
Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other. A novel clustering algorithm, DenMune is presented to meet this challenge. It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user.
arXiv Detail & Related papers (2023-09-23T16:18:00Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink. By discriminating nodes, whether being corrupted by augmentations, representations are learned in a self-supervised manner. The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss. Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z)
Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels. This paper proposes a novel local sample-weighted multiple kernel clustering model. Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
A density peaks clustering algorithm with sparse search and K-d tree [16.141611031128427]
A density peaks clustering algorithm with sparse search and K-d tree is developed to solve this problem. Experiments are carried out on datasets with different distribution characteristics, comparing it with five other typical clustering algorithms.
arXiv Detail & Related papers (2022-03-02T09:29:40Z)
Learning Hierarchical Graph Neural Networks for Image Clustering [81.5841862489509]
We propose a hierarchical graph neural network (GNN) model that learns how to cluster a set of images into an unknown number of identities. Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
arXiv Detail & Related papers (2021-07-03T01:28:42Z)
Towards Efficient Graph Convolutional Networks for Point Cloud Handling [181.59146413326056]
We aim at improving the computational efficiency of graph convolutional networks (GCNs) for learning on point clouds. A series of experiments show that optimized networks have reduced computational complexity, decreased memory consumption, and accelerated inference speed.
arXiv Detail & Related papers (2021-04-12T17:59:16Z)
Interpretable Clustering on Dynamic Graphs with Recurrent Graph Neural Networks [24.017988997693262]
We study the problem of clustering nodes in a dynamic graph, where the connections between nodes and nodes' cluster memberships may change over time. We first propose a simple decay-based clustering algorithm that clusters nodes based on weighted connections between them, where the weight decreases at a fixed rate over time. We characterize the optimal decay rate for each cluster and propose a clustering method that achieves almost exact recovery of the true clusters.
arXiv Detail & Related papers (2020-12-16T04:31:19Z)
Clustering of Big Data with Mixed Features [3.3504365823045044]
We develop a new clustering algorithm for large data of mixed type. The algorithm is capable of detecting outliers and clusters of relatively lower density values. We present experimental results to verify that our algorithm works well in practice.
arXiv Detail & Related papers (2020-11-11T19:54:38Z)
Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems. Our results imply an improved algorithm for the Sample and Aggregate privacy framework. One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.