Scalable Community Detection via Parallel Correlation Clustering
- URL: http://arxiv.org/abs/2108.01731v1
- Date: Tue, 27 Jul 2021 04:33:37 GMT
- Title: Scalable Community Detection via Parallel Correlation Clustering
- Authors: Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Łącki,
Vahab Mirrokni
- Abstract summary: Graph clustering and community detection are central problems in modern data mining.
In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth.
- Score: 1.5644420658691407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph clustering and community detection are central problems in modern data
mining. The increasing need for analyzing billion-scale data calls for faster
and more scalable algorithms for these problems. There are certain trade-offs
between the quality and speed of such clustering algorithms. In this paper, we
design scalable algorithms that achieve high quality when evaluated based on
ground truth. We develop a generalized sequential and shared-memory parallel
framework based on the LambdaCC objective (introduced by Veldt et al.), which
encompasses modularity and correlation clustering. Our framework consists of
highly-optimized implementations that scale to large data sets of billions of
edges and that obtain high-quality clusters compared to ground-truth data, on
both unweighted and weighted graphs. Our empirical evaluation shows that this
framework improves the state-of-the-art trade-offs between speed and quality of
scalable community detection. For example, on a 30-core machine with two-way
hyper-threading, our implementations achieve orders of magnitude speedups over
other correlation clustering baselines, and up to 28.44x speedups over our own
sequential baselines while maintaining or improving quality.
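The LambdaCC objective (Veldt et al.) that the framework optimizes can be illustrated with a small evaluator. The sketch below is a hypothetical, simplified rendering of the unweighted case, assuming the standard formulation in which a clustering pays (1 - λ) for every edge cut between clusters and λ for every non-adjacent pair placed in the same cluster; the function and variable names are illustrative, not from the paper's implementation.

```python
from itertools import combinations

def lambdacc_cost(num_nodes, edges, assignment, lam=0.5):
    """Evaluate the (unweighted) LambdaCC objective for a clustering.

    Pays (1 - lam) for each edge cut across clusters and lam for each
    non-adjacent pair placed in the same cluster. `assignment` maps
    node id -> cluster id. Illustrative sketch, not the paper's code.
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(range(num_nodes), 2):
        same_cluster = assignment[u] == assignment[v]
        if frozenset((u, v)) in edge_set:
            if not same_cluster:
                cost += 1.0 - lam  # edge cut between clusters
        elif same_cluster:
            cost += lam            # missing edge inside a cluster
    return cost

# Toy example: a triangle {0, 1, 2} attached to node 3 by one edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
grouped = {0: 0, 1: 0, 2: 0, 3: 1}      # triangle together, 3 alone
singletons = {0: 0, 1: 1, 2: 2, 3: 3}   # every node its own cluster
```

With λ = 0.5 the grouped clustering cuts only the (2, 3) edge (cost 0.5), while the all-singleton clustering cuts all four edges (cost 2.0), so the objective correctly prefers keeping the dense triangle together. Varying λ interpolates between coarser and finer clusterings, which is how the objective generalizes modularity and correlation clustering.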
Related papers
- Evolvable Agents, a Fine Grained Approach for Distributed Evolutionary
Computing: Walking towards the Peer-to-Peer Computing Frontiers [0.0]
We propose a fine-grained approach with a self-adaptive migration rate for distributed evolutionary computation.
We analyse the viability of the approach by comparing how solution quality and algorithm speed change as the number of processors increases.
With this experimental setup, our approach shows better scalability than the Island model and equivalent robustness on average across the three test functions under study.
arXiv Detail & Related papers (2024-01-30T18:11:31Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within clusters and separation between clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size without violating neighborhood relationships.
The second strategy leverages a novel re-ranking technique, which has a lower time complexity upper bound and reduces the memory complexity from O(n^2) to O(kn) with k << n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Efficient Forecasting of Large Scale Hierarchical Time Series via
Multilevel Clustering [26.236569277576425]
We propose a novel approach to the problem of clustering hierarchically aggregated time-series data.
We first group time series at each aggregated level, while simultaneously leveraging local and global information.
arXiv Detail & Related papers (2022-05-27T17:13:05Z) - Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z) - ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering
using Nearest-Neighbor Chain [6.824747267214373]
We propose the ParChain framework for designing parallel hierarchical agglomerative clustering (HAC) algorithms.
Compared to most previous parallel HAC algorithms, our new algorithms require only linear memory, and are scalable to large data sets.
Our algorithms are able to scale to data set sizes with tens of millions of points, which existing algorithms are not able to handle.
arXiv Detail & Related papers (2021-06-08T23:13:27Z) - Interpretable Clustering on Dynamic Graphs with Recurrent Graph Neural
Networks [24.017988997693262]
We study the problem of clustering nodes in a dynamic graph, where the connections between nodes and nodes' cluster memberships may change over time.
We first propose a simple decay-based clustering algorithm that clusters nodes based on weighted connections between them, where the weight decreases at a fixed rate over time.
We characterize the optimal decay rate for each cluster and propose a clustering method that achieves almost exact recovery of the true clusters.
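The decay-based weighting described above can be sketched in a few lines. This is a hypothetical illustration, assuming exponential decay at a fixed rate: each observed interaction (u, v, t) contributes decay^(now - t) to the edge weight, so stale connections fade while recent ones dominate. The function name and parameters are illustrative, not taken from the paper.

```python
from collections import defaultdict

def decayed_edge_weights(interactions, now, decay=0.5):
    """Aggregate time-stamped interactions into decayed edge weights.

    Each interaction (u, v, t) contributes decay**(now - t), so an
    observation from k steps ago is worth decay**k of a current one.
    Illustrative sketch of decay-based weighting, not the paper's code.
    """
    weights = defaultdict(float)
    for u, v, t in interactions:
        weights[frozenset((u, v))] += decay ** (now - t)
    return dict(weights)

# Node pair (0, 1) interacted at times 0 and 2; pair (1, 2) only at time 2.
interactions = [(0, 1, 0), (0, 1, 2), (1, 2, 2)]
weights = decayed_edge_weights(interactions, now=2, decay=0.5)
```

Here the (0, 1) edge gets 0.5**2 + 0.5**0 = 1.25 while (1, 2) gets 1.0; clustering on these weights favors pairs with recent, repeated contact, which is the behavior the decay-based algorithm exploits.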
arXiv Detail & Related papers (2020-12-16T04:31:19Z) - (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time
Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the use of a continuous version of DTW as the distance measure, which we call continuous dynamic time warping (CDTW).
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z) - Clustering with Fast, Automated and Reproducible assessment applied to
longitudinal neural tracking [3.817161834189992]
C-FAR is a novel method for Fast, Automated and Reproducible assessment of multiple hierarchical clustering algorithms simultaneously.
Our algorithm takes any number of hierarchical clustering trees as input, then strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees.
Our flagship application is the cluster aggregation step in spike-sorting, the task of assigning waveforms (spikes) in recordings to neurons.
arXiv Detail & Related papers (2020-03-19T01:33:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.