Scalable Community Detection via Parallel Correlation Clustering
- URL: http://arxiv.org/abs/2108.01731v1
- Date: Tue, 27 Jul 2021 04:33:37 GMT
- Title: Scalable Community Detection via Parallel Correlation Clustering
- Authors: Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Łącki,
Vahab Mirrokni
- Abstract summary: Graph clustering and community detection are central problems in modern data mining.
In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth.
- Score: 1.5644420658691407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph clustering and community detection are central problems in modern data
mining. The increasing need for analyzing billion-scale data calls for faster
and more scalable algorithms for these problems. There are certain trade-offs
between the quality and speed of such clustering algorithms. In this paper, we
design scalable algorithms that achieve high quality when evaluated based on
ground truth. We develop a generalized sequential and shared-memory parallel
framework based on the LambdaCC objective (introduced by Veldt et al.), which
encompasses modularity and correlation clustering. Our framework consists of
highly-optimized implementations that scale to large data sets of billions of
edges and that obtain high-quality clusters compared to ground-truth data, on
both unweighted and weighted graphs. Our empirical evaluation shows that this
framework improves the state-of-the-art trade-offs between speed and quality of
scalable community detection. For example, on a 30-core machine with two-way
hyper-threading, our implementations achieve orders of magnitude speedups over
other correlation clustering baselines, and up to 28.44x speedups over our own
sequential baselines while maintaining or improving quality.
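The LambdaCC objective (Veldt et al.) that the framework optimizes can be illustrated with a small evaluator. The sketch below is a hypothetical, simplified rendering of the unweighted case, assuming the standard formulation in which a clustering pays (1 - λ) for every edge cut between clusters and λ for every non-adjacent pair placed in the same cluster; the function and variable names are illustrative, not from the paper's implementation.

```python
from itertools import combinations

def lambdacc_cost(num_nodes, edges, assignment, lam=0.5):
    """Evaluate the (unweighted) LambdaCC objective for a clustering.

    Pays (1 - lam) for each edge cut across clusters and lam for each
    non-adjacent pair placed in the same cluster. `assignment` maps
    node id -> cluster id. Illustrative sketch, not the paper's code.
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(range(num_nodes), 2):
        same_cluster = assignment[u] == assignment[v]
        if frozenset((u, v)) in edge_set:
            if not same_cluster:
                cost += 1.0 - lam  # edge cut between clusters
        elif same_cluster:
            cost += lam            # missing edge inside a cluster
    return cost

# Toy example: a triangle {0, 1, 2} attached to node 3 by one edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
grouped = {0: 0, 1: 0, 2: 0, 3: 1}      # triangle together, 3 alone
singletons = {0: 0, 1: 1, 2: 2, 3: 3}   # every node its own cluster
```

With λ = 0.5 the grouped clustering cuts only the (2, 3) edge (cost 0.5), while the all-singleton clustering cuts all four edges (cost 2.0), so the objective correctly prefers keeping the dense triangle together. Varying λ interpolates between coarser and finer clusterings, which is how the objective generalizes modularity and correlation clustering.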
Related papers
- Evolvable Agents, a Fine Grained Approach for Distributed Evolutionary
Computing: Walking towards the Peer-to-Peer Computing Frontiers [0.0]
We propose a fine-grained approach with a self-adaptive migration rate for distributed evolutionary computation.
We analyse the viability of the approach by comparing how solution quality and algorithm speed change as the number of processors increases.
With this experimental setup, our approach shows better scalability than the Island model and equivalent robustness on average across the three test functions under study.
arXiv Detail & Related papers (2024-01-30T18:11:31Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within clusters and separation between clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size without violating neighborhood relationships.
The second strategy leverages a novel re-ranking technique, which has a lower time complexity upper bound and reduces the memory complexity from O(n^2) to O(kn) with k << n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Efficient Forecasting of Large Scale Hierarchical Time Series via
Multilevel Clustering [26.236569277576425]
We propose a novel approach to the problem of clustering hierarchically aggregated time-series data.
We first group time series at each aggregated level, while simultaneously leveraging local and global information.
arXiv Detail & Related papers (2022-05-27T17:13:05Z) - Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z) - ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering
using Nearest-Neighbor Chain [6.824747267214373]
We propose the ParChain framework for designing parallel hierarchical agglomerative clustering (HAC) algorithms.
Compared to most previous parallel HAC algorithms, our new algorithms require only linear memory, and are scalable to large data sets.
Our algorithms are able to scale to data set sizes with tens of millions of points, which existing algorithms are not able to handle.
arXiv Detail & Related papers (2021-06-08T23:13:27Z) - Interpretable Clustering on Dynamic Graphs with Recurrent Graph Neural
Networks [24.017988997693262]
We study the problem of clustering nodes in a dynamic graph, where the connections between nodes and nodes' cluster memberships may change over time.
We first propose a simple decay-based clustering algorithm that clusters nodes based on weighted connections between them, where the weight decreases at a fixed rate over time.
We characterize the optimal decay rate for each cluster and propose a clustering method that achieves almost exact recovery of the true clusters.
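The decay-based weighting described above can be sketched in a few lines. This is a hypothetical illustration, assuming exponential decay at a fixed rate: each observed interaction (u, v, t) contributes decay^(now - t) to the edge weight, so stale connections fade while recent ones dominate. The function name and parameters are illustrative, not taken from the paper.

```python
from collections import defaultdict

def decayed_edge_weights(interactions, now, decay=0.5):
    """Aggregate time-stamped interactions into decayed edge weights.

    Each interaction (u, v, t) contributes decay**(now - t), so an
    observation from k steps ago is worth decay**k of a current one.
    Illustrative sketch of decay-based weighting, not the paper's code.
    """
    weights = defaultdict(float)
    for u, v, t in interactions:
        weights[frozenset((u, v))] += decay ** (now - t)
    return dict(weights)

# Node pair (0, 1) interacted at times 0 and 2; pair (1, 2) only at time 2.
interactions = [(0, 1, 0), (0, 1, 2), (1, 2, 2)]
weights = decayed_edge_weights(interactions, now=2, decay=0.5)
```

Here the (0, 1) edge gets 0.5**2 + 0.5**0 = 1.25 while (1, 2) gets 1.0; clustering on these weights favors pairs with recent, repeated contact, which is the behavior the decay-based algorithm exploits.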
arXiv Detail & Related papers (2020-12-16T04:31:19Z) - (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time
Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the use of a continuous version of DTW as the distance measure, which we call continuous dynamic time warping (CDTW).
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z) - Clustering with Fast, Automated and Reproducible assessment applied to
longitudinal neural tracking [3.817161834189992]
C-FAR is a novel method for Fast, Automated and Reproducible assessment of multiple hierarchical clustering algorithms simultaneously.
Our algorithm takes any number of hierarchical clustering trees as input, then strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees.
Our flagship application is the cluster aggregation step in spike-sorting, the task of assigning waveforms (spikes) in recordings to neurons.
arXiv Detail & Related papers (2020-03-19T01:33:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.