Correlation Clustering in Constant Many Parallel Rounds
- URL: http://arxiv.org/abs/2106.08448v1
- Date: Tue, 15 Jun 2021 21:45:45 GMT
- Title: Correlation Clustering in Constant Many Parallel Rounds
- Authors: Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski
- Abstract summary: Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining.
In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work.
Our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds.
- Score: 42.10280805559555
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Correlation clustering is a central topic in unsupervised learning, with many
applications in ML and data mining. In correlation clustering, one receives as
input a signed graph and the goal is to partition it to minimize the number of
disagreements. In this work we propose a massively parallel computation (MPC)
algorithm for this problem that is considerably faster than prior work. In
particular, our algorithm uses machines with memory sublinear in the number of
nodes in the graph and returns a constant approximation while running only for
a constant number of rounds. To the best of our knowledge, our algorithm is the
first that can provably approximate a clustering problem on graphs using only a
constant number of MPC rounds in the sublinear memory regime. We complement our
analysis with an experimental analysis of our techniques.
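To make the objective concrete, the following minimal Python sketch counts the disagreements of a given partition on a signed graph: a "+" edge between different clusters or a "-" edge inside one cluster each costs one disagreement. The representation (a set of positive edges, with all other pairs treated as negative) and the function name are illustrative assumptions, not the paper's algorithm.

```python
from itertools import combinations

def disagreements(nodes, positive_edges, clustering):
    """Count disagreements of a clustering on a complete signed graph.

    A '+' edge whose endpoints land in different clusters, or a '-' edge
    whose endpoints land in the same cluster, each counts as one
    disagreement. Pairs not listed in positive_edges are negative.
    """
    pos = {frozenset(e) for e in positive_edges}
    count = 0
    for u, v in combinations(nodes, 2):
        same_cluster = clustering[u] == clustering[v]
        is_positive = frozenset((u, v)) in pos
        if is_positive != same_cluster:
            count += 1
    return count

# Toy example: a '+' triangle {a, b, c} plus an outlier d with only '-' edges.
nodes = ["a", "b", "c", "d"]
positive = [("a", "b"), ("b", "c"), ("a", "c")]
clustering = {"a": 0, "b": 0, "c": 0, "d": 1}
print(disagreements(nodes, positive, clustering))  # this partition has 0 disagreements
```

Placing d in the same cluster as the triangle would instead incur three disagreements (its three negative edges), which is what the objective penalizes.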
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
- Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better [18.121514220195607]
Cluster deletion is an NP-hard graph clustering objective with applications in computational and social network analysis.
We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3.
We show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a maximum degree in an auxiliary graph and forming a cluster around it.
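The greedy step described in that summary (repeatedly taking a maximum-degree node and forming a cluster around it) can be sketched generically as follows. The auxiliary-graph construction is paper-specific, so this Python sketch only shows the generic pattern; the function name and graph representation are illustrative assumptions, not the paper's exact procedure.

```python
def greedy_max_degree_clusters(adj):
    """Repeatedly pick a node of maximum remaining degree and form a
    cluster from it and its still-unclustered neighbors.

    adj: dict mapping each node to the set of its neighbors.
    Returns a list of clusters (sets of nodes).
    """
    remaining = set(adj)
    clusters = []
    while remaining:
        # Node with the most neighbors that are still unclustered.
        pivot = max(remaining, key=lambda v: len(adj[v] & remaining))
        cluster = {pivot} | (adj[pivot] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters

# Toy graph: a triangle {a, b, c} with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(greedy_max_degree_clusters(adj))  # one cluster containing all four nodes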
arXiv Detail & Related papers (2024-04-24T18:39:18Z)
- A Differentially Private Clustering Algorithm for Well-Clustered Graphs [6.523602840064548]
We provide an efficient $(\epsilon, \delta)$-DP algorithm tailored specifically for such graphs.
Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters.
arXiv Detail & Related papers (2024-03-21T11:57:16Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Massively-Parallel Heat Map Sorting and Applications To Explainable Clustering [1.544681800932596]
We introduce the heat map sorting problem as reordering and merging the points and dimensions while preserving the clusters (labels).
A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged.
We give an approximation algorithm for an NP-hard special case of the problem.
arXiv Detail & Related papers (2023-09-14T07:53:52Z)
- Efficient distributed representations with linear-time attention scores normalization [3.8673630752805437]
We propose a linear-time approximation of the attention score normalization constants for embedding vectors with bounded norms.
The accuracy of our estimation formula surpasses that of competing kernel methods, in some cases by orders of magnitude.
The proposed algorithm is highly interpretable and easily adapted to an arbitrary embedding problem.
arXiv Detail & Related papers (2023-03-30T15:48:26Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs [5.301300942803395]
Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections.
In flow-based clusterings, the edges between clusters tend to be oriented in one direction; such patterns arise in migration data, food webs, and trade data.
arXiv Detail & Related papers (2022-03-02T20:07:04Z)
- Breaking the Linear Iteration Cost Barrier for Some Well-known Conditional Gradient Methods Using MaxIP Data-structures [49.73889315176884]
Conditional gradient methods (CGM) are widely used in modern machine learning.
Most efforts focus on reducing the number of iterations as a means to reduce the overall running time.
We give the first algorithms whose cost per iteration is sublinear in the number of parameters, for several fundamental optimization methods.
arXiv Detail & Related papers (2021-11-30T05:40:14Z)
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.