Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds
- URL: http://arxiv.org/abs/2307.06723v1
- Date: Thu, 13 Jul 2023 12:32:49 GMT
- Title: Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds
- Authors: Nairen Cao, Shang-En Huang, Hsin-Hao Su
- Abstract summary: We study parallel algorithms for the correlation clustering problem.
The goal is to partition the entities into clusters to minimize the number of disagreements with the labels.
Currently, all efficient parallel algorithms have an approximation ratio of at least 3.
We propose the first poly-logarithmic-depth parallel algorithm that achieves an approximation ratio better than 3.
- Score: 0.23633885460047763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study parallel algorithms for the correlation clustering
problem, where every pair of distinct entities is labeled as similar or
dissimilar. The goal is to partition the entities into clusters to minimize the
number of disagreements with the labels. Currently, all efficient parallel
algorithms have an approximation ratio of at least 3. Compared with the
$1.994+\epsilon$ ratio achieved by polynomial-time sequential algorithms
[CLN22], this leaves a significant gap.
We propose the first poly-logarithmic-depth parallel algorithm that achieves
an approximation ratio better than 3. Specifically, our algorithm computes a
$(2.4+\epsilon)$-approximate solution and uses $\tilde{O}(m^{1.5})$ work.
Additionally, it can be translated into an $\tilde{O}(m^{1.5})$-time sequential
algorithm and a poly-logarithmic-round, sublinear-memory MPC algorithm with
$\tilde{O}(m^{1.5})$ total memory.
Our approach is inspired by Awerbuch, Khandekar, and Rao's [AKR12]
length-constrained multi-commodity flow algorithm, where we develop an
efficient parallel algorithm to solve a truncated correlation clustering linear
program of Charikar, Guruswami, and Wirth [CGW05]. We then show that the solution
of the truncated linear program can be rounded while losing a factor of at most
2.4 by using the framework of [CMSY15]. This rounding framework can then be
implemented using parallel pivot-based approaches.
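To make the objective and the pivot idea concrete, here is a minimal Python sketch. It assumes entities are numbered 0..n-1 and that the "similar" pairs are given as a set of frozensets, with every other pair treated as "dissimilar"; the function names and this input representation are illustrative assumptions. `disagreements` counts the pairs a clustering gets wrong, and `pivot_cluster` is the classic sequential pivot algorithm of Ailon, Charikar, and Newman (KwikCluster), an expected 3-approximation: it repeatedly picks a uniformly random unclustered entity as pivot and groups it with all of its unclustered "similar" neighbors. It is the kind of ratio-3 baseline the abstract refers to, not the paper's $(2.4+\epsilon)$ algorithm.

```python
import random
from itertools import combinations


def disagreements(n, similar, labels):
    """Number of pairs whose label disagrees with the clustering.

    n       -- entities are 0..n-1 (assumed representation)
    similar -- set of frozenset({u, v}) labeled "similar"; all other pairs are "dissimilar"
    labels  -- labels[v] is the cluster id of entity v
    """
    cost = 0
    for u, v in combinations(range(n), 2):
        is_similar = frozenset((u, v)) in similar
        same_cluster = labels[u] == labels[v]
        # A similar pair split apart, or a dissimilar pair kept together, is a disagreement.
        if is_similar != same_cluster:
            cost += 1
    return cost


def pivot_cluster(n, similar, seed=None):
    """Sequential pivot algorithm (KwikCluster): expected 3-approximation."""
    rng = random.Random(seed)
    order = list(range(n))
    # Scanning a random permutation and skipping clustered entities is equivalent
    # to picking a uniformly random unclustered pivot in each round.
    rng.shuffle(order)
    labels = [None] * n
    next_id = 0
    for pivot in order:
        if labels[pivot] is not None:
            continue  # already absorbed by an earlier pivot
        labels[pivot] = next_id
        # The pivot takes every still-unclustered entity it is labeled similar to.
        for v in range(n):
            if labels[v] is None and frozenset((pivot, v)) in similar:
                labels[v] = next_id
        next_id += 1
    return labels


if __name__ == "__main__":
    # Toy instance: {0, 1, 2} and {3, 4} are internally similar; all other pairs dissimilar.
    similar = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (3, 4)]}
    labels = pivot_cluster(5, similar, seed=1)
    # On this instance every pivot order recovers both groups, so the cost printed is 0.
    print(labels, disagreements(5, similar, labels))
```

Roughly speaking, LP-based pivot rounding in the style of [CMSY15] keeps the same loop but lets each unclustered entity join the pivot's cluster with a probability that depends on its LP distance to the pivot, instead of deciding deterministically from the label; the rounding functions behind the 2.4 bound are given in the paper and are not reproduced here.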
Related papers
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
Date: 2024-02-09T19:01:48Z
- Approximation Algorithms for Preference Aggregation Using CP-Nets [3.337244446073836]
This paper studies the design and analysis of approximation algorithms for aggregating preferences over Conditional Preference Networks (CP-nets).
Its focus is on aggregating preferences over so-called swaps, for which optimal solutions in general are already known to be of exponential size.
Date: 2023-12-14T17:31:38Z
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
Date: 2023-01-31T19:14:30Z
- Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
Date: 2022-06-17T09:24:41Z
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
Date: 2021-06-16T03:34:14Z
- Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time [1.5644420658691407]
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs.
We define an algorithmic framework for hierarchical agglomerative graph clustering.
We show that our approach can speed up clustering of point datasets by 20.7x to 76.5x.
Date: 2021-06-10T09:29:05Z
- ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain [6.824747267214373]
We propose the ParChain framework for designing parallel hierarchical agglomerative clustering (HAC) algorithms.
Compared to most previous parallel HAC algorithms, our new algorithms require only linear memory and scale to large data sets.
Our algorithms are able to scale to data set sizes with tens of millions of points, which existing algorithms are not able to handle.
Date: 2021-06-08T23:13:27Z
- Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering [6.4805900740861]
We introduce a new notion of well-separation to reduce the work and space of our algorithm for HDBSCAN$^*$.
We show that our algorithms are theoretically efficient: they have work (number of operations) matching their sequential counterparts, and polylogarithmic depth (parallel time).
Our experiments on large real-world and synthetic data sets using a 48-core machine show that our fastest algorithms outperform the best serial algorithms for the problems by 11.13x to 55.89x, and existing parallel algorithms by at least an order of magnitude.
Date: 2021-04-02T16:05:00Z
- Practical and Parallelizable Algorithms for Non-Monotone Submodular Maximization with Size Constraint [20.104148319012854]
We present practical and parallelizable algorithms for maximizing a submodular function, not necessarily monotone, with respect to a size constraint.
We improve the best approximation factor achieved by an algorithm that has optimal adaptivity and nearly optimal query complexity to $0.193 - \varepsilon$.
Date: 2020-09-03T22:43:55Z
- Revisiting Modified Greedy Algorithm for Monotone Submodular Maximization with a Knapsack Constraint [75.85952446237599]
We show that a modified greedy algorithm can achieve an approximation factor of $0.305$.
We derive a data-dependent upper bound on the optimum.
It can also be used to significantly improve the efficiency of such algorithms as branch and bound.
Date: 2020-08-12T15:40:21Z
- Second-order Conditional Gradient Sliding [79.66739383117232]
We present the Second-Order Conditional Gradient Sliding (SOCGS) algorithm.
The SOCGS algorithm converges quadratically in the primal gap after a finite number of linearly convergent iterations.
It is useful when the feasible region can only be accessed efficiently through a linear optimization oracle.
Date: 2020-02-20T17:52:18Z