Sublinear Time and Space Algorithms for Correlation Clustering via
Sparse-Dense Decompositions
- URL: http://arxiv.org/abs/2109.14528v1
- Date: Wed, 29 Sep 2021 16:25:02 GMT
- Title: Sublinear Time and Space Algorithms for Correlation Clustering via
Sparse-Dense Decompositions
- Authors: Sepehr Assadi, Chen Wang
- Abstract summary: We present a new approach for solving (minimum disagreement) correlation clustering.
We obtain sublinear algorithms with highly efficient time and space complexity.
The main ingredient of our approach is a novel connection to sparse-dense graph decompositions.
- Score: 9.29659220237395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new approach for solving (minimum disagreement) correlation
clustering that results in sublinear algorithms with highly efficient time and
space complexity for this problem. In particular, we obtain the following
algorithms for $n$-vertex $(+/-)$-labeled graphs $G$:
-- A sublinear-time algorithm that with high probability returns a constant
approximation clustering of $G$ in $O(n\log^2{n})$ time assuming access to the
adjacency list of the $(+)$-labeled edges of $G$ (this is almost quadratically
faster than even reading the input once). Previously, no sublinear-time
algorithm was known for this problem with any multiplicative approximation
guarantee.
-- A semi-streaming algorithm that with high probability returns a constant
approximation clustering of $G$ in $O(n\log{n})$ space and a single pass over
the edges of the graph $G$ (this memory is almost quadratically smaller than
input size). Previously, no single-pass algorithm with $o(n^2)$ space was known
for this problem with any approximation guarantee.
The main ingredient of our approach is a novel connection to sparse-dense
graph decompositions that are used extensively in the graph coloring
literature. To our knowledge, this connection is the first application of these
decompositions beyond graph coloring, and in particular for the correlation
clustering problem, and can be of independent interest.
Related papers
- Almost-linear Time Approximation Algorithm to Euclidean $k$-median and $k$-means [4.271492285528115]
We focus on the Euclidean $k$-median and $k$-means problems, two of the standard ways to model the task of clustering.
In this paper, we almost answer this question by presenting an almost linear-time algorithm to compute a constant-factor approximation.
arXiv Detail & Related papers (2024-07-15T20:04:06Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $varepsilon which runs in $Obig(frack2varepsilon2(sqrtkd + log(Nd))big.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z) - Near-Optimal Quantum Coreset Construction Algorithms for Clustering [15.513270929560088]
We give quantum algorithms that find coresets for $k$-clustering in $mathbbRd$ with $tildeO(sqrtnkd3/2)$ query complexity.
Our coreset reduces the input size from $n$ to $mathrmpoly(kepsilon-1d)$, so that existing $alpha$-approximation algorithms for clustering can run on top of it.
arXiv Detail & Related papers (2023-06-05T12:22:46Z) - A polynomial-time iterative algorithm for random graph matching with
non-vanishing correlation [12.869436542291144]
We propose an efficient algorithm for matching two correlated ErdHos--R'enyi graphs with $n$ whose edges are correlated through a latent correspondence.
Our algorithm has running time and succeeds to recover the latent matching as long as the edge correlation is non-vanishing.
arXiv Detail & Related papers (2023-06-01T00:58:50Z) - Correlation Clustering Algorithm for Dynamic Complete Signed Graphs: An
Index-based Approach [9.13755431537592]
In this paper, we reduce the complexity of approximating the correlation clustering problem from $O(mtimesleft( 2+ alpha (G) right)+n)$ to $O(m+n)$ for any given value of $varepsilon$ for a complete signed graph.
Our approach gives the same output as the original algorithm and makes it possible to implement the algorithm in a full dynamic setting.
arXiv Detail & Related papers (2023-01-01T10:57:36Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time [1.5644420658691407]
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs.
We define an algorithmic framework for hierarchical agglomerative graph clustering.
We show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
arXiv Detail & Related papers (2021-06-10T09:29:05Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.