Dynamic Correlation Clustering in Sublinear Update Time
- URL: http://arxiv.org/abs/2406.09137v1
- Date: Thu, 13 Jun 2024 14:07:15 GMT
- Title: Dynamic Correlation Clustering in Sublinear Update Time
- Authors: Vincent Cohen-Addad, Silvio Lattanzi, Andreas Maggiori, Nikos Parotsidis,
- Abstract summary: We study the classic problem of correlation clustering in dynamic node streams.
In this setting, nodes are either added or randomly deleted over time, and each node pair is connected by a positive or negative edge.
We present an algorithm that maintains an $O(1)$-approximation with $O$(polylog $n$) amortized update time.
- Score: 33.079608335361286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the classic problem of correlation clustering in dynamic node streams. In this setting, nodes are either added or randomly deleted over time, and each node pair is connected by a positive or negative edge. The objective is to continuously find a partition which minimizes the sum of positive edges crossing clusters and negative edges within clusters. We present an algorithm that maintains an $O(1)$-approximation with $O$(polylog $n$) amortized update time. Prior to our work, Behnezhad, Charikar, Ma, and L. Tan achieved a $5$-approximation with $O(1)$ expected update time in edge streams which translates in node streams to an $O(D)$-update time where $D$ is the maximum possible degree. Finally we complement our theoretical analysis with experiments on real world data.
Related papers
- Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval [49.825549809652436]
$k$NN-MT constructs an external datastore to store domain-specific translation knowledge.
adaptive retrieval ($k$NN-MT-AR) dynamically estimates $lambda$ and skips $k$NN retrieval if $lambda$ is less than a fixed threshold.
We propose dynamic retrieval ($k$NN-MT-DR) that significantly extends vanilla $k$NN-MT in two aspects.
arXiv Detail & Related papers (2024-06-10T07:36:55Z) - Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $AinmathbbRntimes d$, proportional to their $ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indexes, it also returns slightly perturbed rows $tildea_i approx a_i$, and approximates their sampling probabilities up to $varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a $
arXiv Detail & Related papers (2024-06-01T07:33:41Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Simple, Scalable and Effective Clustering via One-Dimensional
Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis.
We introduce a simple randomized clustering algorithm that provably runs in expected time $O(mathrmnnz(X) + nlog n)$ for arbitrary $k$.
We prove that our algorithm achieves approximation ratio $smashwidetildeO(k4)$ on any input dataset for the $k$-means objective.
arXiv Detail & Related papers (2023-10-25T16:37:45Z) - Minimizing Polarization in Noisy Leader-Follower Opinion Dynamics [17.413930396663833]
We consider a problem of polarization optimization for the leader-follower opinion dynamics in a noisy social network.
We adopt the popular leader-follower DeGroot model, where the opinion of every leader is identical and remains unchanged, while the opinion of every follower is subject to white noise.
We show that the objective function is monotone and supermodular. We then propose a simple greedy algorithm with an approximation factor $1-1/e$ that approximately solves the problem in $O((n-q)3)$ time.
arXiv Detail & Related papers (2023-08-14T08:52:24Z) - Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z) - Improved Learning-augmented Algorithms for k-means and k-medians
Clustering [8.04779839951237]
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space.
We propose a deterministic $k$-means algorithm that produces centers with improved bound on clustering cost.
Our algorithm works even when the predictions are not very accurate, i.e. our bound holds for $alpha$ up to $1/2$, an improvement over $alpha$ being at most $1/7$ in the previous work.
arXiv Detail & Related papers (2022-10-31T03:00:11Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We show a training algorithm for distributed computation workers with varying communication frequency.
In this work, we obtain a tighter convergence rate of $mathcalO!!!(sigma2-2_avg!! .
We also show that the heterogeneity term in rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Maximizing Influence of Leaders in Social Networks [15.05158252504978]
We consider the edge addition problem for the DeGroot model of opinion dynamics in a social network with $n$ nodes and $m$ edges.
We show that our algorithm is efficient and effective, which scales to large networks with more than a million nodes.
arXiv Detail & Related papers (2021-06-11T02:31:46Z) - Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization [59.65871549878937]
We prove that the practical single loop accelerated gradient tracking needs $O(fracgamma1-sigma_gamma)2sqrtfracLepsilon)$.
Our convergence rates improve significantly over the ones of $O(frac1epsilon5/7)$ and $O(fracLmu)5/7frac1 (1-sigma)1.5logfrac1epsilon)$.
arXiv Detail & Related papers (2021-04-06T15:34:14Z) - Fully Asynchronous Policy Evaluation in Distributed Reinforcement
Learning over Networks [14.636457985379746]
This paper proposes a emphfully asynchronous scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks.
Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors.
arXiv Detail & Related papers (2020-03-01T08:12:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.