Fully Dynamic Adversarially Robust Correlation Clustering in Polylogarithmic Update Time
- URL: http://arxiv.org/abs/2411.09979v1
- Date: Fri, 15 Nov 2024 06:26:37 GMT
- Title: Fully Dynamic Adversarially Robust Correlation Clustering in Polylogarithmic Update Time
- Authors: Vladimir Braverman, Prathamesh Dharangutte, Shreyas Pai, Vihan Shah, Chen Wang,
- Abstract summary: We study the dynamic correlation clustering problem with $\textit{adaptive}$ edge label flips.
In correlation clustering, we are given an $n$-vertex complete graph whose edges are labeled either $(+)$ or $(-)$.
We consider the dynamic setting with adversarial robustness, in which the $\textit{adaptive}$ adversary could flip the label of an edge based on the current output of the algorithm.
- Score: 19.25942907402098
- License:
- Abstract: We study the dynamic correlation clustering problem with $\textit{adaptive}$ edge label flips. In correlation clustering, we are given an $n$-vertex complete graph whose edges are labeled either $(+)$ or $(-)$, and the goal is to minimize the total number of $(+)$ edges between clusters and the number of $(-)$ edges within clusters. We consider the dynamic setting with adversarial robustness, in which the $\textit{adaptive}$ adversary could flip the label of an edge based on the current output of the algorithm. Our main result is a randomized algorithm that always maintains an $O(1)$-approximation to the optimal correlation clustering with $O(\log^{2}{n})$ amortized update time. Prior to our work, no algorithm with $O(1)$-approximation and $\text{polylog}{(n)}$ update time for the adversarially robust setting was known. We further validate our theoretical results with experiments on synthetic and real-world datasets with competitive empirical performances. Our main technical ingredient is an algorithm that maintains $\textit{sparse-dense decomposition}$ with $\text{polylog}{(n)}$ update time, which could be of independent interest.
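For readers who want a concrete handle on the objective, the following minimal Python sketch (not the paper's algorithm) computes the number of disagreements of a clustering on a complete signed graph and simulates one move of an adaptive adversary that flips an edge label after seeing the current output; the representation and helper names are illustrative assumptions.

```python
from itertools import combinations

def disagreements(labels, clustering):
    """Correlation clustering cost on a complete signed graph.

    labels: dict mapping each pair (u, v) with u < v to '+' or '-'.
    clustering: dict mapping each vertex to its cluster id.
    A '+' edge between clusters or a '-' edge inside a cluster counts as one disagreement.
    """
    cost = 0
    for (u, v), sign in labels.items():
        same = clustering[u] == clustering[v]
        if (sign == '+' and not same) or (sign == '-' and same):
            cost += 1
    return cost

def adaptive_flip(labels, clustering):
    """One move of an adaptive adversary: flip the label of some edge that
    currently agrees with the algorithm's output, creating a new disagreement."""
    for (u, v), sign in labels.items():
        same = clustering[u] == clustering[v]
        if (sign == '+' and same) or (sign == '-' and not same):
            labels[(u, v)] = '-' if sign == '+' else '+'
            return (u, v)
    return None

# Toy instance: 4 vertices, two planted clusters {0, 1} and {2, 3}.
vertices = [0, 1, 2, 3]
labels = {(u, v): '+' if (u < 2) == (v < 2) else '-'
          for u, v in combinations(vertices, 2)}
clustering = {0: 'A', 1: 'A', 2: 'B', 3: 'B'}
print(disagreements(labels, clustering))  # 0: the clustering matches every label
flipped = adaptive_flip(labels, clustering)
print(flipped, disagreements(labels, clustering))  # one flip creates one disagreement
```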
Related papers
- Dynamic Correlation Clustering in Sublinear Update Time [33.079608335361286]
We study the classic problem of correlation clustering in dynamic node streams.
In this setting, nodes are either added or randomly deleted over time, and each node pair is connected by a positive or negative edge.
We present an algorithm that maintains an $O(1)$-approximation with $O(\text{polylog}(n))$ amortized update time.
arXiv Detail & Related papers (2024-06-13T14:07:15Z)
- Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A \in \mathbb{R}^{n \times d}$, proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indexes, it also returns slightly perturbed rows $\tilde{a}_i \approx a_i$, and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a $
arXiv Detail & Related papers (2024-06-01T07:33:41Z)
- Simple, Scalable and Effective Clustering via One-Dimensional Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis.
We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n\log n)$ for arbitrary $k$.
We prove that our algorithm achieves approximation ratio $\widetilde{O}(k^{4})$ on any input dataset for the $k$-means objective.
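As a rough illustration of clustering via one-dimensional projections (a toy sketch under the assumption that a single random direction suffices, which is not how the cited paper obtains its guarantees), one can project the points onto a random direction and split the sorted projections at the largest gaps:

```python
import numpy as np

def cluster_via_1d_projection(X, k, rng=None):
    """Toy sketch: project points onto a single random direction, sort the
    resulting scalars, and cut at the k-1 widest consecutive gaps to get k clusters."""
    rng = np.random.default_rng(rng)
    direction = rng.normal(size=X.shape[1])
    direction /= np.linalg.norm(direction)
    proj = X @ direction                          # one scalar per point
    order = np.argsort(proj)
    gaps = np.diff(proj[order])                   # consecutive gaps along the line
    cuts = np.sort(np.argsort(gaps)[-(k - 1):])   # positions of the k-1 widest gaps
    labels = np.empty(len(X), dtype=int)
    for cluster_id, chunk in enumerate(np.split(order, cuts + 1)):
        labels[chunk] = cluster_id
    return labels

# Usage on three synthetic blobs; prints one label in {0, 1, 2} per point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, size=(20, 5)) for c in (0, 10, 20)])
print(cluster_via_1d_projection(X, k=3, rng=1))
```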
arXiv Detail & Related papers (2023-10-25T16:37:45Z)
- Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^{2}}{\varepsilon^{2}}(\sqrt{k}d + \log(Nd))\big)$ time.
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
- Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z)
- Can We Find Nash Equilibria at a Linear Rate in Markov Games? [95.10091348976779]
We study decentralized learning in two-player zero-sum discounted Markov games.
The goal is to design a policy optimization algorithm for either agent satisfying two properties.
arXiv Detail & Related papers (2023-03-03T02:40:26Z)
- Correlation Clustering Algorithm for Dynamic Complete Signed Graphs: An Index-based Approach [9.13755431537592]
In this paper, we reduce the complexity of approximating the correlation clustering problem from $O(m \times (2 + \alpha(G)) + n)$ to $O(m+n)$ for any given value of $\varepsilon$ for a complete signed graph.
Our approach gives the same output as the original algorithm and makes it possible to implement the algorithm in a fully dynamic setting.
arXiv Detail & Related papers (2023-01-01T10:57:36Z)
- Improved Learning-augmented Algorithms for k-means and k-medians Clustering [8.04779839951237]
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space.
We propose a deterministic $k$-means algorithm that produces centers with an improved bound on the clustering cost.
Our algorithm works even when the predictions are not very accurate, i.e., our bound holds for $\alpha$ up to $1/2$, an improvement over $\alpha$ being at most $1/7$ in the previous work.
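A minimal sketch of the learning-augmented idea, assuming a predictor supplies a label for every point and each predicted cluster contains at most an $\alpha$ fraction of mislabeled points: take a coordinate-wise trimmed mean of each predicted cluster as its center. The trimming rule and helper names below are illustrative; the paper's estimator and analysis are more refined.

```python
import numpy as np

def trimmed_mean(points, alpha):
    """Coordinate-wise mean after dropping the alpha fraction of smallest and
    largest values in each coordinate (a simple robust center estimate)."""
    srt = np.sort(points, axis=0)
    t = int(alpha * len(points))
    if len(points) <= 2 * t:
        return points.mean(axis=0)
    return srt[t: len(points) - t].mean(axis=0)

def centers_from_predictions(X, predicted_labels, alpha=0.2):
    """Estimate one center per predicted cluster, tolerating a small fraction
    of mislabeled points in each cluster."""
    return {c: trimmed_mean(X[predicted_labels == c], alpha)
            for c in np.unique(predicted_labels)}

def kmeans_cost(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    C = np.vstack(list(centers.values()))
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# Usage: two blobs, with a few points given the wrong predicted label.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
pred = np.array([0] * 50 + [1] * 50)
pred[:5] = 1                                   # predictions are only approximately correct
centers = centers_from_predictions(X, pred, alpha=0.2)
print(kmeans_cost(X, centers))
```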
arXiv Detail & Related papers (2022-10-31T03:00:11Z)
- Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions [9.29659220237395]
We present a new approach for solving (minimum disagreement) correlation clustering.
We obtain sublinear algorithms with highly efficient time and space complexity.
The main ingredient of our approach is a novel connection to sparse-dense graph decompositions.
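To make the sparse-dense ingredient concrete, below is a toy heuristic under one simplified notion of density: a vertex is "dense" if most of its positive neighbors have nearly the same closed neighborhood, and "sparse" otherwise. This is an illustrative assumption, not the exact decomposition used in these papers.

```python
def sparse_dense_split(adj, eps=0.3):
    """Toy sparse-dense heuristic on a graph given as {v: set of (+)-neighbors}.

    A vertex v is marked 'dense' if at least a (1 - eps) fraction of its
    neighbors u have a small symmetric difference N[u] ^ N[v] relative to
    max(|N[u]|, |N[v]|) (closed neighborhoods); otherwise it is 'sparse'."""
    def similar(u, v):
        Nu, Nv = adj[u] | {u}, adj[v] | {v}
        return len(Nu ^ Nv) <= eps * max(len(Nu), len(Nv))

    dense, sparse = set(), set()
    for v, nbrs in adj.items():
        close = sum(1 for u in nbrs if similar(u, v))
        (dense if nbrs and close >= (1 - eps) * len(nbrs) else sparse).add(v)
    return dense, sparse

# Two (+)-triangles {0,1,2} and {3,4,5} joined by one stray (+) edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sparse_dense_split(adj))  # the stray edge pushes vertices 2 and 3 into the sparse part
```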
arXiv Detail & Related papers (2021-09-29T16:25:02Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.