Faster Deterministic Approximation Algorithms for Correlation Clustering
and Cluster Deletion
- URL: http://arxiv.org/abs/2111.10699v1
- Date: Sat, 20 Nov 2021 22:47:19 GMT
- Title: Faster Deterministic Approximation Algorithms for Correlation Clustering
and Cluster Deletion
- Authors: Nate Veldt
- Abstract summary: Correlation clustering is a framework for partitioning datasets based on pairwise similarity and dissimilarity scores.
In this paper we prove new relationships between correlation clustering problems and edge labeling problems related to the principle of strong partitioningic closure.
We develop new approximation algorithms for correlation clustering that have deterministic constant factor approximation guarantees and avoid the canonical linear programming relaxation.
- Score: 5.584060970507507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Correlation clustering is a framework for partitioning datasets based on
pairwise similarity and dissimilarity scores, and has been used for diverse
applications in bioinformatics, social network analysis, and computer vision.
Although many approximation algorithms have been designed for this problem, the
best theoretical results rely on obtaining lower bounds via expensive linear
programming relaxations. In this paper we prove new relationships between
correlation clustering problems and edge labeling problems related to the
principle of strong triadic closure. We use these connections to develop new
approximation algorithms for correlation clustering that have deterministic
constant factor approximation guarantees and avoid the canonical linear
programming relaxation. Our approach also extends to a variant of correlation
clustering called cluster deletion, that strictly prohibits placing negative
edges inside clusters. Our results include 4-approximation algorithms for
cluster deletion and correlation clustering, based on simplified linear
programs with far fewer constraints than the canonical relaxations. More
importantly, we develop faster techniques that are purely combinatorial, based
on computing maximal matchings in certain auxiliary graphs and hypergraphs.
This leads to a combinatorial 6-approximation for complete unweighted
correlation clustering, which is the best deterministic result for any method
that does not rely on linear programming. We also present the first
combinatorial constant factor approximation for cluster deletion.
Related papers
- MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity
and Degree Descent Criterion [0.6906005491572401]
spectral clustering is popular and attractive due to the remarkable performance, easy implementation, and strong adaptability.
We propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition.
The validity of our algorithm is demonstrated by testifying on real-world benchmarks and application of face recognition.
arXiv Detail & Related papers (2023-12-07T06:19:39Z) - Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick which mimic the behavior of average linkage clustering.
We found a way of computing efficiently the density of a partitioning, reducing the cost from a quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Near-Optimal Correlation Clustering with Privacy [37.94795032297396]
Correlation clustering is a central problem in unsupervised learning.
In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees.
arXiv Detail & Related papers (2022-03-02T22:30:19Z) - Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs [5.301300942803395]
Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections.
For flow-based clusterings the edges between clusters tend to be oriented in one direction and have been found in migration data, food webs, and trade data.
arXiv Detail & Related papers (2022-03-02T20:07:04Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Shift of Pairwise Similarities for Data Clustering [7.462336024223667]
We consider the case where the regularization term is the sum of the squared size of the clusters, and then generalize it to adaptive regularization of the pairwise similarities.
This leads to shifting (adaptively) the pairwise similarities which might make some of them negative.
We then propose an efficient local search optimization algorithm with fast theoretical convergence rate to solve the new clustering problem.
arXiv Detail & Related papers (2021-10-25T16:55:07Z) - Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z) - Local Graph Clustering with Network Lasso [90.66817876491052]
We study the statistical and computational properties of a network Lasso method for local graph clustering.
The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes.
arXiv Detail & Related papers (2020-04-25T17:52:05Z) - Fair Correlation Clustering [92.15492066925977]
We obtain approximation algorithms for correlation clustering under several important types of fairness constraints.
We show that fair solutions to correlation clustering can be obtained with limited increase in cost compared to the state-of-the-art (unfair) algorithms.
arXiv Detail & Related papers (2020-02-06T14:28:21Z) - Clustering Binary Data by Application of Combinatorial Optimization
Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.