Faster Deterministic Approximation Algorithms for Correlation Clustering
  and Cluster Deletion
        - URL: http://arxiv.org/abs/2111.10699v1
- Date: Sat, 20 Nov 2021 22:47:19 GMT
- Title: Faster Deterministic Approximation Algorithms for Correlation Clustering
  and Cluster Deletion
- Authors: Nate Veldt
- Abstract summary: Correlation clustering is a framework for partitioning datasets based on pairwise similarity and dissimilarity scores.
In this paper we prove new relationships between correlation clustering problems and edge labeling problems related to the principle of strong partitioningic closure.
We develop new approximation algorithms for correlation clustering that have deterministic constant factor approximation guarantees and avoid the canonical linear programming relaxation.
- Score: 5.584060970507507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Correlation clustering is a framework for partitioning datasets based on
pairwise similarity and dissimilarity scores, and has been used for diverse
applications in bioinformatics, social network analysis, and computer vision.
Although many approximation algorithms have been designed for this problem, the
best theoretical results rely on obtaining lower bounds via expensive linear
programming relaxations. In this paper we prove new relationships between
correlation clustering problems and edge labeling problems related to the
principle of strong triadic closure. We use these connections to develop new
approximation algorithms for correlation clustering that have deterministic
constant factor approximation guarantees and avoid the canonical linear
programming relaxation. Our approach also extends to a variant of correlation
clustering called cluster deletion, that strictly prohibits placing negative
edges inside clusters. Our results include 4-approximation algorithms for
cluster deletion and correlation clustering, based on simplified linear
programs with far fewer constraints than the canonical relaxations. More
importantly, we develop faster techniques that are purely combinatorial, based
on computing maximal matchings in certain auxiliary graphs and hypergraphs.
This leads to a combinatorial 6-approximation for complete unweighted
correlation clustering, which is the best deterministic result for any method
that does not rely on linear programming. We also present the first
combinatorial constant factor approximation for cluster deletion.
 
      
        Related papers
        - Exact and Heuristic Algorithms for Constrained Biclustering [0.0]
 Biclustering, also known as co-clustering or two-way clustering, simultaneously partitions the rows and columns of a data matrix to reveal submatrices with coherent patterns.<n>We study constrained biclustering with pairwise constraints, namely must-link and cannot-link constraints, which specify whether objects should belong to the same or different biclusters.
 arXiv  Detail & Related papers  (2025-08-07T15:29:22Z)
- A system identification approach to clustering vector autoregressive   time series [50.66782357329375]
 Clustering time series based on their underlying dynamics is keeping attracting researchers due to its impacts on assisting complex system modelling.<n>Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction.<n>Instead of relying on feature/metric construction, the system identification approach allows treating vector time series clustering by explicitly considering their underlying autoregressive dynamics.
 arXiv  Detail & Related papers  (2025-05-20T14:31:44Z)
- A Graph-Partitioning Based Continuous Optimization Approach to   Semi-supervised Clustering Problems [24.208152437317768]
 We view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset.
We propose a block coordinate descent algorithm to efficiently solve this model.
We can construct the clusters that theoretically meet the must-link constraints under mild assumptions.
 arXiv  Detail & Related papers  (2025-03-06T14:02:28Z)
- MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity
  and Degree Descent Criterion [0.6906005491572401]
 spectral clustering is popular and attractive due to the remarkable performance, easy implementation, and strong adaptability.
We propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition.
The validity of our algorithm is demonstrated by testifying on real-world benchmarks and application of face recognition.
 arXiv  Detail & Related papers  (2023-12-07T06:19:39Z)
- Linear time Evidence Accumulation Clustering with KMeans [0.0]
 This work describes a trick which mimic the behavior of average linkage clustering.
We found a way of computing efficiently the density of a partitioning, reducing the cost from a quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
 arXiv  Detail & Related papers  (2023-11-15T14:12:59Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
 We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
 arXiv  Detail & Related papers  (2023-06-18T08:46:06Z)
- Near-Optimal Correlation Clustering with Privacy [37.94795032297396]
 Correlation clustering is a central problem in unsupervised learning.
In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees.
 arXiv  Detail & Related papers  (2022-03-02T22:30:19Z)
- Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs [5.301300942803395]
 Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections.
For flow-based clusterings the edges between clusters tend to be oriented in one direction and have been found in migration data, food webs, and trade data.
 arXiv  Detail & Related papers  (2022-03-02T20:07:04Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
 This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
 arXiv  Detail & Related papers  (2022-02-09T06:05:05Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
 Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
 arXiv  Detail & Related papers  (2021-12-07T18:50:17Z)
- Shift of Pairwise Similarities for Data Clustering [7.462336024223667]
 We consider the case where the regularization term is the sum of the squared size of the clusters, and then generalize it to adaptive regularization of the pairwise similarities.
This leads to shifting (adaptively) the pairwise similarities which might make some of them negative.
We then propose an efficient local search optimization algorithm with fast theoretical convergence rate to solve the new clustering problem.
 arXiv  Detail & Related papers  (2021-10-25T16:55:07Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
 This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
 Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
 arXiv  Detail & Related papers  (2020-12-16T13:01:37Z)
- Local Graph Clustering with Network Lasso [90.66817876491052]
 We study the statistical and computational properties of a network Lasso method for local graph clustering.
The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes.
 arXiv  Detail & Related papers  (2020-04-25T17:52:05Z)
- Fair Correlation Clustering [92.15492066925977]
 We obtain approximation algorithms for correlation clustering under several important types of fairness constraints.
We show that fair solutions to correlation clustering can be obtained with limited increase in cost compared to the state-of-the-art (unfair) algorithms.
 arXiv  Detail & Related papers  (2020-02-06T14:28:21Z)
- Clustering Binary Data by Application of Combinatorial Optimization
  Heuristics [52.77024349608834]
 We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
 arXiv  Detail & Related papers  (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     
           This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.