Correlation Clustering Reconstruction in Semi-Adversarial Models
- URL: http://arxiv.org/abs/2108.04729v1
- Date: Tue, 10 Aug 2021 14:46:17 GMT
- Title: Correlation Clustering Reconstruction in Semi-Adversarial Models
- Authors: Flavio Chierichetti, Alessandro Panconesi, Giuseppe Re, Luca Trevisan
- Abstract summary: Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
- Score: 70.11015369368272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Correlation Clustering is an important clustering problem with many
applications. We study the reconstruction version of this problem in which one
is seeking to reconstruct a latent clustering that has been corrupted by random
noise and adversarial modifications.
Concerning the latter, we study a standard "post-adversarial" model, in which
adversarial modifications come after the noise, and also introduce and analyze
a "pre-adversarial" model in which adversarial modifications come before the
noise. Given an input coming from such a semi-adversarial generative model, the
goal is to reconstruct almost perfectly and with high probability the latent
clustering.
We focus on the case where the hidden clusters have equal size and show the
following. In the pre-adversarial setting, spectral algorithms are optimal, in
the sense that they reconstruct all the way to the information-theoretic
threshold beyond which no reconstruction is possible. In contrast, in the
post-adversarial setting their ability to restore the hidden clusters stops
before the threshold, but the gap is optimally filled by SDP-based algorithms.
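The spectral approach described above can be illustrated in its simplest noise-only form: two equal hidden clusters, pairwise similarities flipped independently at random, and no adversary. The sketch below is illustrative, not the paper's exact algorithm; the parameters n and p and the sign-rounding step are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 200, 0.1                      # 200 items, 10% noise on each pair
labels = np.repeat([0, 1], n // 2)   # two equal-size hidden clusters
u = np.where(labels == 0, 1.0, -1.0)

# Latent correlation matrix: +1 within a cluster, -1 across clusters.
M = np.outer(u, u)

# Random noise: flip each pairwise judgment independently with prob. p,
# keeping the matrix symmetric.
flips = np.triu(rng.random((n, n)) < p, 1)
flips = flips | flips.T
M_noisy = np.where(flips, -M, M)
np.fill_diagonal(M_noisy, 0)

# Spectral reconstruction: for two balanced clusters, the sign pattern of
# the leading eigenvector of the noisy matrix recovers the bipartition
# with high probability when the noise is below threshold.
eigvals, eigvecs = np.linalg.eigh(M_noisy)
v = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
guess = (v > 0).astype(int)

# Agreement with the latent clustering, up to relabeling the two clusters.
acc = max(np.mean(guess == labels), np.mean(guess != labels))
print(f"fraction of correctly placed items: {acc:.3f}")
```

At this noise level the reconstruction is essentially exact; handling adversarial modifications (and more than two clusters) requires the more careful spectral and SDP machinery the paper develops.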
Related papers
- Gap-Free Clustering: Sensitivity and Robustness of SDP [6.996002801232415]
We study graph clustering in the Stochastic Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters.
Previous convex relaxation approaches achieving exact recovery either do not allow any small clusters of size $o(\sqrt{n})$, or require a size gap between the smallest recovered cluster and the largest non-recovered cluster.
We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes.
arXiv Detail & Related papers (2023-08-29T21:27:21Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Common Failure Modes of Subcluster-based Sampling in Dirichlet Process
Gaussian Mixture Models -- and a Deep-learning Solution [5.822529963339041]
Dirichlet Process Gaussian Mixture Model (DPGMM) is often used to cluster data when the number of clusters is unknown.
One main DPGMM inference paradigm relies on sampling.
Here we consider a known state-of-art sampler, analyze its failure modes, and show how to improve it.
arXiv Detail & Related papers (2022-03-25T14:12:33Z) - Tight integration of neural- and clustering-based diarization through
deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a trainable clustering algorithm into the integration framework.
Speaker embeddings are optimized during training so that they better fit the iGMM clustering.
Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
arXiv Detail & Related papers (2022-02-14T07:45:21Z) - KnAC: an approach for enhancing cluster analysis with background
knowledge and explanations [0.20999222360659603]
We present Knowledge Augmented Clustering (KnAC), whose main goal is to confront expert-based labelling with automated clustering.
KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach robust and model-agnostic.
arXiv Detail & Related papers (2021-12-16T10:13:47Z) - Finding Geometric Models by Clustering in the Consensus Space [61.65661010039768]
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies.
We present a number of applications where the use of multiple geometric models improves accuracy.
These include pose estimation from multiple generalized homographies; trajectory estimation of fast-moving objects.
arXiv Detail & Related papers (2021-03-25T14:35:07Z) - Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams [69.4411417775822]
Adversarial examples are a widely studied phenomenon in machine learning models.
We propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification.
arXiv Detail & Related papers (2020-11-19T08:49:10Z) - Progressive Cluster Purification for Unsupervised Feature Learning [48.87365358296371]
In unsupervised feature learning, sample specificity based methods ignore the inter-class information.
We propose a novel clustering based method, which excludes class inconsistent samples during progressive cluster formation.
Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training.
arXiv Detail & Related papers (2020-07-06T08:11:03Z) - A Robust Speaker Clustering Method Based on Discrete Tied Variational Autoencoder [27.211505187332385]
The traditional speaker clustering method based on agglomerative hierarchical clustering (AHC) suffers from long running times and remains sensitive to environmental noise.
We propose a novel speaker clustering method based on Mutual Information (MI) and a non-linear model with discrete variables, inspired by the Tied Variational Autoencoder (TVAE), to enhance robustness against noise.
arXiv Detail & Related papers (2020-03-04T08:54:38Z) - Blocked Clusterwise Regression [0.0]
We generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple latent variables.
We contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting.
arXiv Detail & Related papers (2020-01-29T23:29:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.