Correlation Clustering Reconstruction in Semi-Adversarial Models
- URL: http://arxiv.org/abs/2108.04729v1
- Date: Tue, 10 Aug 2021 14:46:17 GMT
- Title: Correlation Clustering Reconstruction in Semi-Adversarial Models
- Authors: Flavio Chierichetti, Alessandro Panconesi, Giuseppe Re, Luca Trevisan
- Abstract summary: Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
- Score: 70.11015369368272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Correlation Clustering is an important clustering problem with many
applications. We study the reconstruction version of this problem in which one
is seeking to reconstruct a latent clustering that has been corrupted by random
noise and adversarial modifications.
Concerning the latter, we study a standard "post-adversarial" model, in which
adversarial modifications come after the noise, and also introduce and analyze
a "pre-adversarial" model in which adversarial modifications come before the
noise. Given an input coming from such a semi-adversarial generative model, the
goal is to reconstruct almost perfectly and with high probability the latent
clustering.
We focus on the case where the hidden clusters have equal size and show the
following. In the pre-adversarial setting, spectral algorithms are optimal, in
the sense that they reconstruct all the way to the information-theoretic
threshold beyond which no reconstruction is possible. In contrast, in the
post-adversarial setting their ability to restore the hidden clusters stops
before the threshold, but the gap is optimally filled by SDP-based algorithms.
Related papers
- Deep Clustering using Dirichlet Process Gaussian Mixture and Alpha Jensen-Shannon Divergence Clustering Loss [0.65268245109828]
In the autoencoder based deep clustering, the challenge is how to jointly optimize both clustering and dimension reduction together.
We introduce an infinite cluster representation using Dirichlet process Gaussian mixture model for joint clustering and model selection in the latent space.
We evaluate our proposed deep model selection method with traditional model selection on large class number datasets such as MIT67 and CIFAR100.
arXiv Detail & Related papers (2024-12-12T05:02:41Z) - Gap-Free Clustering: Sensitivity and Robustness of SDP [6.996002801232415]
We study graph clustering in the Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters.
Previous convex relaxation approaches achieving exact recovery do not allow any small clusters of size $o(sqrtn)$, or require a size gap between the smallest recovered cluster and the largest non-recovered cluster.
We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes.
arXiv Detail & Related papers (2023-08-29T21:27:21Z) - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Common Failure Modes of Subcluster-based Sampling in Dirichlet Process
Gaussian Mixture Models -- and a Deep-learning Solution [5.822529963339041]
Dirichlet Process Gaussian Mixture Model (DPGMM) is often used to cluster data when the number of clusters is unknown.
One main DPGMM inference paradigm relies on sampling.
Here we consider a known state-of-art sampler, analyze its failure modes, and show how to improve it.
arXiv Detail & Related papers (2022-03-25T14:12:33Z) - Tight integration of neural- and clustering-based diarization through
deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a it trainable clustering algorithm into the integration framework.
Speaker embeddings are optimized during training such that it better fits iGMM clustering.
Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
arXiv Detail & Related papers (2022-02-14T07:45:21Z) - KnAC: an approach for enhancing cluster analysis with background
knowledge and explanations [0.20999222360659603]
We present Knowledge Augmented Clustering (KnAC), which main goal is to confront expert-based labelling with automated clustering.
KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach robust and model-agnostic.
arXiv Detail & Related papers (2021-12-16T10:13:47Z) - Finding Geometric Models by Clustering in the Consensus Space [61.65661010039768]
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies.
We present a number of applications where the use of multiple geometric models improves accuracy.
These include pose estimation from multiple generalized homographies; trajectory estimation of fast-moving objects.
arXiv Detail & Related papers (2021-03-25T14:35:07Z) - Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on
Higher-Order Voronoi Diagrams [69.4411417775822]
Adversarial examples are a widely studied phenomenon in machine learning models.
We propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification.
arXiv Detail & Related papers (2020-11-19T08:49:10Z) - Progressive Cluster Purification for Unsupervised Feature Learning [48.87365358296371]
In unsupervised feature learning, sample specificity based methods ignore the inter-class information.
We propose a novel clustering based method, which excludes class inconsistent samples during progressive cluster formation.
Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training.
arXiv Detail & Related papers (2020-07-06T08:11:03Z) - A Robust Speaker Clustering Method Based on Discrete Tied Variational
Autoencoder [27.211505187332385]
Traditional speaker clustering method based on aggregation hierarchy cluster (AHC) has the shortcomings of long-time running and remains sensitive to environment noise.
We propose a novel speaker clustering method based on Mutual Information (MI) and a non-linear model with discrete variable, which under the enlightenment of Tied Variational Autoencoder (TVAE) to enhance the robustness against noise.
arXiv Detail & Related papers (2020-03-04T08:54:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.