Related papers: Correlation Clustering Reconstruction in Semi-Adversarial Models

Correlation Clustering Reconstruction in Semi-Adversarial Models

URL: http://arxiv.org/abs/2108.04729v1
Date: Tue, 10 Aug 2021 14:46:17 GMT
Title: Correlation Clustering Reconstruction in Semi-Adversarial Models
Authors: Flavio Chierichetti, Alessandro Panconesi, Giuseppe Re, Luca Trevisan
Abstract summary: Correlation Clustering is an important clustering problem with many applications. We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
Score: 70.11015369368272
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Correlation Clustering is an important clustering problem with many applications. We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering that has been corrupted by random noise and adversarial modifications. Concerning the latter, we study a standard "post-adversarial" model, in which adversarial modifications come after the noise, and also introduce and analyze a "pre-adversarial" model in which adversarial modifications come before the noise. Given an input coming from such a semi-adversarial generative model, the goal is to reconstruct almost perfectly and with high probability the latent clustering. We focus on the case where the hidden clusters have equal size and show the following. In the pre-adversarial setting, spectral algorithms are optimal, in the sense that they reconstruct all the way to the information-theoretic threshold beyond which no reconstruction is possible. In contrast, in the post-adversarial setting their ability to restore the hidden clusters stops before the threshold, but the gap is optimally filled by SDP-based algorithms.

Related papers

A system identification approach to clustering vector autoregressive time series [50.66782357329375]
Clustering time series based on their underlying dynamics is keeping attracting researchers due to its impacts on assisting complex system modelling.<n>Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction.<n>Instead of relying on feature/metric construction, the system identification approach allows treating vector time series clustering by explicitly considering their underlying autoregressive dynamics.
arXiv Detail & Related papers (2025-05-20T14:31:44Z)
Scalable Robust Bayesian Co-Clustering with Compositional ELBOs [2.6756996523251964]
Co-clustering exploits the duality of instances and features to uncover meaningful groups in both dimensions. We present the first fully variational Co-clustering framework that directly learns row and column clusters in the latent space. Our method not only preserves the advantages of prior Co-clustering approaches but also exceeds them in accuracy and robustness.
arXiv Detail & Related papers (2025-04-05T06:48:05Z)
Towards Learnable Anchor for Deep Multi-View Clustering [49.767879678193005]
In this paper, we propose the Deep Multi-view Anchor Clustering (DMAC) model that performs clustering in linear time. With the optimal anchors, the full sample graph is calculated to derive a discriminative embedding for clustering. Experiments on several datasets demonstrate superior performance and efficiency of DMAC compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-03-16T09:38:11Z)
Deep Clustering using Dirichlet Process Gaussian Mixture and Alpha Jensen-Shannon Divergence Clustering Loss [0.65268245109828]
In the autoencoder based deep clustering, the challenge is how to jointly optimize both clustering and dimension reduction together. We introduce an infinite cluster representation using Dirichlet process Gaussian mixture model for joint clustering and model selection in the latent space. We evaluate our proposed deep model selection method with traditional model selection on large class number datasets such as MIT67 and CIFAR100.
arXiv Detail & Related papers (2024-12-12T05:02:41Z)
Gap-Free Clustering: Sensitivity and Robustness of SDP [6.996002801232415]
We study graph clustering in the Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters. Previous convex relaxation approaches achieving exact recovery do not allow any small clusters of size $o(sqrtn)$, or require a size gap between the smallest recovered cluster and the largest non-recovered cluster. We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes.
arXiv Detail & Related papers (2023-08-29T21:27:21Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Common Failure Modes of Subcluster-based Sampling in Dirichlet Process Gaussian Mixture Models -- and a Deep-learning Solution [5.822529963339041]
Dirichlet Process Gaussian Mixture Model (DPGMM) is often used to cluster data when the number of clusters is unknown. One main DPGMM inference paradigm relies on sampling. Here we consider a known state-of-art sampler, analyze its failure modes, and show how to improve it.
arXiv Detail & Related papers (2022-03-25T14:12:33Z)
Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a it trainable clustering algorithm into the integration framework. Speaker embeddings are optimized during training such that it better fits iGMM clustering. Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
arXiv Detail & Related papers (2022-02-14T07:45:21Z)
KnAC: an approach for enhancing cluster analysis with background knowledge and explanations [0.20999222360659603]
We present Knowledge Augmented Clustering (KnAC), which main goal is to confront expert-based labelling with automated clustering. KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach robust and model-agnostic.
arXiv Detail & Related papers (2021-12-16T10:13:47Z)
Finding Geometric Models by Clustering in the Consensus Space [61.65661010039768]
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. We present a number of applications where the use of multiple geometric models improves accuracy. These include pose estimation from multiple generalized homographies; trajectory estimation of fast-moving objects.
arXiv Detail & Related papers (2021-03-25T14:35:07Z)
Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams [69.4411417775822]
Adversarial examples are a widely studied phenomenon in machine learning models. We propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification.
arXiv Detail & Related papers (2020-11-19T08:49:10Z)
Progressive Cluster Purification for Unsupervised Feature Learning [48.87365358296371]
In unsupervised feature learning, sample specificity based methods ignore the inter-class information. We propose a novel clustering based method, which excludes class inconsistent samples during progressive cluster formation. Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training.
arXiv Detail & Related papers (2020-07-06T08:11:03Z)
A Robust Speaker Clustering Method Based on Discrete Tied Variational Autoencoder [27.211505187332385]
Traditional speaker clustering method based on aggregation hierarchy cluster (AHC) has the shortcomings of long-time running and remains sensitive to environment noise. We propose a novel speaker clustering method based on Mutual Information (MI) and a non-linear model with discrete variable, which under the enlightenment of Tied Variational Autoencoder (TVAE) to enhance the robustness against noise.
arXiv Detail & Related papers (2020-03-04T08:54:38Z)
Blocked Clusterwise Regression [0.0]
We generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple latent variables. We contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting.
arXiv Detail & Related papers (2020-01-29T23:29:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.