Unifying Information-Theoretic and Pair-Counting Clustering Similarity
- URL: http://arxiv.org/abs/2511.03000v1
- Date: Tue, 04 Nov 2025 21:13:32 GMT
- Title: Unifying Information-Theoretic and Pair-Counting Clustering Similarity
- Authors: Alexander J. Gates
- Abstract summary: Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic. Here, we develop an analytical framework that unifies these families through two complementary perspectives.
- Score: 51.660331450043806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate information across full cluster contingency tables. Prior work has uncovered parallels between these families and applied empirical normalization or chance-correction schemes, but their deeper analytical connection remains only partially understood. Here, we develop an analytical framework that unifies these families through two complementary perspectives. First, both families are expressed as weighted expansions of observed versus expected co-occurrences, with pair-counting arising as a quadratic, low-order approximation and information-theoretic measures as higher-order, frequency-weighted extensions. Second, we generalize pair-counting to $k$-tuple agreement and show that information-theoretic measures can be viewed as systematically accumulating higher-order co-assignment structure beyond the pairwise level. We illustrate the approaches analytically for the Rand index and Mutual Information, and show how other indices in each family emerge as natural extensions. Together, these views clarify when and why the two regimes diverge, relating their sensitivities directly to weighting and approximation order, and provide a principled basis for selecting, interpreting, and extending clustering similarity measures across applications.
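For concreteness, the two indices the abstract singles out can be computed from the same contingency counts. The sketch below uses the standard textbook definitions of the Rand index and Mutual Information (natural logarithm), not the paper's expansion or its $k$-tuple generalization; the function names `contingency`, `rand_index`, and `mutual_information` are illustrative choices and do not come from the paper.

```python
from collections import Counter
from math import comb, log


def contingency(a, b):
    """Count the elements in each (cluster in a, cluster in b) cell."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    return Counter(zip(a, b))


def rand_index(a, b):
    """Fraction of element pairs on which the two clusterings agree."""
    n = len(a)
    if n < 2:
        raise ValueError("need at least two elements to form a pair")
    cells = contingency(a, b)
    together_both = sum(comb(c, 2) for c in cells.values())
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    total = comb(n, 2)
    # Agreements = pairs together in both + pairs separated in both.
    separated_both = total - together_a - together_b + together_both
    return (together_both + separated_both) / total


def mutual_information(a, b):
    """Sum over cells of p_ij * log(observed p_ij / expected p_i * p_j)."""
    n = len(a)
    cells = contingency(a, b)
    rows, cols = Counter(a), Counter(b)
    return sum(
        (c / n) * log(c * n / (rows[i] * cols[j]))
        for (i, j), c in cells.items()
    )


if __name__ == "__main__":
    same = [0, 0, 1, 1]
    crossed = [0, 1, 0, 1]
    print(rand_index(same, same), mutual_information(same, same))
    print(rand_index(same, crossed), mutual_information(same, crossed))
```

In this toy case the pair-based count and the log ratio of observed to expected cell frequencies are both built from the same contingency table, which is the shared starting point the abstract refers to. Identical labelings give a Rand index of 1 and a Mutual Information of log 2, while the crossed labelings give 1/3 and 0.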
Related papers
- Measuring the Measures: Discriminative Capacity of Representational Similarity Metrics Across Model Families [8.045700364123645]
We introduce a framework to evaluate representational similarity measures based on their ability to separate model families. We use three complementary separability measures: dprime from signal detection theory, silhouette coefficients, and ROC-AUC. We show that separability systematically increases as metrics impose more stringent alignment constraints.
arXiv Detail & Related papers (2025-09-04T19:11:10Z)
- Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning [65.75756724642932]
In incomplete multi-view clustering, missing data induce prototype shifts within views and semantic inconsistencies across views. We propose an IMVC framework, imputation- and alignment-free for consensus semantics learning (FreeCSL). FreeCSL achieves more confident and robust assignments on the IMVC task compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-05-16T12:37:10Z)
- Similarity and Dissimilarity Guided Co-association Matrix Construction for Ensemble Clustering [22.280221709474105]
We propose the Similarity and Dissimilarity Guided Co-association matrix (SDGCA) to achieve ensemble clustering.
First, we introduce normalized ensemble entropy to estimate the quality of each cluster, and construct a similarity matrix based on this estimation.
We employ the random walk to explore high-order proximity of base clusterings to construct a dissimilarity matrix.
arXiv Detail & Related papers (2024-11-01T08:10:28Z)
- HeNCler: Node Clustering in Heterophilous Graphs via Learned Asymmetric Similarity [48.62389920549271]
HeNCler learns a similarity graph by optimizing a clustering-specific objective based on weighted kernel singular value decomposition. Our approach enables spectral clustering on an asymmetric similarity graph, providing flexibility for both directed and undirected graphs.
arXiv Detail & Related papers (2024-05-27T11:04:05Z)
- CausalConceptTS: Causal Attributions for Time Series Classification using High Fidelity Diffusion Models [1.068128849363198]
We introduce a novel framework to assess the causal effect of concepts on specific classification outcomes.
We leverage state-of-the-art diffusion-based generative models to estimate counterfactual outcomes.
Our approach compares these causal attributions with closely related associational attributions, both theoretically and empirically.
arXiv Detail & Related papers (2024-05-24T18:33:18Z)
- Towards Distribution-Agnostic Generalized Category Discovery [51.52673017664908]
Data imbalance and open-ended distribution are intrinsic characteristics of the real visual world.
We propose a Self-Balanced Co-Advice contrastive framework (BaCon).
BaCon consists of a contrastive-learning branch and a pseudo-labeling branch, working collaboratively to provide interactive supervision to resolve the DA-GCD task.
arXiv Detail & Related papers (2023-10-02T17:39:58Z)
- Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning [60.058083574671834]
This paper presents a novel FCCL+, federated correlation and similarity learning with non-target distillation.
For the heterogeneity issue, we leverage irrelevant unlabeled public data for communication.
For catastrophic forgetting in the local updating stage, FCCL+ introduces Federated Non Target Distillation.
arXiv Detail & Related papers (2023-09-28T09:32:27Z)
- Advancing Relation Extraction through Language Probing with Exemplars from Set Co-Expansion [1.450405446885067]
Relation Extraction (RE) is a pivotal task in automatically extracting structured information from unstructured text.
We present a multi-faceted approach that integrates representative examples through co-set expansion.
Our method achieves an observed margin of at least 1 percent improvement in accuracy in most settings.
arXiv Detail & Related papers (2023-08-18T00:56:35Z)
- Systematic compactification of the two-channel Kondo model. III. Extended field-theoretic renormalization group analysis [44.99833362998488]
We calculate the detailed flow for the (multi) two-channel Kondo model and its compactified versions.
We gain insights into the contradistinction between the consistent vs. conventional bosonization-debosonization formalisms.
In particular, we make use of renormalization-flow arguments to further justify the consistent refermionization of the parallel Kondo interaction.
arXiv Detail & Related papers (2023-08-07T14:07:21Z)
- Simple and Scalable Algorithms for Cluster-Aware Precision Medicine [0.0]
We propose a simple and scalable approach to joint clustering and embedding.
This novel, cluster-aware embedding approach overcomes the complexity and limitations of current joint embedding and clustering methods.
Our approach does not require the user to choose the desired number of clusters, but instead yields interpretable dendrograms of hierarchically clustered embeddings.
arXiv Detail & Related papers (2022-11-29T19:27:26Z)
- Comparing Cross Correlation-Based Similarities [1.0152838128195467]
Multiset-based correlations based on the real-valued multiset Jaccard and coincidence indices are compared.
Results have immediate implications not only in pattern recognition and deep learning, but also in scientific modeling in general.
arXiv Detail & Related papers (2021-11-08T08:50:13Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z)
- Combining Task Predictors via Enhancing Joint Predictability [53.46348489300652]
We present a new predictor combination algorithm that improves the target by i) measuring the relevance of references based on their capabilities in predicting the target, and ii) strengthening such estimated relevance.
Our algorithm jointly assesses the relevance of all references by adopting a Bayesian framework.
Based on experiments on seven real-world datasets from visual attribute ranking and multi-class classification scenarios, we demonstrate that our algorithm offers a significant performance gain and broadens the application range of existing predictor combination approaches.
arXiv Detail & Related papers (2020-07-15T21:58:39Z)