The generalized underlap coefficient with an application in clustering
- URL: http://arxiv.org/abs/2602.19473v2
- Date: Wed, 25 Feb 2026 01:52:46 GMT
- Title: The generalized underlap coefficient with an application in clustering
- Authors: Zhaoxi Zhang, Vanda Inacio, Sara Wade
- Abstract summary: The underlap coefficient (UNL) is a multi-group separation measure. We establish key properties of the UNL and provide an explicit connection to total variation. We illustrate the application of the UNL in clustering using two real-world datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantifying distributional separation across groups is fundamental in statistical learning and scientific discovery, yet most classical discrepancy measures are tailored to two-group comparisons. We generalize the underlap coefficient (UNL), a multi-group separation measure, to multivariate variables. We establish key properties of the UNL and provide an explicit connection to total variation. We further interpret the UNL as a dependence measure between a group label and variables of interest and compare it with mutual information. We propose an efficient importance sampling estimator of the UNL that can be combined with flexible density estimators. The utility of the UNL for assessing partition-covariate dependence in clustering is highlighted in detail, where it is particularly useful for evaluating whether the latent group structure can be explained by specific covariates. Finally, we illustrate the application of the UNL in clustering using two real-world datasets.
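The importance-sampling idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the multi-group underlap takes the form UNL = 1 - ∫ min_k f_k(x) dx (so that for two groups it coincides with the total variation distance, matching the stated connection), uses a Gaussian KDE per group as the "flexible density estimator", and takes the equal-weight mixture of the fitted densities as the importance-sampling proposal.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Three 1-D groups with increasingly separated means (toy data).
groups = [rng.normal(mu, 1.0, size=500) for mu in (0.0, 2.5, 5.0)]

# One flexible density estimate per group (here: Gaussian KDE).
kdes = [gaussian_kde(g) for g in groups]

def unl_importance_sampling(kdes, n=20000, seed=0):
    """Estimate UNL = 1 - integral of min_k f_k(x) dx by importance
    sampling from the equal-weight mixture of the fitted densities.
    (Illustrative form of the UNL; assumed, not taken from the paper.)"""
    rng = np.random.default_rng(seed)
    K = len(kdes)
    # How many proposal draws come from each mixture component.
    counts = rng.multinomial(n, np.full(K, 1.0 / K))
    x = np.concatenate(
        [kdes[k].resample(int(counts[k]), seed=int(seed) + k)
         for k in range(K)], axis=1)
    dens = np.stack([kde.evaluate(x) for kde in kdes])  # shape (K, n)
    g = dens.mean(axis=0)                    # mixture proposal density at x
    overlap = np.mean(dens.min(axis=0) / g)  # estimates integral of min_k f_k
    return 1.0 - overlap

print(unl_importance_sampling(kdes))
```

Because min_k f_k(x) never exceeds the mixture density g(x), each importance weight is at most 1, so the estimate always lies in [0, 1]: near 1 for well-separated groups (as here), near 0 for identical group distributions.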
Related papers
- Cross-Fitting-Free Debiased Machine Learning with Multiway Dependence [0.0]
This paper develops a theory for two-step debiased machine learning (DML) estimators in generalised method of moments (GMM) models with general multiway clustered dependence, without relying on cross-fitting. We show that valid inference can be achieved without sample splitting by combining Neyman-orthogonal moment conditions with a localisation-based empirical approach, allowing for an arbitrary number of clustering dimensions.
arXiv Detail & Related papers (2026-02-11T20:09:23Z) - Clustering Approaches for Mixed-Type Data: A Comparative Study [0.0]
Clustering mixed-type data is a challenge, as few existing approaches are suited for this task. This study presents the state-of-the-art of these approaches and compares them using various simulation models. In our experiments, KAMILA, LCM, and k-prototypes exhibited the best performance with respect to the adjusted Rand index (ARI).
arXiv Detail & Related papers (2025-11-24T22:18:23Z) - Rethinking Clustered Federated Learning in NOMA Enhanced Wireless Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis of the generalization gap, which measures the degree of non-IIDness of the data distribution, is presented.
Solutions to address the challenges posed by non-IID conditions are proposed with the analysis of the properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z) - Self Supervised Correlation-based Permutations for Multi-View Clustering [7.093692674858257]
We propose an end-to-end deep learning-based multi-view clustering framework for general data types. Our approach involves generating meaningful fused representations using a novel permutation-based canonical correlation objective.
arXiv Detail & Related papers (2024-02-26T08:08:30Z) - A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z) - Learning Group Importance using the Differentiable Hypergeometric Distribution [16.30064635746202]
Partitioning elements into subsets of unknown sizes is essential in many applications.
In this work, we propose the differentiable hypergeometric distribution.
We show that we can learn the size of subsets in two typical applications: weakly-supervised learning and clustering.
arXiv Detail & Related papers (2022-03-03T10:44:50Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Local versions of sum-of-norms clustering [77.34726150561087]
We show that our method can separate arbitrarily close balls in the ball model.
We prove a quantitative bound on the error incurred in the clustering of disjoint connected sets.
arXiv Detail & Related papers (2021-09-20T14:45:29Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, the proposed TCC model is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Group Heterogeneity Assessment for Multilevel Models [68.95633278540274]
Many data sets contain an inherent multilevel structure.
Taking this structure into account is critical for the accuracy and calibration of any statistical analysis performed on such data.
We propose a flexible framework for efficiently assessing differences between the levels of given grouping variables in the data.
arXiv Detail & Related papers (2020-05-06T12:42:04Z) - Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
arXiv Detail & Related papers (2020-04-14T06:18:50Z) - Statistical power for cluster analysis [0.0]
Cluster algorithms are increasingly popular in biomedical research.
We estimate the power and accuracy of common analyses through simulation.
We recommend that researchers only apply cluster analysis when large subgroup separation is expected.
arXiv Detail & Related papers (2020-03-01T02:43:15Z) - Blocked Clusterwise Regression [0.0]
We generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple latent variables.
We contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting.
arXiv Detail & Related papers (2020-01-29T23:29:31Z)
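Several of the papers above evaluate partitions with the adjusted Rand index (ARI). As a quick reference, here is a minimal self-contained implementation of the ARI from the contingency table of two labelings; this is an illustrative sketch, not code from any of the listed papers.

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items.
    (Degenerate cases, e.g. a single all-in-one cluster on both sides,
    are not special-cased here.)"""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    # Contingency table: cont[i, j] = #items in cluster i of A and j of B.
    cont = np.zeros((ia.max() + 1, ib.max() + 1), dtype=int)
    np.add.at(cont, (ia, ib), 1)
    sum_ij = sum(comb(int(v), 2) for v in cont.ravel())
    sum_a = sum(comb(int(v), 2) for v in cont.sum(axis=1))
    sum_b = sum(comb(int(v), 2) for v in cont.sum(axis=0))
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2.0
    return (sum_ij - expected) / (max_index - expected)

# A relabelled but otherwise identical partition scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

The ARI is 1 for identical partitions (up to label permutation) and has expected value 0 for independent random labelings, which is what makes it "adjusted" relative to the raw Rand index.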
This list is automatically generated from the titles and abstracts of the papers in this site.