Quality check of a sample partition using multinomial distribution
- URL: http://arxiv.org/abs/2404.07778v1
- Date: Thu, 11 Apr 2024 14:14:58 GMT
- Title: Quality check of a sample partition using multinomial distribution
- Authors: Soumita Modak
- Abstract summary: We advocate a novel measure for checking the quality of a cluster partition of a sample into several distinct classes.
We apply the multinomial distribution to the distances of data members, clustered in a group, from their respective cluster representatives.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we advocate a novel measure for checking the quality of a cluster partition of a sample into several distinct classes, and thus for determining the unknown true number of clusters in the provided data set. Our objective leads us to develop an approach that applies the multinomial distribution to the distances of data members, clustered in a group, from their respective cluster representatives. This procedure is carried out independently for each cluster, and the resulting statistics are combined to form our targeted measure. Each cluster separately possesses category-wise probabilities, which correspond to the different positions of its members relative to a typical member, in the form of the cluster centroid, medoid, or mode, referred to as the cluster representative. Our method is robust in the sense that it is distribution-free: it is devised irrespective of the parent distribution of the underlying sample. It also possesses a quality rarely found among existing cluster accuracy measures, namely the capability to investigate whether the given sample contains any inherent clusters at all or only a single group of all members. Our measure's simple concept, easy algorithm, fast runtime, good performance, and wide usefulness, demonstrated through extensive simulation and diverse case studies, make it appealing.
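A minimal Python sketch of the general idea follows; the equal-width binning, uniform category probabilities, per-cluster multinomial log-likelihood, and summation as the combination rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import multinomial

def cluster_multinomial_stat(points, representative, n_bins=3):
    """Per-cluster statistic: bin member-to-representative distances
    into categories and evaluate the multinomial log-likelihood of the
    observed bin counts (equal-width bins and uniform category
    probabilities are illustrative assumptions)."""
    d = np.linalg.norm(points - representative, axis=1)
    edges = np.linspace(d.min(), d.max() + 1e-9, n_bins + 1)
    counts, _ = np.histogram(d, bins=edges)
    probs = np.full(n_bins, 1.0 / n_bins)
    return multinomial.logpmf(counts, n=counts.sum(), p=probs)

def partition_score(X, labels):
    """Combine per-cluster statistics; summation is one plausible
    combination rule. The centroid stands in for the cluster
    representative (the paper also allows medoid or mode)."""
    return sum(
        cluster_multinomial_stat(X[labels == k], X[labels == k].mean(axis=0))
        for k in np.unique(labels)
    )
```

Such a score could, for instance, be computed for candidate partitions with different numbers of clusters and compared to pick the best one.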
Related papers
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- Revisiting Silhouette Aggregation [2.56711111236449]
The silhouette coefficient is an evaluation measure that produces a score per data point, assessing the quality of its clustering assignment; the typical aggregation (micro) averages these scores over the whole sample.
An alternative, rarely employed path is to average first at the cluster level and then (macro-)average across clusters, as contrasted in the sketch below.
We show that the typical micro-averaging strategy is sensitive to cluster imbalance, while the overlooked macro-averaging strategy is far more robust.
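A brief sketch of the two aggregation strategies, using scikit-learn's per-point silhouette scores (the two function names are illustrative):

```python
import numpy as np
from sklearn.metrics import silhouette_samples

def micro_silhouette(X, labels):
    """Typical strategy: average per-point scores over the whole
    sample, so large clusters dominate the result."""
    return silhouette_samples(X, labels).mean()

def macro_silhouette(X, labels):
    """Alternative strategy: average within each cluster first, then
    across clusters, weighing every cluster equally."""
    s = silhouette_samples(X, labels)
    return np.mean([s[labels == k].mean() for k in np.unique(labels)])
```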
arXiv Detail & Related papers (2024-01-11T10:57:29Z)
- Federated Two Stage Decoupling With Adaptive Personalization Layers [5.69361786082969]
Federated learning has gained significant attention due to its ability to enable distributed learning while maintaining privacy constraints.
However, it inherently suffers from significant learning degradation and slow convergence when client data are heterogeneous.
It is therefore natural to cluster homogeneous clients into the same group and aggregate model weights only within each group, as sketched below.
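A minimal sketch of such group-wise aggregation, assuming client weights arrive as flattened parameter vectors and group assignments come from a prior clustering step (the helper name is hypothetical):

```python
import numpy as np

def clustered_aggregate(client_weights, client_groups):
    """Average model weights only among clients assigned to the same
    group, instead of a single global average over all clients."""
    groups = {}
    for w, g in zip(client_weights, client_groups):
        groups.setdefault(g, []).append(w)
    return {g: np.mean(ws, axis=0) for g, ws in groups.items()}
```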
arXiv Detail & Related papers (2023-08-30T07:46:32Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Implicit Sample Extension for Unsupervised Person Re-Identification [97.46045935897608]
Clustering sometimes mixes different true identities together or splits the same identity into two or more sub-clusters.
We propose an Implicit Sample Extension (ISE) method to generate what we call support samples around the cluster boundaries.
Experiments demonstrate that the proposed method is effective and achieves state-of-the-art performance for unsupervised person Re-ID.
arXiv Detail & Related papers (2022-04-14T11:41:48Z)
- Self-Evolutionary Clustering [1.662966122370634]
Most existing deep clustering methods are based on simple distance comparison and highly dependent on the target distribution generated by a handcrafted nonlinear mapping.
A novel modular Self-Evolutionary Clustering (Self-EvoC) framework is constructed, which boosts the clustering performance by classification in a self-supervised manner.
The framework can efficiently discriminate sample outliers and generate a better target distribution with the assistance of self-supervision.
arXiv Detail & Related papers (2022-02-21T19:38:18Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, the resulting model, TCC, is trained end-to-end, requiring no alternating steps; one common reparametrization of categorical variables is sketched below.
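The Gumbel-softmax relaxation is one standard way to make sampling a categorical cluster assignment differentiable; treating it as TCC's exact reparametrization is an assumption here.

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Relaxed sample from a categorical distribution over cluster
    assignments: perturb logits with Gumbel noise, then apply a
    temperature-scaled softmax. As tau -> 0 the output approaches a
    one-hot assignment while staying differentiable in autodiff
    frameworks (this NumPy version only illustrates the math)."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(1e-12, 1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))           # Gumbel(0, 1) noise
    y = (np.asarray(logits) + g) / tau
    y -= y.max()                      # numerical stability
    e = np.exp(y)
    return e / e.sum()
```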
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Protecting Individual Interests across Clusters: Spectral Clustering with Guarantees [20.350342151402963]
We propose an individual fairness criterion for clustering a graph $\mathcal{G}$ that requires each cluster to contain an adequate number of members connected to the individual.
We devise a spectral clustering algorithm to find fair clusters under a given representation graph.
arXiv Detail & Related papers (2021-05-08T15:03:25Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.