Statistical Inference for Fuzzy Clustering
- URL: http://arxiv.org/abs/2601.02656v1
- Date: Tue, 06 Jan 2026 02:11:01 GMT
- Title: Statistical Inference for Fuzzy Clustering
- Authors: Qiuyi Wu, Zihan Zhu, Anru R. Zhang,
- Abstract summary: Fuzzy $c$-means (FCM) allow mixed memberships and better capture uncertainty and gradual transitions.<n>We develop a new framework for weighted fuzzy $c$-means (WFCM) for settings with potential cluster size imbalance.
- Score: 7.416766339318596
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Clustering is a central tool in biomedical research for discovering heterogeneous patient subpopulations, where group boundaries are often diffuse rather than sharply separated. Traditional methods produce hard partitions, whereas soft clustering methods such as fuzzy $c$-means (FCM) allow mixed memberships and better capture uncertainty and gradual transitions. Despite the widespread use of FCM, principled statistical inference for fuzzy clustering remains limited. We develop a new framework for weighted fuzzy $c$-means (WFCM) for settings with potential cluster size imbalance. Cluster-specific weights rebalance the classical FCM criterion so that smaller clusters are not overwhelmed by dominant groups, and the weighted objective induces a normalized density model with scale parameter $σ$ and fuzziness parameter $m$. Estimation is performed via a blockwise majorize--minimize (MM) procedure that alternates closed-form membership and centroid updates with likelihood-based updates of $(σ,\bw)$. The intractable normalizing constant is approximated by importance sampling using a data-adaptive Gaussian mixture proposal. We further provide likelihood ratio tests for comparing cluster centers and bootstrap-based confidence intervals. We establish consistency and asymptotic normality of the maximum likelihood estimator, validate the method through simulations, and illustrate it using single-cell RNA-seq and Alzheimer disease Neuroimaging Initiative (ADNI) data. These applications demonstrate stable uncertainty quantification and biologically meaningful soft memberships, ranging from well-separated cell populations under imbalance to a graded AD versus non-AD continuum consistent with disease progression.
Related papers
- SafeBench-Seq: A Homology-Clustered, CPU-Only Baseline for Protein Hazard Screening with Physicochemical/Composition Features and Cluster-Aware Confidence Intervals [26.81598226089532]
We introduce SafeBench-Seq, a metadata-only, reproducible benchmark and baseline classifier built entirely from public data.<n>To approximate "never-before-seen" threats, we homology-cluster the combined dataset at =40% identity and perform cluster-level holdouts.<n>We report discrimination (AUROC/AUPRC) and screening-operating points (TPR@1% FPR; FPR@95% TPR) with 95% bootstrap confidence intervals (n=200)<n>We quantify probability quality using Brier score, Expected Error (ECE); 15 bins, and reliability diagrams.
arXiv Detail & Related papers (2025-12-19T12:51:31Z) - ADNF-Clustering: An Adaptive and Dynamic Neuro-Fuzzy Clustering for Leukemia Prediction [4.43233586255351]
We introduce Adaptive and Dynamic Neuro-Fuzzy Clustering, a novel streaming-capable framework that combines Convolutional Neural Network-based feature extraction with an online fuzzy clustering engine.<n>On the C-NMC leukemia microscopy dataset, our tool achieves a silhouette score of 0.51, demonstrating superior cohesion and separation over static baselines.
arXiv Detail & Related papers (2025-06-23T08:30:17Z) - Interaction-Aware Gaussian Weighting for Clustered Federated Learning [58.92159838586751]
Federated Learning (FL) emerged as a decentralized paradigm to train models while preserving privacy.<n>We propose a novel clustered FL method, FedGWC (Federated Gaussian Weighting Clustering), which groups clients based on their data distribution.<n>Our experiments on benchmark datasets show that FedGWC outperforms existing FL algorithms in cluster quality and classification accuracy.
arXiv Detail & Related papers (2025-02-05T16:33:36Z) - A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization [83.12938977698988]
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data.
Current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $barA$ based on cosine similarity.
We propose a Non-Negative Generalized Category Discovery (NN-GCD) framework to address these deficiencies.
arXiv Detail & Related papers (2024-10-29T07:24:11Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Adaptive Fuzzy C-Means with Graph Embedding [84.47075244116782]
Fuzzy clustering algorithms can be roughly categorized into two main groups: Fuzzy C-Means (FCM) based methods and mixture model based methods.
We propose a novel FCM based clustering model that is capable of automatically learning an appropriate membership degree hyper- parameter value.
arXiv Detail & Related papers (2024-05-22T08:15:50Z) - Dirichlet Process-based Robust Clustering using the Median-of-Means Estimator [16.774378814288806]
We propose an efficient and automatic clustering technique by integrating the strengths of model-based and centroid-based methodologies.<n>Our method mitigates the effect of noise on the quality of clustering; while at the same time, estimates the number of clusters.
arXiv Detail & Related papers (2023-11-26T19:01:15Z) - DFedADMM: Dual Constraints Controlled Model Inconsistency for
Decentralized Federated Learning [52.83811558753284]
Decentralized learning (DFL) discards the central server and establishes a decentralized communication network.
Existing DFL methods still suffer from two major challenges: local inconsistency and local overfitting.
arXiv Detail & Related papers (2023-08-16T11:22:36Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous
Data [16.153709556346417]
Clustering is a widely deployed learning tool.
iLA-SDP is less sensitive than EM to and more stable on high-dimensional data.
arXiv Detail & Related papers (2022-09-29T21:03:13Z) - Why have a Unified Predictive Uncertainty? Disentangling it using Deep
Split Ensembles [39.29536042476913]
Understanding and quantifying uncertainty in black box Neural Networks (NNs) is critical when deployed in real-world settings such as healthcare.
We propose a conceptually simple non-Bayesian approach, deep split ensemble, to disentangle the predictive uncertainties.
arXiv Detail & Related papers (2020-09-25T19:15:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.