Non-parametric Clustering of Multivariate Populations with Arbitrary
Sizes
- URL: http://arxiv.org/abs/2211.06338v1
- Date: Fri, 11 Nov 2022 16:38:29 GMT
- Title: Non-parametric Clustering of Multivariate Populations with Arbitrary
Sizes
- Authors: Yves Ismaël Ngounou Bakam and Denys Pommeret
- Abstract summary: We propose a clustering procedure to group K populations into subgroups with the same dependence structure.
We illustrate our clustering algorithm via numerical studies and through two real datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a clustering procedure to group K populations into subgroups with
the same dependence structure. The method is adapted to paired populations and
can be used with panel data. It relies on the differences between orthogonal
projection coefficients of the K density copulas estimated from the K
populations. Each cluster is then constituted by populations having
significantly similar dependence structures. A recent test statistic from
Ngounou-Bakam and Pommeret (2022) is used to construct such clusters
automatically. The procedure is data driven and depends on the asymptotic level of
the test. We illustrate our clustering algorithm via numerical studies and
through two real datasets: a panel of financial datasets and an insurance dataset
of losses and allocated loss adjustment expense.
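
The abstract describes comparing populations through orthogonal projection coefficients of their copula densities. A minimal sketch of that idea follows, under assumptions not stated on this page: the bivariate case, an orthonormal shifted Legendre basis on [0, 1], rank-based pseudo-observations, and a plain squared Euclidean distance between coefficient vectors. The function names and the `max_degree` parameter are hypothetical, and the distance is only an illustration. It is not the test statistic of Ngounou-Bakam and Pommeret (2022), and no significance threshold is derived here.

```python
import numpy as np
from numpy.polynomial import legendre


def pseudo_observations(x):
    """Map each column of x (shape (n, p)) into (0, 1) via ranks / (n + 1)."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (n + 1.0)


def shifted_legendre(u, degree):
    """Orthonormal Legendre polynomial of the given degree on [0, 1]."""
    coef = np.zeros(degree + 1)
    coef[degree] = 1.0
    return np.sqrt(2 * degree + 1) * legendre.legval(2.0 * u - 1.0, coef)


def copula_coefficients(x, max_degree=2):
    """Empirical E[L_j(U1) L_k(U2)] for 1 <= j, k <= max_degree (assumed basis)."""
    u = pseudo_observations(x)
    return np.array([
        np.mean(shifted_legendre(u[:, 0], j) * shifted_legendre(u[:, 1], k))
        for j in range(1, max_degree + 1)
        for k in range(1, max_degree + 1)
    ])


def coefficient_distance(x, y, max_degree=2):
    """Illustrative squared distance between two populations' coefficients."""
    diff = copula_coefficients(x, max_degree) - copula_coefficients(y, max_degree)
    return float(np.sum(diff ** 2))
```

Because the coefficients are computed from ranks, the two samples passed to `coefficient_distance` may have different sizes and different marginal distributions. Only their dependence structure affects the result. A small distance suggests similar dependence, but deciding what counts as "significantly similar" requires the test described in the paper.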
Related papers
- Clustering Approaches for Mixed-Type Data: A Comparative Study [0.0]
Clustering mixed-type data is a challenge, as few existing approaches are suited for this task. This study presents the state-of-the-art of these approaches and compares them using various simulation models. In our experiments KAMILA, LCM, and k-prototypes exhibited the best performance, with respect to the adjusted Rand index (ARI).
arXiv Detail & Related papers (2025-11-24T22:18:23Z)
- Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient [0.5939858158928473]
This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering.
Comparative experiments were conducted on both synthetic and real datasets to compare the performance of k-SCC.
arXiv Detail & Related papers (2025-01-26T14:29:11Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Causal K-Means Clustering [5.087519744951637]
Causal k-Means Clustering harnesses the widely-used k-means clustering algorithm to uncover the unknown subgroup structure.
We present a plug-in estimator which is simple and readily implementable using off-the-shelf algorithms.
Our proposed methods are especially useful for modern outcome-wide studies with multiple treatment levels.
arXiv Detail & Related papers (2024-05-05T23:59:51Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To better exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z)
- Swarm Intelligence for Self-Organized Clustering [6.85316573653194]
A swarm system called Databionic swarm (DBS) is introduced which is able to adapt itself to structures of high-dimensional data.
By exploiting the interrelations of swarm intelligence, self-organization and emergence, DBS serves as an alternative approach to the optimization of a global objective function in the task of clustering.
arXiv Detail & Related papers (2021-06-10T06:21:48Z)
- A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous Datasets [0.76146285961466]
We propose a new evolutionary clustering algorithm (ECAStar) based on social class ranking and meta-heuristic algorithms.
ECAStar is integrated with recombinational evolutionary operators, Levy flight optimisation, and some statistical techniques.
Experiments are conducted to evaluate the ECAStar against five conventional approaches.
arXiv Detail & Related papers (2021-01-01T07:20:50Z)
- Deep Semi-Supervised Embedded Clustering (DSEC) for Stratification of Heart Failure Patients [50.48904066814385]
In this work we apply deep semi-supervised embedded clustering to determine data-driven patient subgroups of heart failure.
We find clinically relevant clusters from an embedded space derived from heterogeneous data.
The proposed algorithm can potentially find new undiagnosed subgroups of patients that have different outcomes.
arXiv Detail & Related papers (2020-12-24T12:56:46Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
- Statistical power for cluster analysis [0.0]
Cluster algorithms are increasingly popular in biomedical research.
We estimate power and accuracy for common analysis through simulation.
We recommend that researchers only apply cluster analysis when large subgroup separation is expected.
arXiv Detail & Related papers (2020-03-01T02:43:15Z)
- Blocked Clusterwise Regression [0.0]
We generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple latent variables.
We contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting.
arXiv Detail & Related papers (2020-01-29T23:29:31Z)