Real Elliptically Skewed Distributions and Their Application to Robust
Cluster Analysis
- URL: http://arxiv.org/abs/2006.16671v2
- Date: Mon, 21 Jun 2021 08:40:00 GMT
- Title: Real Elliptically Skewed Distributions and Their Application to Robust
Cluster Analysis
- Authors: Christian A. Schroth and Michael Muma
- Abstract summary: This article proposes a new class of Really Skewed (RESK) distributions and associated clustering algorithms.
Non-symmetrically distributed and heavy-tailed data clusters have been reported in a variety of real-world applications.
- Score: 5.137336092866906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article proposes a new class of Real Elliptically Skewed (RESK)
distributions and associated clustering algorithms that allow for integrating
robustness and skewness into a single unified cluster analysis framework.
Non-symmetrically distributed and heavy-tailed data clusters have been reported
in a variety of real-world applications. Robustness is essential because a few
outlying observations can severely obscure the cluster structure. The RESK
distributions are a generalization of the Real Elliptically Symmetric (RES)
distributions. To estimate the cluster parameters and memberships, we derive an
expectation maximization (EM) algorithm for arbitrary RESK distributions.
Special attention is given to a new robust skew-Huber M-estimator, which is
also the maximum likelihood estimator (MLE) for the skew-Huber distribution
that belongs to the RESK class. Numerical experiments on simulated and
real-world data confirm the usefulness of the proposed methods for skewed and
heavy-tailed data sets.
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Rethinking Clustered Federated Learning in NOMA Enhanced Wireless
Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented.
Solutions to address the challenges posed by non-IID conditions are proposed with the analysis of the properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous
Data [16.153709556346417]
Clustering is a widely deployed learning tool.
iLA-SDP is less sensitive than EM to and more stable on high-dimensional data.
arXiv Detail & Related papers (2022-09-29T21:03:13Z) - A One-shot Framework for Distributed Clustered Learning in Heterogeneous
Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
One-shot approach, based on local computations at the users and a clustering based aggregation step at the server is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z) - Local Sample-weighted Multiple Kernel Clustering with Consensus
Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algo-rithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z) - Supervised Multivariate Learning with Simultaneous Feature Auto-grouping
and Dimension Reduction [7.093830786026851]
This paper proposes a novel clustered reduced-rank learning framework.
It imposes two joint matrix regularizations to automatically group the features in constructing predictive factors.
It is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection.
arXiv Detail & Related papers (2021-12-17T20:11:20Z) - Neural Mixture Models with Expectation-Maximization for End-to-end Deep
Clustering [0.8543753708890495]
In this paper, we realize mixture model-based clustering with a neural network.
We train the network end-to-end via batch-wise EM iterations where the forward pass acts as the E-step and the backward pass acts as the M-step.
Our trained networks outperform single-stage deep clustering methods that still depend on k-means.
arXiv Detail & Related papers (2021-07-06T08:00:58Z) - Kernel learning approaches for summarising and combining posterior
similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z) - Robust M-Estimation Based Bayesian Cluster Enumeration for Real
Elliptically Symmetric Distributions [5.137336092866906]
Robustly determining optimal number of clusters in a data set is an essential factor in a wide range of applications.
This article generalizes so that it can be used with any arbitrary Really Symmetric (RES) distributed mixture model.
We derive a robust criterion for data sets with finite sample size, and also provide an approximation to reduce the computational cost at large sample sizes.
arXiv Detail & Related papers (2020-05-04T11:44:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.