Clustering Three-Way Data with Outliers
- URL: http://arxiv.org/abs/2310.05288v2
- Date: Wed, 11 Oct 2023 22:50:14 GMT
- Title: Clustering Three-Way Data with Outliers
- Authors: Katharine M. Clark and Paul D. McNicholas
- Abstract summary: An approach for clustering matrix-variate normal data with outliers is discussed.
The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm and uses an iterative approach to detect and trim outliers.
- Score: 1.2328446298523066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Matrix-variate distributions are a recent addition to the model-based
clustering field, thereby making it possible to analyze data in matrix form
with complex structure such as images and time series. Due to its recent
appearance, there is limited literature on matrix-variate data, with even less
on dealing with outliers in these models. An approach for clustering
matrix-variate normal data with outliers is discussed. The approach, which uses
the distribution of subset log-likelihoods, extends the OCLUST algorithm to
matrix-variate normal data and uses an iterative approach to detect and trim
outliers.
Related papers
- Regularized Projection Matrix Approximation with Applications to Community Detection [1.5033631151609534]
This paper introduces a regularized projection matrix approximation framework aimed at recovering cluster information from the affinity matrix.
We explore three distinct penalty functions addressing bounded, positive, and sparse scenarios, respectively, and derive the Alternating Direction Method of Multipliers (ADMM) algorithm to solve the problem.
arXiv Detail & Related papers (2024-05-26T15:18:22Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Comparative Study of Inference Methods for Interpolative Decomposition [4.913248451323163]
We propose a probabilistic model with automatic relevance determination (ARD) for learning interpolative decomposition (ID)
We evaluate the model on a variety of real-world datasets including CCLE $EC50$, CCLE $IC50$, Gene Body Methylation, and Promoter Methylation datasets with different sizes, and dimensions.
arXiv Detail & Related papers (2022-06-29T11:37:05Z) - Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z) - Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z) - Clustering of Nonnegative Data and an Application to Matrix Completion [0.0]
We analyze its performance in relation to a certain measure of correlation between said subspaces.
We use our clustering algorithm to develop a matrix completion algorithm which can outperform standard matrix completion algorithms on data matrices satisfying certain natural conditions.
arXiv Detail & Related papers (2020-09-02T18:24:47Z) - Multi-View Spectral Clustering with High-Order Optimal Neighborhood
Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base.
arXiv Detail & Related papers (2020-08-31T12:28:40Z) - Robust Matrix Completion with Mixed Data Types [0.0]
We consider the problem of recovering a structured low rank matrix with partially observed entries with mixed data types.
Most approaches assume that there is only one underlying distribution and the low rank constraint is regularized by the matrix Schatten Norm.
We propose a computationally feasible statistical approach with strong recovery guarantees along with an algorithmic framework suited for parallelization to recover a low rank matrix with partially observed entries for mixed data types in one step.
arXiv Detail & Related papers (2020-05-25T21:35:10Z) - Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z) - Flexible Clustering with a Sparse Mixture of Generalized Hyperbolic Distributions [6.839746711757701]
Traditional approaches to model-based clustering often fail for high dimensional data.
A parametrization of the component scale matrices for the mixture of generalized hyperbolic distributions is proposed.
An analytically feasible expectation-maximization algorithm is developed.
arXiv Detail & Related papers (2019-03-12T17:02:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.