Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension
reduction & clustering
- URL: http://arxiv.org/abs/2008.07110v2
- Date: Mon, 7 Sep 2020 03:23:24 GMT
- Title: Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension
reduction & clustering
- Authors: Debolina Paul, Saptarshi Chakraborty, Didong Li and David Dunson
- Abstract summary: This article focuses on improving upon PCA and k-means, by allowing nonlinear relations in the data and more flexible cluster shapes.
The key contribution is a new framework for Principal Ellipsoid Analysis (PEA), defining a simple and computationally efficient alternative to PCA.
In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.
- Score: 9.042239247913642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even with the rise in popularity of over-parameterized models, simple
dimensionality reduction and clustering methods, such as PCA and k-means, are
still routinely used in an amazing variety of settings. A primary reason is the
combination of simplicity, interpretability and computational efficiency. The
focus of this article is on improving upon PCA and k-means, by allowing
non-linear relations in the data and more flexible cluster shapes, without
sacrificing the key advantages. The key contribution is a new framework for
Principal Elliptical Analysis (PEA), defining a simple and computationally
efficient alternative to PCA that fits the best elliptical approximation
through the data. We provide theoretical guarantees on the proposed PEA
algorithm using Vapnik-Chervonenkis (VC) theory to show strong consistency and
uniform concentration bounds. Toy experiments illustrate the performance of
PEA, and the ability to adapt to non-linear structure and complex cluster
shapes. In a rich variety of real data clustering applications, PEA is shown to
do as well as k-means for simple datasets, while dramatically improving
performance in more complex settings.
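The abstract does not spell out the fitting procedure, so the following is only a minimal sketch of the idea of an elliptical approximation, not the authors' PEA algorithm: estimate an ellipsoid from the sample mean and covariance, then map each point radially onto that ellipsoid by whitening, normalizing, and unwhitening. All function names are ours, for illustration only.

```python
# Illustrative sketch only: an ellipsoid fit via mean/covariance, with radial
# projection of each point onto the fitted ellipsoid. This is NOT the PEA
# algorithm from the paper, just a toy stand-in for "fitting the best
# elliptical approximation through the data".
import numpy as np

def fit_ellipsoid(X):
    """Estimate an ellipsoid (center + shape) from the sample moments of X (n x d)."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs / np.sqrt(vals)        # whitening map: ellipsoid -> unit sphere
    W_inv = vecs * np.sqrt(vals)    # inverse map: unit sphere -> ellipsoid
    return mu, W, W_inv

def project_onto_ellipsoid(X, mu, W, W_inv):
    """Map each point radially onto the fitted ellipsoid."""
    Z = (X - mu) @ W                                # whitened coordinates
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)   # radial projection to sphere
    return Z @ W_inv.T + mu                         # back to input space

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]])
mu, W, W_inv = fit_ellipsoid(X)
X_proj = project_onto_ellipsoid(X, mu, W, W_inv)    # points lying on one ellipse
```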
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The one-step K-means dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
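The paper proposes a unified one-step framework; shown below, for orientation only, is the naive two-stage baseline it builds on: embed a neighborhood graph, then run k-means on the embedding.

```python
# Hedged sketch: only the two-stage manifold-embedding + k-means baseline,
# not the paper's unified self-supervised framework.
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))                    # toy high-dimensional data

Z = SpectralEmbedding(n_components=5, n_neighbors=10).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(Z)
```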
- Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z)
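A hedged sketch of the general idea of combining covariance and Hessian eigenvectors; a logistic-regression Hessian stands in for the deep model's Hessian used in the paper, and the actual method likely differs in detail.

```python
# Toy stand-in: project data onto the leading eigenvector of the covariance
# matrix and of a (logistic-regression) Hessian. NOT the paper's method.
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

C = np.cov(X, rowvar=False)                     # data covariance

# Hessian of the logistic log-loss at stand-in weights w:
# H = X^T diag(p(1-p)) X / n
w = w_true
p = 1.0 / (1.0 + np.exp(-X @ w))
H = (X * (p * (1 - p))[:, None]).T @ X / n

u_cov = np.linalg.eigh(C)[1][:, -1]             # top covariance eigenvector
u_hess = np.linalg.eigh(H)[1][:, -1]            # top Hessian eigenvector
features = X @ np.column_stack([u_cov, u_hess]) # 2-D combined features
```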
- Learning-Augmented K-Means Clustering Using Dimensional Reduction [1.7243216387069678]
We propose a solution to reduce the dimensionality of the dataset using Principal Component Analysis (PCA).
PCA is well-established in the literature and has become one of the most useful tools for data modeling, compression, and visualization.
arXiv Detail & Related papers (2024-01-06T12:02:33Z)
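The summary describes the familiar PCA-then-k-means pipeline, sketched below with scikit-learn; the paper's learning-augmented refinements are not reproduced here.

```python
# Minimal sketch of the PCA -> k-means pipeline the summary describes.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 100))                # toy high-dimensional data

X_low = PCA(n_components=10).fit_transform(X)   # reduce dimensionality first
labels = KMeans(n_clusters=5, n_init=10).fit_predict(X_low)
```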
- Accelerated sparse Kernel Spectral Clustering for large scale data clustering problems [0.27257174044950283]
An improved version of the sparse multiway kernel spectral clustering (KSC) is presented in this brief.
The original algorithm is derived from weighted kernel principal component analysis formulated within the primal-dual least-squares support vector machine (LS-SVM) framework.
Sparsity is then achieved by combining the incomplete Cholesky decomposition (ICD) based low-rank approximation of the kernel matrix with the so-called reduced set method.
arXiv Detail & Related papers (2023-10-20T09:51:42Z)
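The ICD step mentioned above admits a compact sketch: a pivoted, low-rank factorization K ≈ GGᵀ of a kernel matrix. Clustering on the rows of G below is a much-simplified stand-in for the full KSC/LS-SVM pipeline, and the `icd` helper is our own illustration.

```python
# Pivoted incomplete Cholesky decomposition (ICD) of an RBF kernel matrix,
# followed by k-means on the low-rank features. Simplified stand-in only.
import numpy as np
from sklearn.cluster import KMeans

def icd(K, rank, tol=1e-8):
    n = K.shape[0]
    G = np.zeros((n, rank))
    d = np.diag(K).copy()               # residual diagonal
    for j in range(rank):
        i = int(np.argmax(d))           # pivot on largest residual
        if d[i] < tol:
            return G[:, :j]
        G[:, j] = (K[:, i] - G @ G[i, :]) / np.sqrt(d[i])
        d -= G[:, j] ** 2
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)                   # RBF kernel matrix
G = icd(K, rank=20)                     # K ~ G @ G.T
labels = KMeans(n_clusters=3, n_init=10).fit_predict(G)
```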
- An online algorithm for contrastive Principal Component Analysis [9.090031210111919]
We derive an online algorithm for cPCA* and show that it maps onto a neural network with local learning rules, so it can potentially be implemented in energy-efficient neuromorphic hardware.
We evaluate the performance of our online algorithm on real datasets and highlight the differences and similarities with the original formulation.
arXiv Detail & Related papers (2022-11-14T19:48:48Z)
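Contrastive PCA seeks directions with high variance in a target dataset but low variance in a background dataset, i.e., top eigenvectors of C_target − α·C_background. Below is a toy single-component Oja-style online update in that spirit; it is not the cPCA* network of the paper.

```python
# Hedged illustration: Oja-style online learning of the top eigenvector of
# C_target - alpha * C_background (single component, toy streams).
import numpy as np

rng = np.random.default_rng(0)
d, alpha, eta = 10, 1.0, 1e-3
w = rng.normal(size=d)
w /= np.linalg.norm(w)

for _ in range(20000):
    x = rng.normal(size=d); x[0] *= 3.0   # target stream: extra variance on axis 0
    y = rng.normal(size=d); y[1] *= 3.0   # background stream: extra variance on axis 1
    # stochastic gradient of w^T (C_x - alpha * C_y) w, then renormalize
    w += eta * ((x @ w) * x - alpha * (y @ w) * y)
    w /= np.linalg.norm(w)
# w should align with axis 0: variance present in the target but not background
```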
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made, via plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Self-paced Principal Component Analysis [17.333976289539457]
We propose a novel method called Self-paced PCA (SPCA) to further reduce the effect of noise and outliers.
The complexity of each sample is calculated at the beginning of each iteration in order to integrate samples from simple to more complex into training.
arXiv Detail & Related papers (2021-06-25T20:50:45Z)
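A generic self-paced loop in the spirit of that summary: refit PCA on the currently "easy" samples (low reconstruction error) and gradually admit harder ones. This is an illustrative schedule, not necessarily the exact SPCA objective, and `self_paced_pca` is our own helper name.

```python
# Self-paced PCA sketch: samples enter training from simple to complex,
# as ranked by reconstruction error under the current model.
import numpy as np
from sklearn.decomposition import PCA

def self_paced_pca(X, n_components, n_iters=5, start_frac=0.5):
    pca = PCA(n_components=n_components).fit(X)     # warm start on all samples
    for t in range(n_iters):
        err = ((X - pca.inverse_transform(pca.transform(X))) ** 2).sum(axis=1)
        frac = start_frac + (1.0 - start_frac) * t / max(n_iters - 1, 1)
        mask = err <= np.quantile(err, frac)        # admit samples easy -> hard
        pca = PCA(n_components=n_components).fit(X[mask])
    return pca

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(450, 20)),
               10 * rng.normal(size=(50, 20))])     # toy data with outliers
pca = self_paced_pca(X, n_components=3)
```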
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
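The $l_{2,p}$ penalty drives entire rows of the projection matrix toward zero, so features can be ranked by the $l_2$ norms of those rows. The sketch below scores features by row norms of plain PCA loadings, a much-simplified, unregularized stand-in for the paper's method.

```python
# Stand-in illustration: score features by the row-wise l2 norms of a
# projection matrix W (here from plain PCA, without the l_{2,p} penalty).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
X[:, :5] *= 5.0                                  # make the first 5 features informative

W = PCA(n_components=5).fit(X).components_.T     # shape (n_features, n_components)
scores = np.linalg.norm(W, axis=1)               # row-wise l2 norms
selected = np.argsort(scores)[::-1][:5]          # indices of the top-5 features
```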
- Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about the data distribution than traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)
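The form of the repulsiveness-encouraging prior is not given in this summary; a standard illustrative choice (not necessarily the DEPCAM prior) multiplies a base prior on the component locations $\mu_1,\dots,\mu_K$ by a pairwise interaction term:

$$p(\mu_1,\dots,\mu_K)\;\propto\;\prod_{1\le i<j\le K}\Bigl(1-\exp\bigl(-\|\mu_i-\mu_j\|^2/(2\tau^2)\bigr)\Bigr)$$

The product vanishes whenever two component locations coincide, discouraging the overlapping mixing components that cause the redundancy described above.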
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.