Clustering based on Mixtures of Sparse Gaussian Processes
- URL: http://arxiv.org/abs/2303.13665v1
- Date: Thu, 23 Mar 2023 20:44:36 GMT
- Title: Clustering based on Mixtures of Sparse Gaussian Processes
- Authors: Zahra Moslehi, Abdolreza Mirzaei, Mehran Safayani
- Abstract summary: Clustering data in a low-dimensional embedded space remains a challenging problem in machine learning.
In this article, we propose a joint formulation for both clustering and dimensionality reduction.
Our algorithm is based on a mixture of sparse Gaussian processes and is called Sparse Gaussian Process Mixture Clustering (SGP-MIC).
- Score: 6.939768185086753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating low-dimensional representations of a high-dimensional data set is an
important component in many machine learning applications, yet clustering data
in such a low-dimensional embedded space remains a challenging problem. In this
article, we propose a joint formulation for both clustering and dimensionality
reduction. When a probabilistic model is desired, one possible solution is to
use mixture models in which both the cluster indicators and the low-dimensional
space are learned. Our algorithm is based on a mixture of sparse Gaussian
processes, which is called Sparse Gaussian Process Mixture Clustering (SGP-MIC).
The main advantages of our approach over existing methods are that its
probabilistic nature gives it an edge over deterministic alternatives, it is
straightforward to construct non-linear generalizations of the model, and a
sparse model combined with an efficient variational EM approximation speeds up
the algorithm.
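As a rough illustration of the model class, the sketch below runs EM for a plain mixture of Gaussian processes on curve data. It is not SGP-MIC itself: it omits the learned low-dimensional latent space and the sparse inducing-point approximation, and the kernel hyperparameters and function names are illustrative assumptions.

```python
# Minimal EM sketch for a mixture of Gaussian processes over curves.
# NOT the authors' SGP-MIC: no latent space, no sparse approximation.
import numpy as np
from scipy.stats import multivariate_normal

def rbf_kernel(t, lengthscale, variance=1.0):
    # Squared-exponential kernel on shared 1-D inputs t.
    d2 = (t[:, None] - t[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_mixture_em(Y, t, lengthscales, noise=0.1, n_iter=50):
    # Y: (n_curves, n_points) curves observed at inputs t.
    # Cluster k is a zero-mean GP with lengthscale lengthscales[k].
    K, n = len(lengthscales), len(t)
    pi = np.full(K, 1.0 / K)                      # mixing weights
    for _ in range(n_iter):
        # E-step: log r_ik = log pi_k + log N(y_i | 0, K_k + noise * I).
        log_r = np.empty((len(Y), K))
        for k in range(K):
            cov = rbf_kernel(t, lengthscales[k]) + noise * np.eye(n)
            log_r[:, k] = np.log(pi[k]) + multivariate_normal.logpdf(
                Y, np.zeros(n), cov)
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step (partial): update mixing weights; a full treatment would
        # also re-optimize each cluster's GP hyperparameters.
        pi = resp.mean(axis=0)
    return resp.argmax(axis=1)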
Related papers
- Adaptive Fuzzy C-Means with Graph Embedding [84.47075244116782]
Fuzzy clustering algorithms can be roughly categorized into two main groups: Fuzzy C-Means (FCM) based methods and mixture model based methods.
We propose a novel FCM-based clustering model that is capable of automatically learning an appropriate membership degree hyperparameter value.
arXiv Detail & Related papers (2024-05-22T08:15:50Z)
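For reference, the sketch below is standard Fuzzy C-Means with a fixed fuzzifier m; the entry above learns the membership-degree hyperparameter automatically instead. The function name and defaults are illustrative.

```python
# Standard Fuzzy C-Means with a fixed fuzzifier m (illustration only).
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(n_clusters), size=len(X))  # soft memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]     # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return U, centers
```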
- A distribution-free mixed-integer optimization approach to hierarchical modelling of clustered and longitudinal data [0.0]
We introduce an innovative algorithm that evaluates cluster effects for new data points, thereby increasing the robustness and precision of this model.
The inferential and predictive efficacy of this approach is further illustrated through its application in student scoring and protein expression.
arXiv Detail & Related papers (2023-02-06T23:34:51Z)
- An Instance Selection Algorithm for Big Data in High imbalanced datasets based on LSH [0.0]
Training machine learning models in real-world contexts often involves big data sets and imbalanced samples in which the class of interest is underrepresented.
This work proposes three new methods for instance selection (IS) to be able to deal with large and imbalanced data sets.
Algorithms were developed in the Apache Spark framework, guaranteeing their scalability.
arXiv Detail & Related papers (2022-10-09T17:38:41Z)
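The toy sketch below shows the core idea of LSH-based instance selection on a single machine (the entry above implements related ideas at scale in Apache Spark): hash majority-class points with signed random projections and keep a few representatives per bucket. All names and defaults are illustrative.

```python
# Toy LSH-based undersampling of the majority class (illustration only).
import numpy as np

def lsh_undersample(X_major, n_bits=8, per_bucket=2, seed=0):
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X_major.shape[1], n_bits))
    codes = (X_major @ planes > 0).astype(int)   # (n, n_bits) bit signatures
    keys = codes @ (1 << np.arange(n_bits))      # pack bits into bucket ids
    keep = []
    for key in np.unique(keys):
        bucket = np.flatnonzero(keys == key)
        keep.extend(rng.choice(bucket, size=min(per_bucket, len(bucket)),
                               replace=False))
    return np.sort(np.array(keep))               # indices of retained points
```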
- A Non-Parametric Bootstrap for Spectral Clustering [0.7673339435080445]
We develop two novel algorithms that incorporate the spectral decomposition of the data matrix and a non-parametric bootstrap sampling scheme.
Our techniques are more consistent in their convergence when compared to other bootstrapped algorithms that fit finite mixture models.
arXiv Detail & Related papers (2022-09-13T08:37:05Z)
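In the spirit of the entry above, the sketch below clusters bootstrap resamples with spectral clustering and accumulates how often pairs of points are co-assigned, yielding a consensus matrix. It illustrates the general idea, not the authors' exact algorithms.

```python
# Bootstrap consensus matrix from spectral clustering (illustration only).
import numpy as np
from sklearn.cluster import SpectralClustering

def bootstrap_consensus(X, n_clusters, n_boot=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_boot):
        idx = np.unique(rng.integers(0, n, size=n))  # distinct resampled points
        labels = SpectralClustering(n_clusters=n_clusters,
                                    random_state=0).fit_predict(X[idx])
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    return together / np.maximum(sampled, 1.0)       # consensus in [0, 1]
```

The consensus matrix can then be passed to SpectralClustering with affinity="precomputed" to obtain a final partition.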
- Time Series Clustering with an EM algorithm for Mixtures of Linear Gaussian State Space Models [0.0]
We propose a novel model-based time series clustering method with mixtures of linear Gaussian state space models.
The proposed method uses a new expectation-maximization algorithm for the mixture model to estimate the model parameters.
Experiments on a simulated dataset demonstrate the effectiveness of the method in clustering, parameter estimation, and model selection.
arXiv Detail & Related papers (2022-08-25T07:41:23Z)
- Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
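The PSM described in the entry above is straightforward to compute from MCMC label draws, as sketched below: entry (i, j) is the posterior probability that points i and j share a cluster. Each draw's co-clustering matrix equals ZZ^T for a one-hot assignment matrix Z, so the PSM is an average of positive semi-definite matrices and hence itself positive semi-definite, matching the entry's observation.

```python
# Posterior similarity matrix (PSM) from MCMC cluster-label draws.
import numpy as np

def posterior_similarity(label_draws):
    # label_draws: (n_draws, n_points) integer cluster labels per draw.
    draws = np.asarray(label_draws)
    same = draws[:, :, None] == draws[:, None, :]   # (n_draws, n, n)
    return same.mean(axis=0)                        # PSM, PSD by construction

# Hypothetical usage: summarize the MCMC output with one hard clustering.
# from sklearn.cluster import SpectralClustering
# labels = SpectralClustering(n_clusters=3,
#                             affinity="precomputed").fit_predict(psm)
```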
- Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
- Estimation of sparse Gaussian graphical models with hidden clustering structure [8.258451067861932]
We propose a model to estimate sparse Gaussian graphical models with hidden clustering structure.
We develop a symmetric Gauss-Seidel-based alternating direction method of multipliers.
Numerical experiments on both synthetic data and real data demonstrate the good performance of our model.
arXiv Detail & Related papers (2020-04-17T08:43:31Z)
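As a baseline for the entry above, a plain sparse Gaussian graphical model (without the hidden clustering structure or the sGS-ADMM solver) can be estimated with scikit-learn's graphical lasso; the data and penalty below are stand-ins.

```python
# Baseline sparse Gaussian graphical model via graphical lasso.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # stand-in for real data
model = GraphicalLasso(alpha=0.1).fit(X)    # l1-penalized precision estimate
precision = model.precision_                # sparse inverse covariance
edges = np.abs(precision) > 1e-8            # nonzero entries define the graph
```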
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this is the first subspace clustering work to efficiently cluster millions of data points.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
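For context, the multiplicative-weights primitive that Klivans-Meka-style algorithms build on is the generic Hedge update sketched below; this is the underlying primitive, not the paper's graphical-model learner itself, and eta is an illustrative step size.

```python
# Generic multiplicative-weights (Hedge) update (illustration only).
import numpy as np

def hedge(losses, eta=0.1):
    # losses: (n_rounds, n_experts) observed loss of each expert per round.
    w = np.ones(losses.shape[1])
    for loss in losses:
        w *= np.exp(-eta * loss)   # exponentially downweight lossy experts
    return w / w.sum()             # final distribution over experts
```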