Clustering based on Mixtures of Sparse Gaussian Processes
- URL: http://arxiv.org/abs/2303.13665v1
- Date: Thu, 23 Mar 2023 20:44:36 GMT
- Title: Clustering based on Mixtures of Sparse Gaussian Processes
- Authors: Zahra Moslehi, Abdolreza Mirzaei, Mehran Safayani
- Abstract summary: Clustering data in a low-dimensional embedded space remains a challenging problem in machine learning.
In this article, we propose a joint formulation for both clustering and dimensionality reduction.
Our algorithm is based on a mixture of sparse Gaussian processes and is called Sparse Gaussian Process Mixture Clustering (SGP-MIC).
- Score: 6.939768185086753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating low-dimensional representations of a high-dimensional data set is an
important component in many machine learning applications, yet clustering data
in such a low-dimensional embedded space remains a challenging problem. In this
article, we propose a joint formulation for both clustering and dimensionality
reduction. When a probabilistic model is desired, one possible solution is to
use mixture models in which both the cluster indicators and the low-dimensional
space are learned. Our algorithm is based on a mixture of sparse Gaussian
processes, which is called Sparse Gaussian Process Mixture Clustering (SGP-MIC).
The main advantages of our approach over existing methods are that its
probabilistic nature gives it an edge over deterministic alternatives, it is
straightforward to construct non-linear generalizations of the model, and a
sparse model combined with an efficient variational EM approximation speeds up
the algorithm.
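As a rough illustration of the model class, the sketch below runs EM for a plain mixture of Gaussian processes on curve data. It is not SGP-MIC itself: it omits the learned low-dimensional latent space and the sparse inducing-point approximation, and the kernel hyperparameters and function names are illustrative assumptions.

```python
# Minimal EM sketch for a mixture of Gaussian processes over curves.
# NOT the authors' SGP-MIC: no latent space, no sparse approximation.
import numpy as np
from scipy.stats import multivariate_normal

def rbf_kernel(t, lengthscale, variance=1.0):
    # Squared-exponential kernel on shared 1-D inputs t.
    d2 = (t[:, None] - t[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_mixture_em(Y, t, lengthscales, noise=0.1, n_iter=50):
    # Y: (n_curves, n_points) curves observed at inputs t.
    # Cluster k is a zero-mean GP with lengthscale lengthscales[k].
    K, n = len(lengthscales), len(t)
    pi = np.full(K, 1.0 / K)                      # mixing weights
    for _ in range(n_iter):
        # E-step: log r_ik = log pi_k + log N(y_i | 0, K_k + noise * I).
        log_r = np.empty((len(Y), K))
        for k in range(K):
            cov = rbf_kernel(t, lengthscales[k]) + noise * np.eye(n)
            log_r[:, k] = np.log(pi[k]) + multivariate_normal.logpdf(
                Y, np.zeros(n), cov)
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step (partial): update mixing weights; a full treatment would
        # also re-optimize each cluster's GP hyperparameters.
        pi = resp.mean(axis=0)
    return resp.argmax(axis=1)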
Related papers
- Adaptive Fuzzy C-Means with Graph Embedding [84.47075244116782]
Fuzzy clustering algorithms can be roughly categorized into two main groups: Fuzzy C-Means (FCM) based methods and mixture model based methods.
We propose a novel FCM-based clustering model that is capable of automatically learning an appropriate membership degree hyperparameter value.
arXiv Detail & Related papers (2024-05-22T08:15:50Z)
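For reference, the sketch below is standard Fuzzy C-Means with a fixed fuzzifier m; the entry above learns the membership-degree hyperparameter automatically instead. The function name and defaults are illustrative.

```python
# Standard Fuzzy C-Means with a fixed fuzzifier m (illustration only).
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(n_clusters), size=len(X))  # soft memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]     # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return U, centers
```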
- A distribution-free mixed-integer optimization approach to hierarchical modelling of clustered and longitudinal data [0.0]
We introduce an innovative algorithm that evaluates cluster effects for new data points, thereby increasing the robustness and precision of this model.
The inferential and predictive efficacy of this approach is further illustrated through its application in student scoring and protein expression.
arXiv Detail & Related papers (2023-02-06T23:34:51Z)
- An Instance Selection Algorithm for Big Data in High imbalanced datasets based on LSH [0.0]
Training machine learning models in real-world contexts often involves big data sets and imbalanced samples in which the class of interest is underrepresented.
This work proposes three new methods for instance selection (IS) to be able to deal with large and imbalanced data sets.
Algorithms were developed in the Apache Spark framework, guaranteeing their scalability.
arXiv Detail & Related papers (2022-10-09T17:38:41Z)
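The toy sketch below shows the core idea of LSH-based instance selection on a single machine (the entry above implements related ideas at scale in Apache Spark): hash majority-class points with signed random projections and keep a few representatives per bucket. All names and defaults are illustrative.

```python
# Toy LSH-based undersampling of the majority class (illustration only).
import numpy as np

def lsh_undersample(X_major, n_bits=8, per_bucket=2, seed=0):
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X_major.shape[1], n_bits))
    codes = (X_major @ planes > 0).astype(int)   # (n, n_bits) bit signatures
    keys = codes @ (1 << np.arange(n_bits))      # pack bits into bucket ids
    keep = []
    for key in np.unique(keys):
        bucket = np.flatnonzero(keys == key)
        keep.extend(rng.choice(bucket, size=min(per_bucket, len(bucket)),
                               replace=False))
    return np.sort(np.array(keep))               # indices of retained points
```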
- A Non-Parametric Bootstrap for Spectral Clustering [0.7673339435080445]
We develop two novel algorithms that incorporate the spectral decomposition of the data matrix and a non-parametric bootstrap sampling scheme.
Our techniques are more consistent in their convergence when compared to other bootstrapped algorithms that fit finite mixture models.
arXiv Detail & Related papers (2022-09-13T08:37:05Z)
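In the spirit of the entry above, the sketch below clusters bootstrap resamples with spectral clustering and accumulates how often pairs of points are co-assigned, yielding a consensus matrix. It illustrates the general idea, not the authors' exact algorithms.

```python
# Bootstrap consensus matrix from spectral clustering (illustration only).
import numpy as np
from sklearn.cluster import SpectralClustering

def bootstrap_consensus(X, n_clusters, n_boot=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_boot):
        idx = np.unique(rng.integers(0, n, size=n))  # distinct resampled points
        labels = SpectralClustering(n_clusters=n_clusters,
                                    random_state=0).fit_predict(X[idx])
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    return together / np.maximum(sampled, 1.0)       # consensus in [0, 1]
```

The consensus matrix can then be passed to SpectralClustering with affinity="precomputed" to obtain a final partition.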
- Time Series Clustering with an EM algorithm for Mixtures of Linear Gaussian State Space Models [0.0]
We propose a novel model-based time series clustering method with mixtures of linear Gaussian state space models.
The proposed method uses a new expectation-maximization algorithm for the mixture model to estimate the model parameters.
Experiments on a simulated dataset demonstrate the effectiveness of the method in clustering, parameter estimation, and model selection.
arXiv Detail & Related papers (2022-08-25T07:41:23Z)
- Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
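The PSM described in the entry above is straightforward to compute from MCMC label draws, as sketched below: entry (i, j) is the posterior probability that points i and j share a cluster. Each draw's co-clustering matrix equals ZZ^T for a one-hot assignment matrix Z, so the PSM is an average of positive semi-definite matrices and hence itself positive semi-definite, matching the entry's observation.

```python
# Posterior similarity matrix (PSM) from MCMC cluster-label draws.
import numpy as np

def posterior_similarity(label_draws):
    # label_draws: (n_draws, n_points) integer cluster labels per draw.
    draws = np.asarray(label_draws)
    same = draws[:, :, None] == draws[:, None, :]   # (n_draws, n, n)
    return same.mean(axis=0)                        # PSM, PSD by construction

# Hypothetical usage: summarize the MCMC output with one hard clustering.
# from sklearn.cluster import SpectralClustering
# labels = SpectralClustering(n_clusters=3,
#                             affinity="precomputed").fit_predict(psm)
```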
- Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
- Estimation of sparse Gaussian graphical models with hidden clustering structure [8.258451067861932]
We propose a model to estimate sparse Gaussian graphical models with hidden clustering structure.
We develop a symmetric Gauss-Seidel-based alternating direction method of multipliers.
Numerical experiments on both synthetic data and real data demonstrate the good performance of our model.
arXiv Detail & Related papers (2020-04-17T08:43:31Z)
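As a baseline for the entry above, a plain sparse Gaussian graphical model (without the hidden clustering structure or the sGS-ADMM solver) can be estimated with scikit-learn's graphical lasso; the data and penalty below are stand-ins.

```python
# Baseline sparse Gaussian graphical model via graphical lasso.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # stand-in for real data
model = GraphicalLasso(alpha=0.1).fit(X)    # l1-penalized precision estimate
precision = model.precision_                # sparse inverse covariance
edges = np.abs(precision) > 1e-8            # nonzero entries define the graph
```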
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this is the first subspace clustering work to efficiently cluster millions of data points.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
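For context, the multiplicative-weights primitive that Klivans-Meka-style algorithms build on is the generic Hedge update sketched below; this is the underlying primitive, not the paper's graphical-model learner itself, and eta is an illustrative step size.

```python
# Generic multiplicative-weights (Hedge) update (illustration only).
import numpy as np

def hedge(losses, eta=0.1):
    # losses: (n_rounds, n_experts) observed loss of each expert per round.
    w = np.ones(losses.shape[1])
    for loss in losses:
        w *= np.exp(-eta * loss)   # exponentially downweight lossy experts
    return w / w.sum()             # final distribution over experts
```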