Kernel learning approaches for summarising and combining posterior
similarity matrices
- URL: http://arxiv.org/abs/2009.12852v1
- Date: Sun, 27 Sep 2020 14:16:14 GMT
- Title: Kernel learning approaches for summarising and combining posterior
similarity matrices
- Authors: Alessandra Cabassi, Sylvia Richardson, Paul D. W. Kirk
- Abstract summary: We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When using Markov chain Monte Carlo (MCMC) algorithms to perform inference
for Bayesian clustering models, such as mixture models, the output is typically
a sample of clusterings (partitions) drawn from the posterior distribution. In
practice, a key challenge is how to summarise this output. Here we build upon
the notion of the posterior similarity matrix (PSM) in order to suggest new
approaches for summarising the output of MCMC algorithms for Bayesian
clustering models. A key contribution of our work is the observation that PSMs
are positive semi-definite, and hence can be used to define
probabilistically-motivated kernel matrices that capture the clustering
structure present in the data. This observation enables us to employ a range of
kernel methods to obtain summary clusterings, and otherwise exploit the
information summarised by PSMs. For example, if we have multiple PSMs, each
corresponding to a different dataset on a common set of statistical units, we
may use standard methods for combining kernels in order to perform integrative
clustering. We may moreover embed PSMs within predictive kernel models in order
to perform outcome-guided data integration. We demonstrate the performance of
the proposed methods through a range of simulation studies as well as two real
data applications. R code is available at
https://github.com/acabassi/combine-psms.
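The abstract describes a concrete pipeline: build a PSM from MCMC samples of partitions, treat it as a kernel matrix (it is positive semi-definite), optionally combine several PSMs defined on a common set of units, and extract a summary clustering. The following is a minimal base-R sketch of that pipeline; it is an illustration only, not the authors' combine-psms package, and the function names, the equal-weight kernel combination, and the average-linkage summary step are choices made for the example.

```r
## Minimal illustrative sketch (not the combine-psms package).
## `draws` is assumed to be an S x n matrix: S MCMC samples of cluster
## labels for n statistical units.

# Posterior similarity matrix: psm[i, j] is the proportion of samples in
# which units i and j are allocated to the same cluster.
compute_psm <- function(draws) {
  n <- ncol(draws)
  psm <- matrix(0, n, n)
  for (s in seq_len(nrow(draws))) {
    psm <- psm + outer(draws[s, ], draws[s, ], "==")
  }
  psm / nrow(draws)
}

# Each sample contributes an indicator matrix of the form Z %*% t(Z)
# (Z a binary allocation matrix), which is positive semi-definite; the PSM
# is an average of such matrices, so it is PSD and hence a valid kernel.
is_psd <- function(K, tol = 1e-8) {
  all(eigen(K, symmetric = TRUE, only.values = TRUE)$values > -tol)
}

# Several PSMs on the same units (one per dataset) can be combined as a
# convex combination of kernels; equal weights are used here purely for
# illustration.
combine_psms <- function(psm_list,
                         weights = rep(1 / length(psm_list), length(psm_list))) {
  Reduce(`+`, Map(`*`, psm_list, weights))
}

# One way to obtain a summary clustering: treat 1 - PSM as a dissimilarity
# and cut a hierarchical clustering (kernel k-means on the PSM would be
# another option).
summary_clustering <- function(psm, k) {
  cutree(hclust(as.dist(1 - psm), method = "average"), k = k)
}
```

On a toy input such as `draws <- matrix(sample(1:3, 50 * 20, replace = TRUE), nrow = 50)`, `compute_psm(draws)` returns a 20 x 20 PSM, `is_psd(...)` checks the kernel property numerically, and `summary_clustering(..., k = 3)` returns a single consensus partition.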
Related papers
- Mixture of multilayer stochastic block models for multiview clustering [0.0]
We propose an original method for aggregating multiple clusterings coming from different sources of information.
The identifiability of the model parameters is established and a variational Bayesian EM algorithm is proposed for the estimation of these parameters.
The method is utilized to analyze global food trading networks, leading to structures of interest.
arXiv Detail & Related papers (2024-01-09T17:15:47Z)
- Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for Clustering Count Data [0.8499685241219366]
A class of eight parsimonious mixture models based on the mixtures of factor analyzers model is introduced.
The proposed models are explored in the context of clustering discrete data arising from RNA sequencing studies.
arXiv Detail & Related papers (2023-11-13T21:23:15Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering method based on semi-non-negative tensor factorization (Semi-NTF).
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z)
- Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data [16.153709556346417]
Clustering is a widely deployed learning tool.
iLA-SDP is less sensitive than EM to initialization and more stable on high-dimensional data.
arXiv Detail & Related papers (2022-09-29T21:03:13Z)
- clusterBMA: Bayesian model averaging for clustering [1.2021605201770345]
We introduce clusterBMA, a method that enables weighted model averaging across results from unsupervised clustering algorithms.
We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model.
In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters.
arXiv Detail & Related papers (2022-09-09T04:55:20Z)
- K-ARMA Models for Clustering Time Series Data [4.345882429229813]
We present an approach to clustering time series data using a model-based generalization of the K-Means algorithm.
We show how the clustering algorithm can be made robust to outliers using a least-absolute deviations criterion.
We perform experiments on real data which show that our method is competitive with other existing methods for similar time series clustering tasks.
arXiv Detail & Related papers (2022-06-30T18:16:11Z)
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models.
High-dimensionality and non-linearity are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
- Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
On a set of 16 data tables generated by a quasi-Monte Carlo experiment, one of the aggregation criteria, using L1 dissimilarity, is compared with hierarchical clustering and a version of k-means, partitioning around medoids (PAM); a minimal sketch of this comparison follows the list.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
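To make the comparison in the last entry concrete, here is a minimal sketch under assumed toy data (simulated binary rows, not the paper's 16 quasi-Monte Carlo tables): hierarchical clustering versus PAM under L1 (Manhattan) dissimilarity, using base R and the cluster package.

```r
## Toy illustration only; the data are simulated, not the paper's tables.
library(cluster)  # provides pam()

set.seed(1)
x <- matrix(rbinom(100 * 10, size = 1, prob = 0.5), nrow = 100)  # binary data table

d <- dist(x, method = "manhattan")  # L1 dissimilarity between rows

hc_part  <- cutree(hclust(d, method = "average"), k = 2)  # hierarchical clustering
pam_part <- pam(d, k = 2)$clustering                      # partitioning around medoids (PAM)

table(hc_part, pam_part)  # cross-tabulate the two partitions
```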
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.