Distributed MCMC inference for Bayesian Non-Parametric Latent Block Model
- URL: http://arxiv.org/abs/2402.01050v1
- Date: Thu, 1 Feb 2024 22:43:55 GMT
- Title: Distributed MCMC inference for Bayesian Non-Parametric Latent Block Model
- Authors: Reda Khoufache, Anisse Belhadj, Hanene Azzag, Mustapha Lebbah
- Abstract summary: We introduce a novel Distributed Markov Chain Monte Carlo (MCMC) inference method for the Bayesian Non-Parametric Latent Block Model (DisNPLBM).
Our non-parametric co-clustering algorithm divides observations and features into partitions using latent multivariate Gaussian block distributions.
Experimental results demonstrate the impact of DisNPLBM on cluster labeling accuracy and execution times.
- Score: 0.24578723416255754
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we introduce a novel Distributed Markov Chain Monte Carlo
(MCMC) inference method for the Bayesian Non-Parametric Latent Block Model
(DisNPLBM), employing the Master/Worker architecture. Our non-parametric
co-clustering algorithm divides observations and features into partitions using
latent multivariate Gaussian block distributions. The workload on rows is
evenly distributed among workers, who exclusively communicate with the master
and not among themselves. Experimental results demonstrate the impact of DisNPLBM
on cluster labeling accuracy and execution times. Moreover, we present a real use
case applying our approach to co-cluster gene expression data. The source code is
publicly available at https://github.com/redakhoufache/Distributed-NPLBM.
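To make the Master/Worker pattern described above concrete, here is a minimal, deliberately simplified sketch (illustrative Python with hypothetical names, not the released implementation at the link above): workers hold disjoint row shards, assign their rows locally, and send only per-block sufficient statistics to the master, which updates the block parameters. Cluster counts are fixed and column assignments are not resampled here, whereas DisNPLBM is non-parametric and samples both.

```python
import numpy as np

# Hypothetical, simplified sketch of the Master/Worker communication pattern:
# workers hold disjoint row blocks, assign row clusters locally, and send only
# sufficient statistics (per-block sums and counts) to the master.
# DisNPLBM itself is non-parametric (DP priors) and samples from posteriors;
# here we use fixed K, L, hard assignments, and frozen column labels purely
# for illustration.

K, L = 3, 2          # number of row and column clusters (fixed here, not in DisNPLBM)
rng = np.random.default_rng(0)


def worker_step(X_rows, col_labels, block_means):
    """Assign local rows to row clusters and return sufficient statistics."""
    n, _ = X_rows.shape
    row_labels = np.empty(n, dtype=int)
    sums = np.zeros((K, L))
    counts = np.zeros((K, L))
    # Per-column-cluster sums of each row, so rows are compared to K x L block means.
    col_sums = np.vstack([X_rows[:, col_labels == l].sum(axis=1) for l in range(L)]).T
    col_counts = np.array([(col_labels == l).sum() for l in range(L)])
    row_profiles = col_sums / np.maximum(col_counts, 1)
    for i in range(n):
        # Squared-error "likelihood"; DisNPLBM would use Gaussian block posteriors.
        cost = ((row_profiles[i] - block_means) ** 2).sum(axis=1)
        row_labels[i] = int(np.argmin(cost))
        sums[row_labels[i]] += col_sums[i]
        counts[row_labels[i]] += col_counts
    return row_labels, sums, counts


def master_step(all_stats):
    """Aggregate worker statistics and update the K x L block means."""
    total_sums = sum(s for _, s, _ in all_stats)
    total_counts = sum(c for _, _, c in all_stats)
    return total_sums / np.maximum(total_counts, 1)


# Toy data split across 4 workers.
X = rng.normal(size=(200, 10))
shards = np.array_split(X, 4)
col_labels = rng.integers(0, L, size=X.shape[1])
block_means = rng.normal(size=(K, L))

for _ in range(10):                      # a few synchronous sweeps
    stats = [worker_step(shard, col_labels, block_means) for shard in shards]
    block_means = master_step(stats)     # only statistics cross the wire

print("estimated block means:\n", block_means)
```

The point of the sketch is the communication pattern: raw rows never leave a worker; only K x L sums and counts are sent to the master.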
Related papers
- Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex model for semi-supervised/library-based unmixing.
We demonstrate the efficacy of alternating optimization methods for sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z)
- Distributed Collapsed Gibbs Sampler for Dirichlet Process Mixture Models in Federated Learning [0.22499166814992444]
This paper proposes a new distributed Markov Chain Monte Carlo (MCMC) inference method for DPMMs (DisCGS) using sufficient statistics.
Our approach uses the collapsed Gibbs sampler and is specifically designed to work on distributed data across independent and heterogeneous machines.
For instance, with a dataset of 100K data points, the centralized algorithm requires approximately 12 hours to complete 100 iterations while our approach achieves the same number of iterations in just 3 minutes.
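The key ingredient mentioned above, exchanging sufficient statistics instead of raw data, can be illustrated with a small hedged example (toy Python, not the DisCGS code): each worker summarises its points for a cluster by (count, sum, sum of squares), and the master recovers the pooled mean and variance from these summaries alone.

```python
import numpy as np

# Each worker summarises the points it currently assigns to a cluster by
# (count, sum, sum of squares); the master can then recover the pooled
# mean/variance without ever seeing the raw, distributed data points.

def local_stats(x):
    return len(x), x.sum(), (x ** 2).sum()

def merge(stats):
    n = sum(s[0] for s in stats)
    sx = sum(s[1] for s in stats)
    sxx = sum(s[2] for s in stats)
    mean = sx / n
    var = sxx / n - mean ** 2
    return n, mean, var

rng = np.random.default_rng(1)
shards = [rng.normal(2.0, 1.0, size=50) for _ in range(4)]   # data on 4 workers
print(merge([local_stats(s) for s in shards]))                # pooled n, mean, var
```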
arXiv Detail & Related papers (2023-12-18T13:16:18Z)
- Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for Clustering Count Data [0.8499685241219366]
A class of eight parsimonious mixture models based on the mixtures of factor analyzers model is introduced.
The proposed models are explored in the context of clustering discrete data arising from RNA sequencing studies.
arXiv Detail & Related papers (2023-11-13T21:23:15Z)
- Model-based clustering using non-parametric Hidden Markov Models [5.314335654467143]
We study the Bayes risk of clustering when using HMMs and propose associated clustering procedures.
Results are shown to remain valid in the online setting where observations are clustered sequentially.
arXiv Detail & Related papers (2023-09-21T16:31:04Z)
- Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision [75.1860418333995]
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently.
The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources abstracted as labeling functions (LFs).
Existing statistical label models typically rely only on the outputs of the LFs, ignoring the instance features when modeling the underlying generative process.
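As a point of reference for what a label model does, here is a deliberately naive toy aggregator (unweighted majority vote over LF outputs, ignoring instance features, which is precisely the limitation the paper addresses; the names are illustrative, not the paper's code):

```python
import numpy as np

# Toy label model: LF outputs are in {-1, +1, 0}, where 0 means "abstain".
# The true label is estimated by a simple (unweighted) majority vote per instance.
# Statistical label models replace this with a learned generative model, and
# the paper above additionally conditions on instance features.

def majority_vote(lf_outputs):
    """lf_outputs: (n_instances, n_lfs) array of votes in {-1, 0, +1}."""
    scores = lf_outputs.sum(axis=1)
    return np.where(scores >= 0, 1, -1)   # ties break towards +1

votes = np.array([[1, 1, 0],
                  [-1, 0, -1],
                  [1, -1, -1]])
print(majority_vote(votes))   # -> [ 1 -1 -1]
```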
arXiv Detail & Related papers (2022-10-06T07:28:53Z)
- Wrapped Distributions on homogeneous Riemannian manifolds [58.720142291102135]
Control over distributions' properties, such as parameters, symmetry and modality, yields a family of flexible distributions.
We empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model.
arXiv Detail & Related papers (2022-04-20T21:25:21Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- DG-LMC: A Turn-key and Scalable Synchronous Distributed MCMC Algorithm [21.128416842467132]
We derive a user-friendly centralised distributed MCMC algorithm with provable scaling in high-dimensional settings.
We illustrate the relevance of the proposed methodology on both synthetic and real data experiments.
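To make the "centralised distributed MCMC" setting concrete, the sketch below shows a generic synchronous master/worker Langevin update on a toy Gaussian model; it illustrates the communication pattern only and is not the DG-LMC algorithm itself.

```python
import numpy as np

# Generic synchronous master/worker Langevin step (an illustration of the
# centralised distributed MCMC setting, not the DG-LMC algorithm itself):
# each worker returns the gradient of its shard's log-likelihood at theta,
# the master adds the log-prior gradient and performs one ULA update.

rng = np.random.default_rng(2)
shards = [rng.normal(3.0, 1.0, size=250) for _ in range(4)]   # data on 4 workers

def local_grad(theta, x, sigma2=1.0):
    # gradient of sum_i log N(x_i | theta, sigma2) with respect to theta
    return (x - theta).sum() / sigma2

def langevin_step(theta, step=1e-4, prior_var=100.0):
    grad = sum(local_grad(theta, x) for x in shards) - theta / prior_var
    return theta + 0.5 * step * grad + np.sqrt(step) * rng.normal()

theta, samples = 0.0, []
for _ in range(2000):
    theta = langevin_step(theta)
    samples.append(theta)
print("posterior mean estimate:", np.mean(samples[500:]))   # close to 3.0
```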
arXiv Detail & Related papers (2021-06-11T10:37:14Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
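Computing a PSM from stored MCMC cluster assignments is simple; a short illustrative sketch (not the authors' code) is given below, together with a numerical check that the resulting matrix is positive semi-definite.

```python
import numpy as np

# Build a posterior similarity matrix (PSM) from stored MCMC cluster labels:
# PSM[i, j] = fraction of posterior samples in which items i and j share a
# cluster. As an average of 0/1 co-clustering matrices (each PSD), the PSM is
# positive semi-definite and can be used directly as a kernel matrix.

def posterior_similarity(labels):
    """labels: (n_samples, n_items) array of cluster assignments per MCMC draw."""
    labels = np.asarray(labels)
    return (labels[:, :, None] == labels[:, None, :]).mean(axis=0)

draws = np.array([[0, 0, 1, 1],
                  [0, 0, 0, 1],
                  [1, 1, 0, 0]])
psm = posterior_similarity(draws)
print(psm)
print("min eigenvalue:", np.linalg.eigvalsh(psm).min())   # >= 0 up to round-off
```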
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
- Generative Semantic Hashing Enhanced via Boltzmann Machines [61.688380278649056]
Existing generative-hashing methods mostly assume a factorized form for the posterior distribution.
We propose to employ the distribution of a Boltzmann machine as the variational posterior.
We show that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.
arXiv Detail & Related papers (2020-06-16T01:23:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.