Bayesian Low-Rank Interpolative Decomposition for Complex Datasets
- URL: http://arxiv.org/abs/2205.14825v1
- Date: Mon, 30 May 2022 03:06:48 GMT
- Title: Bayesian Low-Rank Interpolative Decomposition for Complex Datasets
- Authors: Jun Lu
- Abstract summary: We introduce a probabilistic model for learning interpolative decomposition (ID), which is commonly used for feature selection, low-rank approximation, and identifying hidden patterns in data.
We evaluate the model on a variety of real-world datasets, including the CCLE EC50, CCLE IC50, CTRP EC50, and MovieLens 100K datasets, which vary in size and dimension.
- Score: 4.913248451323163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a probabilistic model for learning interpolative
decomposition (ID), which is commonly used for feature selection, low-rank
approximation, and identifying hidden patterns in data, where the matrix
factors are latent variables associated with each data dimension. Prior
densities with support on the specified subspace are used to address the
magnitude constraint on the factored component of the observed matrix. A
Bayesian inference procedure based on Gibbs sampling is employed. We evaluate
the model on a variety of real-world datasets, including CCLE EC50, CCLE IC50,
CTRP EC50, and MovieLens 100K, which vary in size and dimension, and show that
the proposed Bayesian ID models, GBT and GBTN, yield smaller reconstruction
errors than existing randomized approaches.
Related papers
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Feature Selection via the Intervened Interpolative Decomposition and its Application in Diversifying Quantitative Strategies [4.913248451323163]
We propose a probabilistic model for computing an interpolative decomposition (ID) in which each column of the observed matrix has its own priority or importance.
We evaluate the proposed models on real-world datasets, including ten Chinese A-share stocks.
arXiv Detail & Related papers (2022-09-29T03:36:56Z)
- Robust Bayesian Nonnegative Matrix Factorization with Implicit Regularizers [4.913248451323163]
We introduce a probabilistic model with implicit norm regularization for learning nonnegative matrix factorization (NMF).
We evaluate the model on several real-world datasets including Genomics of Drug Sensitivity in Cancer.
arXiv Detail & Related papers (2022-08-22T04:34:17Z)
- Comparative Study of Inference Methods for Interpolative Decomposition [4.913248451323163]
We propose a probabilistic model with automatic relevance determination (ARD) for learning interpolative decomposition (ID).
We evaluate the model on a variety of real-world datasets, including the CCLE EC50, CCLE IC50, Gene Body Methylation, and Promoter Methylation datasets, which vary in size and dimension.
arXiv Detail & Related papers (2022-06-29T11:37:05Z)
- Flexible and Hierarchical Prior for Bayesian Nonnegative Matrix Factorization [4.913248451323163]
We introduce a probabilistic model for learning nonnegative matrix factorization (NMF).
We evaluate the model on several real-world datasets including MovieLens 100K and MovieLens 1M with different sizes and dimensions.
arXiv Detail & Related papers (2022-05-23T03:51:55Z)
- Hierarchical Infinite Relational Model [3.731168012111833]
The hierarchical infinite relational model (HIRM) is a new probabilistic generative model for noisy, sparse, and heterogeneous relational data.
We present new algorithms for fully Bayesian posterior inference via Gibbs sampling.
arXiv Detail & Related papers (2021-08-16T16:32:13Z)
- Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z)
- Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.