Generic Unsupervised Optimization for a Latent Variable Model With
Exponential Family Observables
- URL: http://arxiv.org/abs/2003.02214v3
- Date: Fri, 15 Dec 2023 12:01:20 GMT
- Title: Generic Unsupervised Optimization for a Latent Variable Model With
Exponential Family Observables
- Authors: Hamid Mousavi, Jakob Drefs, Florian Hirschberger, Jörg Lücke
- Abstract summary: Latent variable models (LVMs) represent observed variables by parameterized functions of latent variables.
For unsupervised learning, LVMs which assume specific non-Gaussian observables have been considered.
We show that a set of very concise parameter update equations can be derived which feature the same functional form for all exponential family distributions.
- Score: 2.321323878201932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent variable models (LVMs) represent observed variables by parameterized
functions of latent variables. Prominent examples of LVMs for unsupervised
learning are probabilistic PCA and probabilistic sparse coding (SC), both of which
assume a weighted linear summation of the latents to determine the mean of a Gaussian
distribution for the observables. In many cases, however, observables do not
follow a Gaussian distribution. For unsupervised learning, LVMs which assume
specific non-Gaussian observables have therefore been considered. Already for
specific choices of distributions, parameter optimization is challenging and
only a few previous contributions considered LVMs with more generally defined
observable distributions. Here, we consider LVMs that are defined for a range
of different distributions, i.e., observables can follow any (regular)
distribution of the exponential family. The novel class of LVMs presented is
defined for binary latents, and it uses maximization in place of summation to
link the latents to observables. To derive an optimization procedure, we follow
an EM approach for maximum likelihood parameter estimation. We show that a set
of very concise parameter update equations can be derived which feature the
same functional form for all exponential family distributions. The derived
generic optimization can consequently be applied to different types of metric
data as well as to different types of discrete data. Also, the derived
optimization equations can be combined with a recently suggested variational
acceleration which is likewise generically applicable to the LVMs considered
here. The combination thus maintains the generic and direct applicability of the
derived optimization procedure while, crucially, enabling efficient scalability.
We numerically verify our analytical results and discuss some potential
applications such as learning of variance structure, noise type estimation and
denoising.
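To make the generative structure described above concrete, the following minimal sketch (plain NumPy, not the authors' code) draws data from a model with binary Bernoulli latents whose weights are combined by a maximization rather than a weighted sum, and uses the resulting value as the mean of either a Poisson or a Gaussian observable distribution. The variable names (pi, W) and the specific noise choices are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a max-superposition LVM with exponential family observables.
# Assumption: binary latents s ~ Bernoulli(pi), observable mean given by
# max_h(s_h * W_hd); Poisson/Gaussian are two example exponential family choices.
import numpy as np

rng = np.random.default_rng(0)

H, D, N = 5, 8, 1000              # number of latents, observables, data points
pi = np.full(H, 0.2)              # Bernoulli priors of the binary latents
W = rng.gamma(2.0, 1.0, (H, D))   # non-negative weights linking latents to observables

def sample(noise="poisson"):
    """Draw N data points from the sketched generative model."""
    S = rng.random((N, H)) < pi                       # binary latent states in {0,1}^H
    Z = (S[:, :, None] * W[None, :, :]).max(axis=1)   # max superposition, shape (N, D)
    Z = np.maximum(Z, 1e-2)                           # keep the mean in the valid range
    if noise == "poisson":
        return S, rng.poisson(Z)                      # discrete observable (count data)
    return S, rng.normal(Z, 1.0)                      # metric observable, same link

S, Y = sample("poisson")
print(Y.shape, Y.mean())
```

The sketch only illustrates the maximization link that is shared across observable distributions; deriving the generic EM parameter updates for this class of models is the contribution of the paper itself.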
Related papers
- Learning Survival Distributions with the Asymmetric Laplace Distribution [16.401141867387324]
We propose a parametric survival analysis method based on the Asymmetric Laplace Distribution (ALD). This distribution allows for closed-form calculation of popular event summaries such as mean, median, mode, variation, and quantiles. We show that the proposed method outperforms parametric and nonparametric approaches in terms of accuracy, discrimination and calibration.
arXiv Detail & Related papers (2025-05-06T17:34:41Z) - Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two.
Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z) - Disentanglement Analysis in Deep Latent Variable Models Matching Aggregate Posterior Distributions [0.5759862457142761]
We propose a method to evaluate disentanglement for deep latent variable models (DLVMs) in general.
The proposed technique discovers the latent vectors representing the generative factors of a dataset that can be different from the cardinal latent axes.
arXiv Detail & Related papers (2025-01-26T23:38:39Z) - EigenVI: score-based variational inference with orthogonal function expansions [23.696028065251497]
EigenVI is an eigenvalue-based approach for black-box variational inference (BBVI)
We use EigenVI to approximate a variety of target distributions, including a benchmark suite of Bayesian models from posteriordb.
arXiv Detail & Related papers (2024-10-31T15:48:34Z) - Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression. We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend and provide a unifying perspective on earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Convex Parameter Estimation of Perturbed Multivariate Generalized
Gaussian Distributions [18.95928707619676]
We propose a convex formulation with well-established properties for MGGD parameters.
The proposed framework is flexible as it combines a variety of regularizations for the precision matrix, the mean and perturbations.
Experiments show a more accurate precision and covariance matrix estimation with similar performance for the mean vector parameter.
arXiv Detail & Related papers (2023-12-12T18:08:04Z) - Winning Prize Comes from Losing Tickets: Improve Invariant Learning by
Exploring Variant Parameters for Out-of-Distribution Generalization [76.27711056914168]
Out-of-Distribution (OOD) Generalization aims to learn robust models that generalize well to various environments without fitting to distribution-specific features.
Recent studies based on Lottery Ticket Hypothesis (LTH) address this problem by minimizing the learning target to find some of the parameters that are critical to the task.
We propose Exploring Variant parameters for Invariant Learning (EVIL) which also leverages the distribution knowledge to find the parameters that are sensitive to distribution shift.
arXiv Detail & Related papers (2023-10-25T06:10:57Z) - Bayesian Non-linear Latent Variable Modeling via Random Fourier Features [7.856578780790166]
We present a method to perform Markov chain Monte Carlo inference for generalized nonlinear latent variable modeling.
Inference for Gaussian process latent variable models (GPLVMs) is computationally tractable only when the data likelihood is Gaussian.
We show that we can generalize GPLVMs to non-Gaussian observations, such as Poisson, negative binomial, and multinomial distributions.
arXiv Detail & Related papers (2023-06-14T08:42:10Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Generalised Gaussian Process Latent Variable Models (GPLVM) with
Stochastic Variational Inference [9.468270453795409]
We study the doubly stochastic formulation of the Bayesian GPLVM model, which is amenable to minibatch training.
We show how this framework is compatible with different latent variable formulations and perform experiments to compare a suite of models.
We demonstrate how we can train in the presence of massively missing data and obtain high-fidelity reconstructions.
arXiv Detail & Related papers (2022-02-25T21:21:51Z) - Learning Invariant Representations using Inverse Contrastive Loss [34.93395633215398]
We introduce a class of losses for learning representations that are invariant to some extraneous variable of interest.
We show that if the extraneous variable is binary, then optimizing ICL is equivalent to optimizing a regularized MMD divergence.
arXiv Detail & Related papers (2021-02-16T18:29:28Z) - Generalized Matrix Factorization: efficient algorithms for fitting
generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z) - Stochastic Normalizing Flows [52.92110730286403]
We introduce stochastic normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs).
Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs.
These SDEs can be used for constructing efficient Markov chains to sample from the underlying distribution of a given dataset.
arXiv Detail & Related papers (2020-02-21T20:47:55Z)