Generic Unsupervised Optimization for a Latent Variable Model With
Exponential Family Observables
- URL: http://arxiv.org/abs/2003.02214v3
- Date: Fri, 15 Dec 2023 12:01:20 GMT
- Title: Generic Unsupervised Optimization for a Latent Variable Model With
Exponential Family Observables
- Authors: Hamid Mousavi, Jakob Drefs, Florian Hirschberger, Jörg Lücke
- Abstract summary: Latent variable models (LVMs) represent observed variables by parameterized functions of latent variables.
For unsupervised learning, LVMs which assume specific non-Gaussian observables have been considered.
We show that a set of very concise parameter update equations can be derived which feature the same functional form for all exponential family distributions.
- Score: 2.321323878201932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent variable models (LVMs) represent observed variables by parameterized
functions of latent variables. Prominent examples of LVMs for unsupervised
learning are probabilistic PCA and probabilistic sparse coding (SC), both of which
assume a weighted linear summation of the latents to determine the mean of a Gaussian
distribution for the observables. In many cases, however, observables do not
follow a Gaussian distribution. For unsupervised learning, LVMs which assume
specific non-Gaussian observables have therefore been considered. Already for
specific choices of distributions, parameter optimization is challenging and
only a few previous contributions considered LVMs with more generally defined
observable distributions. Here, we consider LVMs that are defined for a range
of different distributions, i.e., observables can follow any (regular)
distribution of the exponential family. The novel class of LVMs presented is
defined for binary latents, and it uses maximization in place of summation to
link the latents to observables. To derive an optimization procedure, we follow
an EM approach for maximum likelihood parameter estimation. We show that a set
of very concise parameter update equations can be derived which feature the
same functional form for all exponential family distributions. The derived
generic optimization can consequently be applied to different types of metric
data as well as to different types of discrete data. Also, the derived
optimization equations can be combined with a recently suggested variational
acceleration which is likewise generically applicable to the LVMs considered
here. The combination thus maintains the generic and direct applicability of the
derived optimization procedure while, crucially, enabling efficient scalability.
We numerically verify our analytical results and discuss some potential
applications such as learning of variance structure, noise type estimation and
denoising.
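To make the generative structure described above concrete, the following minimal sketch (plain NumPy, not the authors' code) draws data from a model with binary Bernoulli latents whose weights are combined by a maximization rather than a weighted sum, and uses the resulting value as the mean of either a Poisson or a Gaussian observable distribution. The variable names (pi, W) and the specific noise choices are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a max-superposition LVM with exponential family observables.
# Assumption: binary latents s ~ Bernoulli(pi), observable mean given by
# max_h(s_h * W_hd); Poisson/Gaussian are two example exponential family choices.
import numpy as np

rng = np.random.default_rng(0)

H, D, N = 5, 8, 1000              # number of latents, observables, data points
pi = np.full(H, 0.2)              # Bernoulli priors of the binary latents
W = rng.gamma(2.0, 1.0, (H, D))   # non-negative weights linking latents to observables

def sample(noise="poisson"):
    """Draw N data points from the sketched generative model."""
    S = rng.random((N, H)) < pi                       # binary latent states in {0,1}^H
    Z = (S[:, :, None] * W[None, :, :]).max(axis=1)   # max superposition, shape (N, D)
    Z = np.maximum(Z, 1e-2)                           # keep the mean in the valid range
    if noise == "poisson":
        return S, rng.poisson(Z)                      # discrete observable (count data)
    return S, rng.normal(Z, 1.0)                      # metric observable, same link

S, Y = sample("poisson")
print(Y.shape, Y.mean())
```

The sketch only illustrates the maximization link that is shared across observable distributions; deriving the generic EM parameter updates for this class of models is the contribution of the paper itself.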
Related papers
- Learning Survival Distributions with the Asymmetric Laplace Distribution [16.401141867387324]
We propose a parametric survival analysis method based on the Asymmetric Laplace Distribution (ALD). This distribution allows for closed-form calculation of popular event summaries such as mean, median, mode, variation, and quantiles. We show that the proposed method outperforms parametric and nonparametric approaches in terms of accuracy, discrimination and calibration.
arXiv Detail & Related papers (2025-05-06T17:34:41Z) - Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two.
Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z) - Disentanglement Analysis in Deep Latent Variable Models Matching Aggregate Posterior Distributions [0.5759862457142761]
We propose a method to evaluate disentanglement for deep latent variable models (DLVMs) in general.
The proposed technique discovers the latent vectors representing the generative factors of a dataset that can be different from the cardinal latent axes.
arXiv Detail & Related papers (2025-01-26T23:38:39Z) - EigenVI: score-based variational inference with orthogonal function expansions [23.696028065251497]
EigenVI is an eigenvalue-based approach for black-box variational inference (BBVI)
We use EigenVI to approximate a variety of target distributions, including a benchmark suite of Bayesian models from posteriordb.
arXiv Detail & Related papers (2024-10-31T15:48:34Z) - Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression. We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend and provide a unifying perspective on earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Convex Parameter Estimation of Perturbed Multivariate Generalized
Gaussian Distributions [18.95928707619676]
We propose a convex formulation with well-established properties for MGGD parameters.
The proposed framework is flexible as it combines a variety of regularizations for the precision matrix, the mean and perturbations.
Experiments show a more accurate precision and covariance matrix estimation with similar performance for the mean vector parameter.
arXiv Detail & Related papers (2023-12-12T18:08:04Z) - Winning Prize Comes from Losing Tickets: Improve Invariant Learning by
Exploring Variant Parameters for Out-of-Distribution Generalization [76.27711056914168]
Out-of-Distribution (OOD) Generalization aims to learn robust models that generalize well to various environments without fitting to distribution-specific features.
Recent studies based on Lottery Ticket Hypothesis (LTH) address this problem by minimizing the learning target to find some of the parameters that are critical to the task.
We propose Exploring Variant parameters for Invariant Learning (EVIL) which also leverages the distribution knowledge to find the parameters that are sensitive to distribution shift.
arXiv Detail & Related papers (2023-10-25T06:10:57Z) - Bayesian Non-linear Latent Variable Modeling via Random Fourier Features [7.856578780790166]
We present a method to perform Markov chain Monte Carlo inference for generalized nonlinear latent variable modeling.
Inference for Gaussian process latent variable models (GPLVMs) is computationally tractable only when the data likelihood is Gaussian.
We show that we can generalize GPLVMs to non-Gaussian observations, such as Poisson, negative binomial, and multinomial distributions.
arXiv Detail & Related papers (2023-06-14T08:42:10Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Generalised Gaussian Process Latent Variable Models (GPLVM) with
Stochastic Variational Inference [9.468270453795409]
We study the doubly stochastic formulation of the Bayesian GPLVM model, which is amenable to minibatch training.
We show how this framework is compatible with different latent variable formulations and perform experiments to compare a suite of models.
We demonstrate how we can train in the presence of massively missing data and obtain high-fidelity reconstructions.
arXiv Detail & Related papers (2022-02-25T21:21:51Z) - Learning Invariant Representations using Inverse Contrastive Loss [34.93395633215398]
We introduce a class of losses for learning representations that are invariant to some extraneous variable of interest.
We show that if the extraneous variable is binary, then optimizing ICL is equivalent to optimizing a regularized MMD divergence.
arXiv Detail & Related papers (2021-02-16T18:29:28Z) - Generalized Matrix Factorization: efficient algorithms for fitting
generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z) - Stochastic Normalizing Flows [52.92110730286403]
We introduce stochastic normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs).
Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs.
These SDEs can be used for constructing efficient Markov chains to sample from the underlying distribution of a given dataset.
arXiv Detail & Related papers (2020-02-21T20:47:55Z)