On the failure of variational score matching for VAE models
- URL: http://arxiv.org/abs/2210.13390v1
- Date: Mon, 24 Oct 2022 16:43:04 GMT
- Title: On the failure of variational score matching for VAE models
- Authors: Li Kevin Wenliang
- Abstract summary: We present a critical study of existing variational SM objectives, showing catastrophic failure on a wide range of datasets and network architectures.
Our theoretical insights on the objectives emerge directly from their equivalent autoencoding losses when optimizing variational autoencoder (VAE) models.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Score matching (SM) is a convenient method for training flexible
probabilistic models, which is often preferred over the traditional
maximum-likelihood (ML) approach. However, these models are less interpretable
than normalized models; as such, training robustness is in general difficult to
assess. We present a critical study of existing variational SM objectives,
showing catastrophic failure on a wide range of datasets and network
architectures. Our theoretical insights on the objectives emerge directly from
their equivalent autoencoding losses when optimizing variational autoencoder
(VAE) models. First, we show that in the Fisher autoencoder, SM produces far worse models than maximum likelihood, and approximate inference by Fisher divergence (FD) can lead to low-density local optima. However, with important
modifications, this objective reduces to a regularized autoencoding loss that
resembles the evidence lower bound (ELBO). This analysis predicts that the
modified SM algorithm should behave very similarly to ELBO on Gaussian VAEs. We
then review two other FD-based objectives from the literature and show that
they reduce to uninterpretable autoencoding losses, likely leading to poor
performance. The experiments verify our theoretical predictions and suggest
that only ELBO and the baseline objective robustly produce expected results,
while previously proposed SM methods do not.
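For reference, the two training criteria the abstract compares can be written as follows (standard definitions in our own notation, not equations taken from the paper). Score matching minimizes the Fisher divergence between the data distribution $p_d$ and the model $p_\theta$, while the ELBO lower-bounds the log-likelihood:

```latex
% Fisher divergence (the quantity score matching minimizes over theta):
F(p_d \,\|\, p_\theta)
  = \mathbb{E}_{p_d(x)}\!\left[ \tfrac{1}{2}
      \big\| \nabla_x \log p_d(x) - \nabla_x \log p_\theta(x) \big\|^2 \right]

% Evidence lower bound for a VAE with encoder q_phi and decoder p_theta:
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \mathrm{KL}\!\big( q_\phi(z \mid x) \,\|\, p(z) \big)
  \;\le\; \log p_\theta(x)
```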
Related papers
- Offline Model-Based Optimization by Learning to Rank [26.21886715050762]
We argue that regression models trained with mean squared error (MSE) are not well-aligned with the primary goal of offline model-based optimization.
We propose learning a ranking-based model that leverages learning to rank techniques to prioritize promising designs based on their relative scores.
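As an illustration of the ranking idea (a minimal sketch; the function and its hyperparameters are our own, not taken from the paper), a pairwise logistic loss trains a surrogate so that designs with higher true scores also receive higher predicted scores:

```python
import torch

def pairwise_ranking_loss(pred: torch.Tensor, score: torch.Tensor) -> torch.Tensor:
    """Generic pairwise logistic ranking loss: for every pair (i, j) with
    score[i] > score[j], penalize the model unless pred[i] > pred[j].
    A standard learning-to-rank surrogate, not the paper's exact objective."""
    diff_pred = pred.unsqueeze(1) - pred.unsqueeze(0)    # [i, j] = pred[i] - pred[j]
    diff_true = score.unsqueeze(1) - score.unsqueeze(0)  # [i, j] = score[i] - score[j]
    mask = (diff_true > 0).float()                       # pairs where i should outrank j
    loss = torch.nn.functional.softplus(-diff_pred)      # -log sigmoid(pred[i] - pred[j])
    return (mask * loss).sum() / mask.sum().clamp(min=1.0)
```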
arXiv Detail & Related papers (2024-10-15T11:15:03Z) - A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles [0.0]
The MoE framework combines an RNN for volatile stocks and a linear model for stable stocks, dynamically adjusting the weight of each model through a gating network.
Results indicate that the MoE approach significantly improves predictive accuracy across different volatility profiles.
The MoE model's adaptability allows it to outperform each individual model, reducing error metrics such as Mean Squared Error (MSE) and Mean Absolute Error (MAE).
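A minimal sketch of the gating mechanism described above (all names and sizes are illustrative guesses, not the paper's code): a learned gate outputs a convex combination of the two experts' predictions:

```python
import torch
import torch.nn as nn

class TwoExpertMoE(nn.Module):
    """Gated mixture of an RNN expert and a linear expert; architectural
    details here are illustrative, not taken from the paper."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.rnn_head = nn.Linear(hidden, 1)    # RNN expert for volatile series
        self.linear = nn.Linear(n_features, 1)  # linear expert for stable series
        self.gate = nn.Linear(n_features, 2)    # gating network over the two experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features); predict from the last time step
        h, _ = self.rnn(x)
        y_rnn = self.rnn_head(h[:, -1])                 # (batch, 1)
        y_lin = self.linear(x[:, -1])                   # (batch, 1)
        w = torch.softmax(self.gate(x[:, -1]), dim=-1)  # (batch, 2) mixture weights
        return w[:, :1] * y_rnn + w[:, 1:] * y_lin
```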
arXiv Detail & Related papers (2024-10-04T14:36:21Z) - On the Impact of Sampling on Deep Sequential State Estimation [17.92198582435315]
State inference and parameter learning in sequential models can be successfully performed with approximation techniques.
Tighter Monte Carlo objectives have been proposed in the literature to enhance generative modeling performance.
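One standard family of tighter Monte Carlo objectives is the importance-weighted bound, stated here as background (the summary does not say which variant the paper studies):

```latex
% K-sample importance-weighted lower bound on the log-likelihood;
% K = 1 recovers the ELBO, and the bound tightens as K grows:
\mathcal{L}_K(x) = \mathbb{E}_{z_1,\dots,z_K \sim q_\phi(z \mid x)}
  \left[ \log \frac{1}{K} \sum_{k=1}^{K}
      \frac{p_\theta(x, z_k)}{q_\phi(z_k \mid x)} \right]
  \;\le\; \log p_\theta(x)
```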
arXiv Detail & Related papers (2023-11-28T17:59:49Z) - Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of relying on the implicit distributional assumption of a standard softmax layer.
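A hedged reading of this construction (our notation; the paper's details may differ): the pre-softmax representation is treated as a latent variable $z$, so the class posterior becomes an expectation over an explicit latent distribution rather than a point evaluation:

```latex
% Softmax classifier viewed as a latent-variable model: z is the input
% to the softmax layer, q(z | x) an encoder, p(y | z) the softmax itself.
p(y \mid x) = \mathbb{E}_{q(z \mid x)}\!\left[ p(y \mid z) \right],
\qquad
p(y \mid z) = \frac{\exp(w_y^\top z)}{\sum_{y'} \exp(w_{y'}^\top z)}
```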
arXiv Detail & Related papers (2023-05-17T17:47:19Z) - How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder (AE) based models.
We develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
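The evaluation scheme resembles a linear probe fit on OOD features; a minimal sketch under that assumption (the encoder callable and data splits are placeholders):

```python
from sklearn.linear_model import LogisticRegression

def ood_linear_probe(encode, X_train, y_train, X_test, y_test) -> float:
    """Freeze the pre-trained encoder and fit only a linear head on
    out-of-distribution data, so the reported score reflects the learned
    representation rather than a head biased toward the training data."""
    Z_train = encode(X_train)  # frozen features; the encoder is not updated
    Z_test = encode(X_test)
    head = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
    return head.score(Z_test, y_test)  # accuracy of the OOD-trained probe
```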
arXiv Detail & Related papers (2022-06-17T16:18:28Z) - Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample-efficient technique to obtain control policies.
VaGraM is a novel method for value-aware model learning.
arXiv Detail & Related papers (2022-04-04T13:28:31Z) - VAE Approximation Error: ELBO and Conditional Independence [78.72292013299868]
This paper analyzes VAE approximation errors caused by the combination of the ELBO objective with the choice of the encoder probability family.
We show that the ELBO subset cannot be enlarged, and the respective error cannot be decreased, by only considering deeper encoder networks.
arXiv Detail & Related papers (2021-02-18T12:54:42Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAEs) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
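For context, the Cauchy-Schwarz divergence between densities $p$ and $q$ is (standard definition, our notation):

```latex
% Cauchy-Schwarz divergence; non-negative, and zero iff p = q a.e.
D_{\mathrm{CS}}(p \,\|\, q)
  = -\log \frac{\int p(x)\, q(x)\, dx}
               {\sqrt{\int p(x)^2\, dx \int q(x)^2\, dx}}
```

Each inner product is analytic for Gaussian mixtures because $\int \mathcal{N}(x; \mu_1, \Sigma_1)\, \mathcal{N}(x; \mu_2, \Sigma_2)\, dx = \mathcal{N}(\mu_1; \mu_2, \Sigma_1 + \Sigma_2)$, which is what makes the constrained objective computable in closed form for GMMs.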
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - Shaping Deep Feature Space towards Gaussian Mixture for Visual
Classification [74.48695037007306]
We propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.
With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution.
The proposed model can be implemented easily and efficiently without using extra trainable parameters.
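A plausible form of the described loss (our reconstruction from the summary above, not verified against the paper): a cross-entropy term over Gaussian-mixture posteriors plus a likelihood regularizer pulling each feature toward its class component:

```latex
% f(x): deep feature; one Gaussian component (mu_y, Sigma_y) per class y;
% lambda weights the likelihood regularization term.
L_{\mathrm{GM}}
  = -\log \frac{\pi_y\, \mathcal{N}(f(x); \mu_y, \Sigma_y)}
               {\sum_{y'} \pi_{y'}\, \mathcal{N}(f(x); \mu_{y'}, \Sigma_{y'})}
    \;+\; \lambda\, \big( -\log \mathcal{N}(f(x); \mu_y, \Sigma_y) \big)
```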
arXiv Detail & Related papers (2020-11-18T03:32:27Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
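The underlying decomposition reduces score matching on the joint to one-dimensional problems; a hedged statement in our notation (the chain rule is standard, the divergence is our reading of the summary):

```latex
% Autoregressive factorization of the joint density:
\log p_\theta(x) = \sum_{i=1}^{D} \log p_\theta(x_i \mid x_{<i})

% Sum of one-dimensional Fisher divergences between the conditionals
% of the data distribution p_d and the model p_theta:
D(p_d \,\|\, p_\theta) = \sum_{i=1}^{D}
  \mathbb{E}_{p_d}\!\left[ \tfrac{1}{2}
    \big( \partial_{x_i} \log p_d(x_i \mid x_{<i})
        - \partial_{x_i} \log p_\theta(x_i \mid x_{<i}) \big)^2 \right]
```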
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.