Latent Transformations for Discrete-Data Normalising Flows
- URL: http://arxiv.org/abs/2006.06346v1
- Date: Thu, 11 Jun 2020 11:41:28 GMT
- Title: Latent Transformations for Discrete-Data Normalising Flows
- Authors: Rob Hesselink and Wilker Aziz
- Abstract summary: We present an unbiased alternative where rather than deterministically parameterising one transformation, we predict a distribution over latent transformations.
With stochastic transformations, the marginal likelihood of the data is differentiable and gradient-based learning is possible via score function estimation.
We observe great challenges with both deterministic proxy gradients and unbiased score function estimation.
- Score: 15.005894753472894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalising flows (NFs) for discrete data are challenging because
parameterising bijective transformations of discrete variables requires
predicting discrete/integer parameters. Having a neural network architecture
predict discrete parameters takes a non-differentiable activation function (e.g.,
the step function) which precludes gradient-based learning. To circumvent this
non-differentiability, previous work has employed biased proxy gradients, such
as the straight-through estimator. We present an unbiased alternative where
rather than deterministically parameterising one transformation, we predict a
distribution over latent transformations. With stochastic transformations, the
marginal likelihood of the data is differentiable and gradient-based learning
is possible via score function estimation. To test the viability of
discrete-data NFs we investigate performance on binary MNIST. We observe great
challenges with both deterministic proxy gradients and unbiased score function
estimation. Whereas the former often fails to learn even a shallow
transformation, the variance of the latter could not be sufficiently controlled
to admit deeper NFs.
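
The score function estimation mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a single binary observation `x`, a one-bit latent shift `b ~ Bernoulli(sigmoid(theta))` applied as `z = x XOR b`, and a fixed Bernoulli base distribution over `z`. All names (`log_base`, `score_function_grad`, `exact_grad`, the value `pi=0.9`) are hypothetical and chosen for illustration only. It estimates the gradient of the expected base log-probability with respect to `theta` and compares the result against the closed-form gradient, with an optional constant baseline for variance reduction.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def log_base(z, pi=0.9):
    """Log-probability of bit z under an assumed Bernoulli(pi) base distribution."""
    return math.log(pi) if z == 1 else math.log(1.0 - pi)


def exact_grad(x, theta):
    """Closed-form d/dtheta of E_{b ~ Bernoulli(sigmoid(theta))}[log_base(x XOR b)]."""
    s = sigmoid(theta)
    return s * (1.0 - s) * (log_base(x ^ 1) - log_base(x ^ 0))


def score_function_grad(x, theta, n_samples, rng, baseline=0.0):
    """Monte Carlo score function estimate of the same gradient.

    Each term is (f(b) - baseline) * d/dtheta log q(b | theta), where
    d/dtheta log q(b | theta) = b - sigmoid(theta) for a Bernoulli with logit theta.
    Returns the sample mean and the per-sample standard deviation.
    """
    s = sigmoid(theta)
    terms = []
    for _ in range(n_samples):
        b = 1 if rng.random() < s else 0   # sample a latent transformation
        f = log_base(x ^ b)                # apply the XOR shift, score under the base
        terms.append((f - baseline) * (b - s))
    mean = sum(terms) / n_samples
    var = sum((t - mean) ** 2 for t in terms) / n_samples
    return mean, math.sqrt(var)


if __name__ == "__main__":
    rng = random.Random(0)
    x, theta = 1, 0.3
    s = sigmoid(theta)
    # A constant baseline equal to the expected reward keeps the estimate unbiased.
    c = s * log_base(x ^ 1) + (1.0 - s) * log_base(x ^ 0)
    print("exact:          ", exact_grad(x, theta))
    print("no baseline:    ", score_function_grad(x, theta, 100_000, rng))
    print("with baseline:  ", score_function_grad(x, theta, 100_000, rng, baseline=c))
```

Both estimates target the same gradient, but the per-sample spread differs considerably even in this one-bit setting, which gives a rough sense of why variance control becomes the limiting factor as more stochastic layers are stacked.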
Related papers
- Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation [59.86921150579892]
We deal with the problem of gradient estimation for differentiable relaxations of algorithms, operators, simulators, and other non-differentiable functions.
We develop variance reduction strategies for differentiable sorting and ranking, differentiable shortest-paths on graphs, differentiable rendering for pose estimation, as well as differentiable cryo-ET simulations.
arXiv Detail & Related papers (2024-10-10T17:10:00Z)
- Variational Sampling of Temporal Trajectories [39.22854981703244]
We introduce a mechanism to learn the distribution of trajectories by parameterizing the transition function $f$ explicitly as an element in a function space.
Our framework allows efficient synthesis of novel trajectories, while also directly providing a convenient tool for inference.
arXiv Detail & Related papers (2024-03-18T02:12:12Z)
- Posterior Collapse and Latent Variable Non-identifiability [54.842098835445]
We propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility.
Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
arXiv Detail & Related papers (2023-01-02T06:16:56Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Robust Correction of Sampling Bias Using Cumulative Distribution Functions [19.551668880584973]
Varying domains and biased datasets can lead to differences between the training and the target distributions.
Current approaches for alleviating this often rely on estimating the ratio of training and target probability density functions.
arXiv Detail & Related papers (2020-10-23T22:13:00Z)
- Probabilistic Numeric Convolutional Neural Networks [80.42120128330411]
Continuous input signals like images and time series that are irregularly sampled or have missing values are challenging for existing deep learning methods.
We propose Probabilistic Numeric Convolutional Neural Networks, which represent features as Gaussian processes (GPs).
We then define a convolutional layer as the evolution of a PDE defined on this GP, followed by a nonlinearity.
In experiments we show that our approach yields a $3\times$ reduction of error from the previous state of the art on the SuperPixel-MNIST dataset and competitive performance on the medical time series dataset PhysioNet2012.
arXiv Detail & Related papers (2020-10-21T10:08:21Z)
- Reliable Categorical Variational Inference with Mixture of Discrete Normalizing Flows [10.406659081400354]
Variational approximations are increasingly based on gradient-based optimization of expectations estimated by sampling.
Continuous relaxations, such as the Gumbel-Softmax for categorical distributions, enable gradient-based optimization, but do not define a valid probability mass for discrete observations.
In practice, selecting the amount of relaxation is difficult and one needs to optimize an objective that does not align with the desired one.
arXiv Detail & Related papers (2020-06-28T10:39:39Z)
- Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables [16.643346012854156]
Estimating the gradients of stochastic nodes is one of the crucial research questions in the deep generative modeling community.
This paper proposes a general version of the Gumbel-Softmax estimator with continuous relaxation.
arXiv Detail & Related papers (2020-03-04T01:13:15Z)