AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation
- URL: http://arxiv.org/abs/2006.05164v1
- Date: Tue, 9 Jun 2020 10:11:28 GMT
- Title: AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation
- Authors: Jae Hyun Lim, Aaron Courville, Christopher Pal, Chin-Wei Huang
- Abstract summary: We propose the amortized residual denoising autoencoder (AR-DAE) to approximate the gradient of the log density function.
We demonstrate state-of-the-art performance on density estimation with variational autoencoders and continuous control with soft actor-critic.
- Score: 8.24001702529974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entropy is ubiquitous in machine learning, but it is in general intractable
to compute the entropy of the distribution of an arbitrary continuous random
variable. In this paper, we propose the amortized residual denoising
autoencoder (AR-DAE) to approximate the gradient of the log density function,
which can be used to estimate the gradient of entropy. Amortization allows us
to significantly reduce the error of the gradient approximator by approaching
asymptotic optimality of a regular DAE, in which case the estimation is in
theory unbiased. We conduct theoretical and experimental analyses on the
approximation error of the proposed method, as well as extensive studies on
heuristics to ensure its robustness. Finally, using the proposed gradient
approximator to estimate the gradient of entropy, we demonstrate
state-of-the-art performance on density estimation with variational
autoencoders and continuous control with soft actor-critic.
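To make the two ingredients of the abstract concrete, here is a minimal PyTorch sketch of (i) a noise-conditioned residual network trained with an amortized DAE objective and (ii) the entropy-gradient surrogate obtained by plugging its score estimate into reparameterized samples. The names (`ARDAE`, `ardae_loss`, `entropy_grad_surrogate`), the architecture, and the assumed reparameterized sampler x = g_theta(z) are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ARDAE(nn.Module):
    # Residual network f(x, sigma) trained so that, as sigma -> 0,
    # its output approximates the score grad_x log p(x).
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, sigma):
        return self.net(torch.cat([x, sigma], dim=-1))

def ardae_loss(f, x, sigma_max=1.0):
    # One common form of the residual objective:
    # E || eps + sigma * f(x + sigma*eps, sigma) ||^2, with sigma drawn
    # afresh so a single network is amortized over all noise scales.
    sigma = sigma_max * torch.rand(x.size(0), 1)
    eps = torch.randn_like(x)
    return ((eps + sigma * f(x + sigma * eps, sigma)) ** 2).sum(dim=-1).mean()

def entropy_grad_surrogate(f, x):
    # grad_theta H(q_theta) = -E[ f(x)^T dx/dtheta ] for reparameterized
    # samples x = g_theta(z): detach the score estimate (evaluated at
    # sigma = 0) and differentiate only through the sampler path.
    score = f(x, torch.zeros(x.size(0), 1)).detach()
    return -(score * x).sum(dim=-1).mean()
```

Differentiating `entropy_grad_surrogate(f, g_theta(z))` with respect to the sampler parameters reproduces the entropy-gradient estimate; the closer `ardae_loss` is driven to its optimum, the smaller the bias of that estimate.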
Related papers
- Analytical Approximation of the ELBO Gradient in the Context of the Clutter Problem [0.0]
We propose an analytical solution for approximating the gradient of the Evidence Lower Bound (ELBO) in variational inference problems.
The proposed method demonstrates good accuracy and rate of convergence together with linear computational complexity.
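For orientation, the sketch below shows the generic Monte Carlo ELBO gradient (reparameterization trick, Gaussian variational posterior) that analytical approximations of this kind aim to replace. `log_joint` is a hypothetical callable returning per-sample log p(x, z); the code is not the paper's method.

```python
import math
import torch

def elbo_gradient_mc(log_joint, mu, log_sigma, n_samples=64):
    # Reparameterization-trick estimate of the ELBO gradient for
    # q = N(mu, diag(sigma^2)); mu and log_sigma are leaf tensors
    # created with requires_grad=True.
    sigma = log_sigma.exp()
    eps = torch.randn(n_samples, *mu.shape)
    z = mu + sigma * eps                      # differentiable samples from q
    # ELBO = E_q[log p(x, z)] + H(q); diagonal-Gaussian entropy is analytic
    entropy = (log_sigma + 0.5 * math.log(2 * math.pi * math.e)).sum()
    elbo = log_joint(z).mean() + entropy      # log_joint returns shape (n_samples,)
    elbo.backward()
    return mu.grad, log_sigma.grad
```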
arXiv Detail & Related papers (2024-04-16T13:19:46Z)
- Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation [0.8192907805418583]
We show that biased gradients converge to critical points for smooth non-convex functions.
We show how the effect of bias can be reduced by appropriate tuning.
arXiv Detail & Related papers (2024-02-05T10:17:36Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
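As a reference for the setting (illustrative, not the paper's construction), the following sketch fits an RF regression model with constant step-size SGD; sweeping `n_features` across the interpolation threshold `n_features ≈ n_samples` is how a double descent curve is traced empirically.

```python
import numpy as np

def rf_regression_sgd(X, y, n_features=512, lr=1e-2, epochs=100, rng=None):
    # Random features regression: fix a random first layer, learn only the
    # linear readout with constant step-size SGD.
    rng = rng or np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], n_features)) / np.sqrt(X.shape[1])
    phi = np.maximum(X @ W, 0.0)               # fixed ReLU random features
    theta = np.zeros(n_features)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):      # one SGD pass per epoch
            err = phi[i] @ theta - y[i]
            theta -= lr * err * phi[i]
    return W, theta
```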
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- Stationary Density Estimation of Itô Diffusions Using Deep Learning [6.8342505943533345]
We consider the density estimation problem associated with the stationary measure of ergodic Itô diffusions from a discrete-time series.
We employ deep neural networks to approximate the drift and diffusion terms of the SDE.
We establish the convergence of the proposed scheme under appropriate mathematical assumptions.
arXiv Detail & Related papers (2021-09-09T01:57:14Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the stochastic gradients.
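The clipping mechanism itself is simple; a minimal sketch follows, assuming plain Euclidean-norm clipping (the paper's contribution is the sharper analysis, not this update rule).

```python
import numpy as np

def clipped_sgd_step(theta, grad, lr=1e-2, clip=1.0):
    # Shrink the stochastic gradient onto the ball of radius `clip` before
    # the update, so a single heavy-tailed gradient cannot produce an
    # unbounded step.
    norm = np.linalg.norm(grad)
    if norm > clip:
        grad = grad * (clip / norm)
    return theta - lr * grad
```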
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
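A minimal sketch of the idea, assuming scalar log-density callables `log_p0` (tractable base) and `log_p1` (unnormalized target): with unadjusted Langevin transitions in place of Metropolis-Hastings, the whole chain stays on the autodiff tape, so the AIS log-weight is differentiable. This shows only the mechanism, not the paper's full algorithm or its gradient-noise analysis.

```python
import torch

def dais_log_weight(log_p0, log_p1, x0, n_steps=50, step_size=1e-2):
    # Anneal from log_p0 to log_p1 along intermediate densities
    # log gamma_k = (1 - beta_k) log_p0 + beta_k log_p1, accumulating the
    # AIS log-weight; no accept/reject step, so everything is differentiable.
    betas = torch.linspace(0.0, 1.0, n_steps + 1)
    x = x0.clone().requires_grad_(True)
    log_w = torch.zeros(())
    for k in range(n_steps):
        # AIS log-weight increment at the current state
        log_w = log_w + (betas[k + 1] - betas[k]) * (log_p1(x) - log_p0(x))
        # unadjusted Langevin step targeting the next intermediate density
        log_gamma = (1 - betas[k + 1]) * log_p0(x) + betas[k + 1] * log_p1(x)
        grad = torch.autograd.grad(log_gamma, x, create_graph=True)[0]
        x = x + step_size * grad + (2 * step_size) ** 0.5 * torch.randn_like(x)
    return log_w  # single-sample estimate of log(Z1 / Z0)
```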
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
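For reference, the generic random-direction ZO gradient estimator looks as follows; the coordinate importance sampling that makes the paper's method query-efficient is a refinement on top of this baseline, and the sketch is an illustration rather than the proposed algorithm.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, n_dirs=10, rng=None):
    # Random-direction finite differences: only function queries are used,
    # no first-order information. Each direction costs one extra query of f.
    rng = rng or np.random.default_rng()
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / n_dirs
```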
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply to both the approximated kernel and the approximated posterior.
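The paper's quadrature Fourier features are deterministic and handle derivative observations; to fix ideas, here is the standard randomized baseline they improve on, a random Fourier feature map whose inner products approximate the RBF kernel.

```python
import numpy as np

def rff_features(X, n_features=256, lengthscale=1.0, rng=None):
    # Random Fourier features: z(x)^T z(y) approximates the RBF kernel
    # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```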
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
- Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles arise in practical contexts, for instance, in risk measure estimation from a batch of independent and identically distributed simulations.
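As one hypothetical instance of such an oracle, the sketch below estimates the gradient of a CVaR risk measure from a batch of i.i.d. simulations; the empirical quantile makes the returned gradient biased, with bias controlled by the batch size `m`. The `simulate` interface is an assumption for illustration, not from the paper.

```python
import numpy as np

def cvar_gradient_oracle(simulate, theta, alpha=0.95, m=128):
    # simulate(theta, m) -> (losses, grads): i.i.d. losses L(theta, xi_i)
    # and per-sample gradients dL/dtheta, shapes (m,) and (m, dim).
    # Averaging gradients over the empirical alpha-tail gives a CVaR
    # gradient estimate whose bias shrinks as the batch size m grows.
    losses, grads = simulate(theta, m)
    var = np.quantile(losses, alpha)   # empirical value-at-risk
    tail = losses >= var
    return grads[tail].mean(axis=0)
```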
arXiv Detail & Related papers (2020-02-26T12:53:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.