Borrowing From the Future: Addressing Double Sampling in Model-free Control
- URL: http://arxiv.org/abs/2006.06173v1
- Date: Thu, 11 Jun 2020 03:50:37 GMT
- Title: Borrowing From the Future: Addressing Double Sampling in Model-free Control
- Authors: Yuhua Zhu, Zach Izzo, Lexing Ying
- Abstract summary: This paper extends the BFF algorithm to action-value function based model-free control.
We prove that BFF is close to unbiased SGD when the underlying dynamics vary slowly with respect to actions.
- Score: 8.282602586225833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In model-free reinforcement learning, the temporal difference method and its
variants become unstable when combined with nonlinear function approximations.
Bellman residual minimization with stochastic gradient descent (SGD) is more
stable, but it suffers from the double sampling problem: given the current
state, two independent samples for the next state are required, but often only
one sample is available. Recently, the authors of [Zhu et al., 2020] introduced
the borrowing from the future (BFF) algorithm to address this issue for the
prediction problem. The main idea is to borrow extra randomness from the future
to approximately re-sample the next state when the underlying dynamics of the
problem are sufficiently smooth. This paper extends the BFF algorithm to
action-value function based model-free control. We prove that BFF is close to
unbiased SGD when the underlying dynamics vary slowly with respect to actions.
We confirm our theoretical findings with numerical simulations.
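The double sampling obstruction is easiest to see at the level of the update. For the Bellman residual $L(\theta)=\mathbb{E}\big[\big(r+\gamma\max_{a'}Q_\theta(s',a')-Q_\theta(s,a)\big)^2\big]$, an unbiased stochastic gradient needs two independent draws of the next state given $(s,a)$: one inside the temporal-difference error and a second inside the gradient factor $\nabla_\theta\big(\gamma\max_{a'}Q_\theta(s',a')-Q_\theta(s,a)\big)$; reusing the single observed draw in both places gives a biased gradient. Following the prediction-problem version of BFF in [Zhu et al., 2020], one natural reading of "borrowing from the future" is to build a surrogate second draw from the next observed increment, $\tilde{s}_{t+1}=s_t+(s_{t+2}-s_{t+1})$. The sketch below illustrates this update on a toy one-dimensional problem; the environment, feature map, and constants are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of Bellman-residual SGD with a BFF-style surrogate sample.
# The environment, features, and constants are illustrative assumptions;
# only the gradient structure follows the abstract's description.

import numpy as np

rng = np.random.default_rng(0)

n_actions = 2
dim = 8          # feature dimension (illustrative)
gamma = 0.9      # discount factor
lr = 0.05        # SGD step size

theta = np.zeros((n_actions, dim))   # linear model: Q(s, a) = theta[a] . phi(s)

def phi(s):
    """Illustrative smooth features of a scalar state."""
    k = np.arange(1, dim + 1)
    return np.cos(k * s) / k

def q(s, a, th):
    return th[a] @ phi(s)

def step(s, a):
    """Toy dynamics: smooth in s, weakly dependent on a (BFF's regime)."""
    drift = 0.1 * (a - 0.5)
    return s + drift + 0.1 * rng.standard_normal(), np.cos(s)  # (s', reward)

s = 0.0
for t in range(1000):
    a = int(rng.integers(n_actions))                 # behavior policy: uniform
    s1, r = step(s, a)                               # observed next state s_{t+1}
    s2, _ = step(s1, int(rng.integers(n_actions)))   # future state s_{t+2}

    # BFF surrogate: borrow the future increment as an approximate
    # second, independent draw of the next state given (s, a).
    s1_tilde = s + (s2 - s1)

    # The TD error uses the observed next state ...
    delta = r + gamma * max(q(s1, b, theta) for b in range(n_actions)) \
            - q(s, a, theta)

    # ... while the gradient factor is evaluated at the surrogate.
    b_star = int(np.argmax([q(s1_tilde, b, theta) for b in range(n_actions)]))
    grad = np.zeros_like(theta)
    grad[b_star] += gamma * phi(s1_tilde)
    grad[a] -= phi(s)

    theta -= lr * delta * grad                       # near-unbiased SGD step
    s = s1
```

When the dynamics are smooth and depend only weakly on the chosen action, the borrowed increment $s_{t+2}-s_{t+1}$ is approximately distributed like a fresh transition out of $s_t$, which is the regime in which the paper proves BFF stays close to unbiased SGD.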
Related papers
- Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x} \sim p^{\rm post}(\mathbf{x}) \propto p(\mathbf{x})\,r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$.
We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior.
arXiv Detail & Related papers (2024-05-31T16:18:46Z)
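As a point of reference for the target in the entry above (and explicitly not its relative-trajectory-balance training objective), a self-normalized importance-sampling sketch shows what drawing from $p(\mathbf{x})r(\mathbf{x})$ means when only prior samples and a black-box $r$ are available; the Gaussian stand-ins are illustrative assumptions.

```python
# Hedged sketch: self-normalized importance sampling from p(x) r(x) / Z.
# Baseline illustration of the target distribution only; the paper itself
# trains a diffusion sampler with relative trajectory balance.

import numpy as np

rng = np.random.default_rng(0)

def sample_prior(n):
    """Stand-in for samples from the generative prior p(x) (here: Gaussian)."""
    return rng.standard_normal((n, 2))

def r(x):
    """Illustrative black-box constraint: favor points near (1, 1)."""
    return np.exp(-np.sum((x - 1.0) ** 2, axis=1))

x = sample_prior(10_000)
w = r(x)
w /= w.sum()                                   # self-normalized weights
idx = rng.choice(len(x), size=1_000, p=w)      # resample ~ p(x) r(x) / Z
print(x[idx].mean(axis=0))                     # pulled toward (1, 1)
```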
- Favour: FAst Variance Operator for Uncertainty Rating [0.034530027457862]
Bayesian Neural Networks (BNN) have emerged as a crucial approach for interpreting ML predictions.
By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference.
Previous work proposed propagating the first and second moments of the posterior directly through the network.
This method is even slower than sampling, so the propagated variance needs to be approximated.
Our contribution is a more principled variance propagation framework.
arXiv Detail & Related papers (2023-11-21T22:53:20Z)
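To make "propagating the first and second moments of the posterior directly through the network" concrete, here is the standard single-layer computation under a factorized Gaussian weight posterior, checked against the slower sampling route; this is a generic sketch, not Favour's operator.

```python
# Hedged sketch: moment propagation through one linear layer whose weights
# have a factorized Gaussian posterior; all shapes/values are illustrative.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3
W_mean = rng.standard_normal((d_out, d_in))
W_var = 0.01 * np.ones((d_out, d_in))     # per-weight posterior variances

x = rng.standard_normal(d_in)             # deterministic layer input

y_mean = W_mean @ x                       # E[W x] = E[W] x
y_var = W_var @ (x ** 2)                  # Var[W x] for independent weights

# Monte Carlo check against posterior sampling (the slower alternative):
Ws = W_mean + np.sqrt(W_var) * rng.standard_normal((100_000, d_out, d_in))
ys = Ws @ x
print(np.allclose(ys.mean(axis=0), y_mean, atol=1e-2),
      np.allclose(ys.var(axis=0), y_var, rtol=5e-2))
```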
- Simulation-free Schrödinger bridges via score and flow matching [89.4231207928885]
We present simulation-free score and flow matching ([SF]$^2$M).
Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous flows.
Notably, [SF]$^2$M is the first method to accurately model cell dynamics in high dimensions and can recover known gene regulatory networks from simulated data.
arXiv Detail & Related papers (2023-07-07T15:42:35Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic Analysis For DDIM-Type Samplers [90.45898746733397]
We develop a framework for non-asymptotic analysis of deterministic samplers used for diffusion generative modeling.
We show that one step along the probability flow ODE can be expressed as two steps: 1) a restoration step that runs gradient ascent on the conditional log-likelihood at some infinitesimally previous time, and 2) a degradation step that runs the forward process using noise pointing back towards the current iterate.
arXiv Detail & Related papers (2023-03-06T18:59:19Z)
- A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta [14.857119814202754]
Mini-batch SGD with momentum is a fundamental algorithm for learning large predictive models.
We develop a new analytic framework to analyze mini-batch SGD for linear models at different momenta and batch sizes.
arXiv Detail & Related papers (2022-06-22T14:15:35Z)
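For orientation, the update being analyzed is the heavy-ball recursion, shown here on a stochastic linear least-squares problem with a negative momentum coefficient of the kind the entry finds beneficial; the data and constants are illustrative assumptions.

```python
# Hedged sketch: mini-batch SGD with (possibly negative) heavy-ball momentum
# on linear least squares; problem sizes and step sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 1024, 16, 32
A = rng.standard_normal((n, d))
y = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w, v = np.zeros(d), np.zeros(d)
lr, mu = 0.05, -0.2                             # negative momentum coefficient

for _ in range(500):
    i = rng.choice(n, size=batch, replace=False)
    grad = A[i].T @ (A[i] @ w - y[i]) / batch   # mini-batch gradient
    v = mu * v - lr * grad                      # heavy-ball velocity
    w = w + v

print(np.linalg.norm(A @ w - y) / np.sqrt(n))   # residual RMS ~ noise level
```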
- Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
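To ground the entry's two ingredients, per-time-stamp Gaussian parameters and a smoothness-inducing term, here is a toy scoring pass with hand-rolled stand-ins for the learned mean and variance; nothing below is the SISVAE model itself.

```python
# Hedged sketch: per-time-stamp Gaussian anomaly scoring plus a smoothness
# penalty, in the spirit of the entry above; the stand-in "model" is a
# moving average, not a sequential variational auto-encoder.

import numpy as np

rng = np.random.default_rng(0)
T = 200
series = np.sin(np.linspace(0, 8 * np.pi, T)) + 0.1 * rng.standard_normal(T)
series[120] += 2.0                          # injected point anomaly

mu = np.convolve(series, np.ones(5) / 5, mode="same")   # stand-in mean
var = np.full(T, 0.1 ** 2)                              # stand-in variance

# Per-time-stamp negative log-likelihood under N(mu_t, var_t):
nll = 0.5 * (np.log(2 * np.pi * var) + (series - mu) ** 2 / var)

# Illustrative training-loss shape: NLL plus a smoothness-inducing term
# penalizing jumps in the mean (unused for scoring below).
objective = nll.sum() + 1.0 * np.sum(np.diff(mu) ** 2)

print(int(np.argmax(nll)))                  # flags the index near 120
```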