Posterior Coreset Construction with Kernelized Stein Discrepancy for
Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2206.01162v2
- Date: Thu, 4 May 2023 05:25:56 GMT
- Title: Posterior Coreset Construction with Kernelized Stein Discrepancy for
Model-Based Reinforcement Learning
- Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler,
Furong Huang, Pratap Tokekar, Dinesh Manocha
- Abstract summary: We develop a novel model-based approach to reinforcement learning (MBRL).
It relaxes the assumptions on the target transition model, requiring only that it belong to a generic family of mixture models.
It can achieve up to a 50 percent reduction in wall-clock time in some continuous control environments.
- Score: 78.30395044401321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based approaches to reinforcement learning (MBRL) exhibit favorable
performance in practice, but their theoretical guarantees in large spaces are
mostly restricted to the setting in which the transition model is Gaussian or
Lipschitz, and they demand a posterior estimate whose representational complexity
grows unbounded with time. In this work, we develop a novel MBRL method that (i)
relaxes the assumptions on the target transition model, requiring only that it belong to a
generic family of mixture models; (ii) is applicable to large-scale training by
incorporating a compression step such that the posterior estimate consists of a
Bayesian coreset of only statistically significant past state-action pairs; and
(iii) exhibits a sublinear Bayesian regret. To achieve these results, we adopt
an approach based upon Stein's method, which, under a smoothness condition on
the constructed posterior and target, allows distributional distance to be
evaluated in closed form as the kernelized Stein discrepancy (KSD). The
aforementioned compression step is then computed in terms of greedily retaining
only those samples which are more than a certain KSD away from the previous
model estimate. Experimentally, we observe that this approach is competitive
with several state-of-the-art RL methodologies, and can achieve up to a 50
percent reduction in wall-clock time in some continuous control environments.
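To make the compression step concrete, below is a minimal, illustrative Python sketch (not the authors' implementation) of the kernelized Stein discrepancy with an RBF base kernel, together with a thresholded greedy retention rule in the spirit of the abstract: a new state-action sample is kept only if adding it moves the KSD of the running coreset by more than a threshold. The score function `score_fn` (gradient of the log target density), the bandwidth `h`, and the threshold `eps` are assumptions introduced for illustration.

```python
# Minimal sketch: kernelized Stein discrepancy (KSD) with an RBF base kernel,
# plus a thresholded greedy retention rule for building a coreset.
# Assumption: score_fn(x) returns the gradient of the log target density at x.
import numpy as np

def stein_kernel(x, y, score_fn, h=1.0):
    """Stein kernel k_p(x, y) built from an RBF kernel with bandwidth h."""
    d = x.shape[0]
    diff = x - y
    sq = float(np.dot(diff, diff))
    k = np.exp(-sq / (2.0 * h**2))
    gkx = -diff / h**2 * k                      # grad_x k(x, y)
    gky = diff / h**2 * k                       # grad_y k(x, y)
    tr = (d / h**2 - sq / h**4) * k             # trace of grad_x grad_y k(x, y)
    sx, sy = score_fn(x), score_fn(y)
    return (sx @ sy) * k + sx @ gky + sy @ gkx + tr

def ksd_squared(samples, score_fn, h=1.0):
    """Squared KSD of the empirical measure supported on `samples`."""
    n = len(samples)
    if n == 0:
        return 0.0
    total = 0.0
    for xi in samples:
        for xj in samples:
            total += stein_kernel(xi, xj, score_fn, h)
    return total / n**2

def greedy_coreset(stream, score_fn, eps=1e-3, h=1.0):
    """Retain a sample only if it changes the coreset's squared KSD by more than eps."""
    coreset = []
    for x in stream:
        before = ksd_squared(coreset, score_fn, h)
        after = ksd_squared(coreset + [x], score_fn, h)
        if not coreset or abs(after - before) > eps:
            coreset.append(x)
    return coreset
```

For a quick sanity check one can use a standard normal target, `score_fn = lambda x: -x`, and a stream of Gaussian draws; the retention rule then keeps only a subset of the stream, bounding the representational complexity of the posterior estimate.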
Related papers
- General bounds on the quality of Bayesian coresets [13.497835690074151]
This work presents general upper and lower bounds on the Kullback-Leibler (KL) divergence of Bayesian coreset approximations.
Lower bounds are applied to obtain fundamental limitations on the quality of coreset approximations.
The upper bounds are used to analyze the performance of recent subsample-optimize methods.
arXiv Detail & Related papers (2024-05-20T04:46:14Z)
- Towards Model-Agnostic Posterior Approximation for Fast and Accurate Variational Autoencoders [22.77397537980102]
We show that we can compute a deterministic, model-agnostic posterior approximation (MAPA) of the true model's posterior.
We present preliminary results on low-dimensional synthetic data that (1) MAPA captures the trend of the true posterior, and (2) our MAPA-based inference performs better density estimation with less computation than baselines.
arXiv Detail & Related papers (2024-03-13T20:16:21Z)
- One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls [77.42510898755037]
One More Step (OMS) is a compact network that incorporates an additional simple yet effective step during inference.
OMS improves image fidelity and reconciles the mismatch between training and inference, while preserving the original model parameters.
Once trained, various pre-trained diffusion models with the same latent domain can share the same OMS module.
arXiv Detail & Related papers (2023-11-27T12:02:42Z)
- Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
- Bayesian Pseudo-Coresets via Contrastive Divergence [5.479797073162603]
We introduce a novel approach for constructing pseudo-coresets by utilizing contrastive divergence.
It eliminates the need for approximations in the pseudo-coreset construction process.
We conduct extensive experiments on multiple datasets, demonstrating its superiority over existing BPC techniques.
arXiv Detail & Related papers (2023-03-20T17:13:50Z)
- Fast post-process Bayesian inference with Variational Sparse Bayesian Quadrature [13.36200518068162]
We propose the framework of post-process Bayesian inference as a means to obtain a quick posterior approximation from existing target density evaluations.
Within this framework, we introduce Variational Sparse Bayesian Quadrature (VSBQ), a method for post-process approximate inference for models with black-box and potentially noisy likelihoods.
We validate our method on challenging synthetic scenarios and real-world applications from computational neuroscience.
arXiv Detail & Related papers (2023-03-09T13:58:35Z)
- Fast Estimation of Bayesian State Space Models Using Amortized Simulation-Based Inference [0.0]
This paper presents a fast algorithm for estimating hidden states of Bayesian state space models.
After pretraining, finding the posterior distribution for any dataset takes from hundredths to tenths of a second.
arXiv Detail & Related papers (2022-10-13T16:37:05Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
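For the last related paper above (Distributionally Robust Models with Parametric Likelihood Ratios), a generic minibatch sketch may help fix ideas: a parametric adversary scores each example, the scores are self-normalized into likelihood-ratio-style weights over the batch, and a KL penalty toward uniform weights keeps the adversary within a divergence ball. This is only an illustrative sketch under those assumptions, not the paper's exact objective; `model`, `adv`, and the penalty weight `lam` are hypothetical names.

```python
# Generic sketch of minibatch DRO with a parametric adversary (not the paper's exact loss).
import torch
import torch.nn.functional as F

def dro_losses(model, adv, x, y, lam=1.0):
    per_example = F.cross_entropy(model(x), y, reduction="none")    # per-example losses, shape (n,)
    n = per_example.shape[0]
    w = F.softmax(adv(x).squeeze(-1), dim=0)                        # self-normalized batch weights
    kl = torch.sum(w * torch.log(w * n + 1e-12))                    # KL(w || uniform)
    model_loss = torch.sum(w.detach() * per_example)                # model minimizes weighted loss
    adv_loss = -(torch.sum(w * per_example.detach()) - lam * kl)    # adversary maximizes, KL-penalized
    return model_loss, adv_loss
```

In a training loop, the two losses are stepped with separate optimizers: the model descends `model_loss` and the adversary descends `adv_loss`, yielding the min-max structure of distributionally robust optimization.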
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.