Bayesian Bellman Operators
- URL: http://arxiv.org/abs/2106.05012v2
- Date: Thu, 10 Jun 2021 16:29:13 GMT
- Title: Bayesian Bellman Operators
- Authors: Matthew Fellows, Kristian Hartikainen, Shimon Whiteson
- Abstract summary: We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
- Score: 55.959376449737405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel perspective on Bayesian reinforcement learning (RL);
whereas existing approaches infer a posterior over the transition distribution
or Q-function, we characterise the uncertainty in the Bellman operator. Our
Bayesian Bellman operator (BBO) framework is motivated by the insight that when
bootstrapping is introduced, model-free approaches actually infer a posterior
over Bellman operators, not value functions. In this paper, we use BBO to
provide a rigorous theoretical analysis of model-free Bayesian RL to better
understand its relationship to established frequentist RL methodologies. We
prove that Bayesian solutions are consistent with frequentist RL solutions,
even when approximate inference is used, and derive conditions for which
convergence properties hold. Empirically, we demonstrate that algorithms
derived from the BBO framework have sophisticated deep exploration properties
that enable them to solve continuous control tasks at which state-of-the-art
regularised actor-critic algorithms fail catastrophically.
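To make the central idea concrete, here is a minimal sketch of inferring a posterior over a Bellman operator (rather than over a value function) and exploring by posterior sampling. It is not the paper's BBO algorithm; the toy tabular MDP, one-hot features, Gaussian-linear target model, Thompson-style action selection, and all names (`phi`, `update_posterior`, `act`) are assumptions made purely for illustration.

```python
# Minimal, illustrative sketch (NOT the BBO algorithm from the paper):
# maintain a Gaussian posterior over the parameters of a linear model of
# bootstrapped TD targets, and explore by sampling from that posterior.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.99
dim = n_states * n_actions          # one-hot feature per (state, action) pair
noise_var = 1.0                     # assumed observation-noise variance
rng = np.random.default_rng(0)

def phi(s, a):
    """One-hot feature vector for the (state, action) pair."""
    x = np.zeros(dim)
    x[s * n_actions + a] = 1.0
    return x

# Gaussian prior over the parameters w of Q_w(s, a) = phi(s, a) @ w.
mean = np.zeros(dim)
cov = np.eye(dim)

def update_posterior(transitions):
    """Bayesian linear regression of bootstrapped TD targets on features.

    Because each target bootstraps through the current posterior mean, the
    inferred object behaves like an (empirical) Bellman operator applied to
    the current estimate, not a fixed ground-truth value function.
    """
    global mean, cov
    X = np.stack([phi(s, a) for s, a, r, s2 in transitions])
    y = np.array([
        r + gamma * max(phi(s2, a2) @ mean for a2 in range(n_actions))
        for s, a, r, s2 in transitions
    ])
    prior_prec = np.linalg.inv(cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / noise_var)
    post_mean = post_cov @ (prior_prec @ mean + X.T @ y / noise_var)
    mean, cov = post_mean, post_cov

def act(s):
    """Sample one plausible set of Bellman-operator parameters from the
    posterior and act greedily under it (Thompson-sampling style)."""
    w = rng.multivariate_normal(mean, cov)
    return int(np.argmax([phi(s, a) @ w for a in range(n_actions)]))
```

In this toy version, bootstrapping the regression targets through the current posterior mean is what makes the inferred quantity operator-like rather than a fixed value function, and "deep exploration" corresponds here to acting greedily with respect to a sampled operator instead of the posterior mean; the paper develops the idea rigorously and with function approximation.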
Related papers
- Parameterized Projected Bellman Operator [64.129598593852]
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL).
We propose a novel alternative approach based on learning an approximate version of the Bellman operator.
We formulate an optimization problem to learn the PBO for generic sequential decision-making problems.
arXiv Detail & Related papers (2023-12-20T09:33:16Z)
- Bayesian Exploration Networks [28.885750299203433]
We introduce a novel Bayesian model-free formulation and the first analysis showing that model-free approaches can yield Bayes-optimal policies.
As a first step towards model-free Bayes optimality, we introduce the Bayesian exploration network (BEN) which uses normalising flows to model both the aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator.
In the limit of complete optimisation, BEN learns true Bayes-optimal policies, but like in variational expectation-maximisation, partial optimisation renders our approach tractable.
arXiv Detail & Related papers (2023-08-24T19:35:58Z)
- Model-based Causal Bayesian Optimization [74.78486244786083]
We introduce the first algorithm for Causal Bayesian Optimization with Multiplicative Weights (CBO-MW).
We derive regret bounds for CBO-MW that naturally depend on graph-related quantities.
Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system.
arXiv Detail & Related papers (2023-07-31T13:02:36Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits [16.59103967569845]
We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for contextual linear bandits in non-stationary environments.
This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings.
arXiv Detail & Related papers (2023-07-07T13:29:07Z)
- Bayesian Risk-Averse Q-Learning with Streaming Observations [7.330349128557128]
We consider a robust reinforcement learning problem, where a learning agent learns from a simulated training environment.
Observations from the real environment, which is out of the agent's control, arrive periodically.
We develop a multi-stage Bayesian risk-averse Q-learning algorithm to solve the BRMDP with streaming observations from the real environment.
arXiv Detail & Related papers (2023-05-18T20:48:50Z)
- Model-based Causal Bayesian Optimization [78.120734120667]
We propose model-based causal Bayesian optimization (MCBO).
MCBO learns a full system model instead of only modeling intervention-reward pairs.
Unlike in standard Bayesian optimization, our acquisition function cannot be evaluated in closed form.
arXiv Detail & Related papers (2022-11-18T14:28:21Z)
- Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability [48.62272919754204]
We study generalization in Bayesian RL under the probably approximately correct (PAC) framework.
Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense.
arXiv Detail & Related papers (2021-09-24T07:48:34Z)
- Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning [6.16852156844376]
We describe a novel framework, Inferential Induction, for correctly inferring value function distributions from data.
We experimentally demonstrate that the proposed algorithm is competitive with respect to the state of the art.
arXiv Detail & Related papers (2020-02-08T06:19:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.