An Information-Theoretic Analysis of Bayesian Reinforcement Learning
- URL: http://arxiv.org/abs/2207.08735v1
- Date: Mon, 18 Jul 2022 16:28:01 GMT
- Title: An Information-Theoretic Analysis of Bayesian Reinforcement Learning
- Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund
- Abstract summary: We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent.
We show that our bounds can recover from below the current information-theoretic bounds by Russo and Van Roy.
- Score: 44.025369660607645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building on the framework introduced by Xu and Raginsky [1] for supervised
learning problems, we study the best achievable performance for model-based
Bayesian reinforcement learning problems. To this end, we define the minimum
Bayesian regret (MBR) as the difference between the maximum expected cumulative
reward obtainable by knowing the environment and its dynamics and that
obtainable by learning from the collected data. We specialize this definition to reinforcement
learning problems modeled as Markov decision processes (MDPs) whose kernel
parameters are unknown to the agent and whose uncertainty is expressed by a
prior distribution. One method for deriving upper bounds on the MBR is
presented and specific bounds based on the relative entropy and the Wasserstein
distance are given. We then focus on two particular cases of MDPs, the
multi-armed bandit problem (MAB) and the online optimization with partial
feedback problem. For the latter problem, we show that our bounds can recover
from below the current information-theoretic bounds by Russo and Van Roy [2].
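As a reading aid only (the notation below is assumed here, not quoted from the paper), the MBR described in the abstract can be sketched for a horizon T, per-step reward r_t, unknown kernel parameters \Theta with a prior, policies \pi chosen with knowledge of \Theta, and learning algorithms A acting only on the collected data:
    \mathrm{MBR} \;=\; \mathbb{E}_{\Theta}\!\left[ \max_{\pi}\, \mathbb{E}\!\left[ \textstyle\sum_{t=1}^{T} r_t \,\middle|\, \pi, \Theta \right] \right] \;-\; \max_{A}\, \mathbb{E}\!\left[ \textstyle\sum_{t=1}^{T} r_t \,\middle|\, A \right].
For context, the information-theoretic bound of Russo and Van Roy recalled here from [2] (not from the abstract above) has the form \mathbb{E}[\mathrm{Regret}(T)] \le \sqrt{\bar{\Gamma}\, H(A^*)\, T}, where \bar{\Gamma} bounds the information ratio and H(A^*) is the entropy of the optimal action; the abstract states that the relative-entropy bound on the MBR recovers such bounds from below.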
Related papers
- On Pareto Optimality for the Multinomial Logistic Bandit [0.0]
We provide a new online learning algorithm for tackling the Multinomial Logit Bandit problem.
Despite the challenges posed by the MNL model, we develop a novel Upper Confidence Bound (UCB)-based method.
We develop theoretical guarantees characterizing the tradeoff between regret and estimation error for the MNL-Bandit problem.
arXiv Detail & Related papers (2025-01-31T16:42:29Z) - Learning Algorithms for Verification of Markov Decision Processes [20.5951492453299]
We present a general framework for applying learning algorithms to the verification of Markov decision processes (MDPs)
The presented framework focuses on probabilistic reachability, which is a core problem in verification.
arXiv Detail & Related papers (2024-03-14T08:54:19Z) - STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING (STEin information dirEcted exploration for model-based Reinforcement learnING).
arXiv Detail & Related papers (2023-01-28T00:49:28Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded
Markov Decision Processes [93.61202366677526]
We study the offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with the finite-sample suboptimality guarantee of finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Parameterized MDPs and Reinforcement Learning Problems -- A Maximum
Entropy Principle Based Framework [2.741266294612776]
We present a framework to address a class of sequential decision making problems.
Our framework features learning the optimal control policy with robustness to noisy data.
arXiv Detail & Related papers (2020-06-17T04:08:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.