Model-free Reinforcement Learning for Branching Markov Decision
Processes
- URL: http://arxiv.org/abs/2106.06777v1
- Date: Sat, 12 Jun 2021 13:42:15 GMT
- Title: Model-free Reinforcement Learning for Branching Markov Decision
Processes
- Authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh
Trivedi, Dominik Wojtczak
- Abstract summary: We study reinforcement learning for the optimal control of Branching Markov Decision Processes.
The state of a (discrete-time) Branching Markov Chain (BMC) is a collection of entities that, while spawning other entities, generate a payoff.
We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit.
- Score: 6.402126624793774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study reinforcement learning for the optimal control of Branching Markov
Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov
Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities
of various types that, while spawning other entities, generate a payoff. In
comparison with BMCs, where the evolution of each entity of the same type
follows the same probabilistic pattern, BMDPs allow an external controller to
pick from a range of options. This permits us to study the best/worst behaviour
of the system. We generalise model-free reinforcement learning techniques to
compute an optimal control strategy of an unknown BMDP in the limit. We present
results of an implementation that demonstrate the practicality of the approach.
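
The abstract stays at a high level, so a small worked example may help make the setting concrete. The following is a minimal, hypothetical Python sketch, not the authors' algorithm: the entity types, the DYNAMICS table, the payoffs, and the naive Monte-Carlo learning loop are all invented for illustration. It only shows the flavour of model-free learning in a branching setting, where each entity of a given type, under a chosen action, yields an immediate payoff and spawns child entities, and one estimates the expected total payoff of an entity and all of its descendants.

```python
import random
from collections import defaultdict

# Hypothetical toy BMDP (NOT from the paper): for each (type, action),
# a list of (probability, immediate payoff, types of spawned children).
DYNAMICS = {
    ("A", "split"): [(0.5, 1.0, ["B"]), (0.5, 2.0, [])],
    ("A", "wait"):  [(1.0, 0.5, [])],
    ("B", "split"): [(0.6, 1.0, ["A"]), (0.4, 0.0, [])],
    ("B", "wait"):  [(1.0, 1.5, [])],
}
ACTIONS = {"A": ["split", "wait"], "B": ["split", "wait"]}

def step(entity_type, action):
    """Sample one branching step: returns (payoff, list of child types)."""
    r, acc = random.random(), 0.0
    for prob, payoff, children in DYNAMICS[(entity_type, action)]:
        acc += prob
        if r <= acc:
            return payoff, children
    return DYNAMICS[(entity_type, action)][-1][1:]  # float-rounding fallback

def rollout(entity_type, policy):
    """Total payoff of one entity and all of its descendants under `policy`."""
    total, stack = 0.0, [entity_type]
    while stack:
        t = stack.pop()
        payoff, children = step(t, policy[t])
        total += payoff
        stack.extend(children)
    return total

# Naive Monte-Carlo control: estimate Q(type, action) as the expected total
# payoff when the entity takes `action` once and its descendants follow the
# current greedy policy, then improve the policy greedily.
Q = defaultdict(float)
counts = defaultdict(int)
policy = {t: acts[0] for t, acts in ACTIONS.items()}

for episode in range(20000):
    t = random.choice(list(ACTIONS))
    a = random.choice(ACTIONS[t])
    payoff, children = step(t, a)
    ret = payoff + sum(rollout(c, policy) for c in children)
    counts[(t, a)] += 1
    Q[(t, a)] += (ret - Q[(t, a)]) / counts[(t, a)]  # running average
    policy = {t: max(ACTIONS[t], key=lambda a: Q[(t, a)]) for t in ACTIONS}

print({k: round(v, 2) for k, v in Q.items()})
print(policy)
```

The toy dynamics are kept subcritical (each entity spawns fewer than one child on average under every policy), so simulated lineages terminate almost surely and the total-payoff estimates are well defined; the actual paper addresses the general setting, which this sketch does not.
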
Related papers
- Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes [42.22085862132403]
We introduce Bilevel Optimization with Contextual Markov Decision Processes (BO-CMDP), a bilevel decision-making model.
BO-CMDP can be viewed as a Stackelberg Game where the leader and a random context beyond the leader's control together decide the setup of (many) MDPs.
We propose a Hyper Policy Gradient Descent (HPGD) algorithm to solve BO-CMDP, and demonstrate its convergence.
arXiv Detail & Related papers (2024-06-03T17:54:39Z) - A Deep Learning Method for Comparing Bayesian Hierarchical Models [1.6736940231069393]
We propose a deep learning method for performing Bayesian model comparison on any set of hierarchical models.
Our method enables efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application.
arXiv Detail & Related papers (2023-01-27T17:27:07Z) - Policy Gradient With Serial Markov Chain Reasoning [10.152838128195468]
We introduce a new framework that performs decision-making in reinforcement learning as an iterative reasoning process.
We show our framework has several useful properties that are inherently missing from traditional RL.
Our resulting algorithm achieves state-of-the-art performance in popular Mujoco and DeepMind Control benchmarks.
arXiv Detail & Related papers (2022-10-13T06:15:29Z) - A General Framework for Sample-Efficient Function Approximation in
Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z) - An Analysis of Model-Based Reinforcement Learning From Abstracted
Observations [24.964038353043918]
We show that abstraction can introduce a dependence between samples collected online (e.g., in the real world), which affects existing results for model-based reinforcement learning (MBRL).
We show that we can use concentration inequalities for martingales to overcome this problem.
We illustrate this by combining R-MAX, a prototypical MBRL algorithm, with abstraction, thus producing the first performance guarantees for model-based 'RL from Abstracted Observations'.
arXiv Detail & Related papers (2022-08-30T17:19:26Z) - Efficient Reinforcement Learning in Block MDPs: A Model-free
Representation Learning Approach [73.62265030773652]
We present BRIEE, an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics.
BRIEE interleaves latent-state discovery, exploration, and exploitation, and can provably learn a near-optimal policy.
We show that BRIEE is more sample-efficient than the state-of-the-art Block MDP algorithm HOMER and other empirical baselines on challenging rich-observation combination lock problems.
arXiv Detail & Related papers (2022-01-31T19:47:55Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Adversarial Robustness Verification and Attack Synthesis in Stochastic
Systems [8.833548357664606]
We develop a formal framework for adversarial robustness in systems defined as discrete-time Markov chains (DTMCs).
We outline a class of threat models under which adversaries can perturb system transitions, constrained by an $\varepsilon$-ball around the original transition probabilities.
arXiv Detail & Related papers (2021-10-05T15:52:47Z) - Model-Invariant State Abstractions for Model-Based Reinforcement
Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.