Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer
- URL: http://arxiv.org/abs/2302.09273v1
- Date: Sat, 18 Feb 2023 09:47:34 GMT
- Title: Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer
- Authors: Hannes Eriksson, Debabrota Basu, Tommy Tram, Mina Alibeigi, Christos Dimitrakakis
- Abstract summary: We study the problem of transferring the available Markov Decision Process (MDP) models to learn and plan efficiently in an unknown but similar MDP.
We propose a generic two-stage algorithm, MLEMTRL, to address the MTRL problem in discrete and continuous settings.
We empirically demonstrate that MLEMTRL allows faster learning in new MDPs than learning from scratch and achieves near-optimal performance.
- Score: 5.92353064090273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the problem of transferring the available Markov
Decision Process (MDP) models to learn and plan efficiently in an unknown but
similar MDP. We refer to it as the \textit{Model Transfer Reinforcement Learning
(MTRL)} problem. First, we formulate MTRL for discrete MDPs and Linear
Quadratic Regulators (LQRs) with continuous states and actions. Then, we propose a
generic two-stage algorithm, MLEMTRL, to address the MTRL problem in discrete
and continuous settings. In the first stage, MLEMTRL uses a \textit{constrained
Maximum Likelihood Estimation (MLE)}-based approach to estimate the target MDP
model using a set of known MDP models. In the second stage, using the estimated
target MDP model, MLEMTRL deploys a model-based planning algorithm appropriate
for the MDP class. Theoretically, we prove worst-case regret bounds for MLEMTRL
both in realisable and non-realisable settings. We empirically demonstrate that
MLEMTRL allows faster learning in new MDPs than learning from scratch and
achieves near-optimal performance depending on the similarity of the available
MDPs and the target MDP.
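As a rough illustration of the two-stage structure described in the abstract, the following is a minimal NumPy sketch for tabular MDPs: stage one estimates the target transition model by constrained MLE, here parameterised as a convex mixture of known source models, and stage two plans in the estimated model with value iteration. The mixture parameterisation, the projected-gradient MLE, and every name in the snippet (constrained_mle_weights, value_iteration, the toy three-state models) are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def constrained_mle_weights(source_models, transitions, n_iters=500, lr=0.05):
    # Stage 1 (illustrative): find simplex weights w so that the mixture
    # sum_k w_k * P_k best explains the observed (s, a, s') transitions,
    # via projected gradient ascent on the log-likelihood.
    K = len(source_models)
    P = np.stack(source_models)                     # shape (K, S, A, S)
    w = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        grad = np.zeros(K)
        for (s, a, s_next) in transitions:
            p_k = P[:, s, a, s_next]                # per-model likelihood of this transition
            grad += p_k / (float(w @ p_k) + 1e-12)  # d/dw_k of log(sum_k w_k p_k)
        w = np.clip(w + lr * grad / max(len(transitions), 1), 1e-12, None)
        w /= w.sum()                                # crude renormalisation back onto the simplex
    return w

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    # Stage 2: standard value iteration in the estimated tabular model.
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * np.einsum("sat,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new

# Toy usage with two hypothetical 3-state, 2-action source models and a
# handful of target-MDP transitions; sizes and rewards are made up.
rng = np.random.default_rng(0)
def random_model(S=3, A=2):
    P = rng.random((S, A, S))
    return P / P.sum(axis=-1, keepdims=True)

sources = [random_model(), random_model()]
target_data = [(0, 1, 2), (1, 0, 0), (2, 1, 1), (0, 0, 1)]  # observed (s, a, s') samples
w = constrained_mle_weights(sources, target_data)
P_hat = np.tensordot(w, np.stack(sources), axes=1)          # estimated target transition model
R = rng.random((3, 2))                                      # stand-in reward table, shape (S, A)
policy, V = value_iteration(P_hat, R)
print("mixture weights:", w, "greedy policy:", policy)

In the realisable case, where the target dynamics lie in the span of the known models, a mixture of this kind can recover the target exactly; otherwise it returns the closest mixture in the likelihood sense, which loosely mirrors the realisable versus non-realisable distinction in the paper's regret analysis.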
Related papers
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning [5.663006149337036]
Offline model-based reinforcement learning (MBRL) is a powerful approach for data-driven decision-making and control.
There could be various MDPs that behave identically on the offline dataset and so dealing with the uncertainty about the true MDP can be challenging.
We introduce a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving BAMDPs in continuous state and action spaces.
arXiv Detail & Related papers (2024-10-15T03:36:43Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs [16.006893624836554]
We propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE).
VBMLE is computationally more efficient as it only requires solving one optimization problem in each time step.
In our regret analysis, we offer a generic convergence result of MLE in linear MDPs through a novel supermartingale construct. (A schematic form of the value-biased objective is sketched after this list.)
arXiv Detail & Related papers (2023-10-17T18:27:27Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
arXiv Detail & Related papers (2022-03-17T14:51:21Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret problem for reinforcement learning in latent Markov Decision Processes (LMDPs).
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z)
- MOReL: Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)
- On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning [25.163423936635787]
We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems.
We propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL).
We derive the iteration and sample complexity of SG-MRL to find an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms.
arXiv Detail & Related papers (2020-02-12T18:29:09Z)
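For the value-biased MLE entry above, a schematic form of the generic value-biased objective, drawn from the classical VBMLE idea rather than the cited paper's exact per-step formulation, is:

\hat{\theta}_t \in \arg\max_{\theta} \Big\{ \sum_{i=1}^{t-1} \log p_{\theta}(s_{i+1} \mid s_i, a_i) \; + \; \alpha_t \, V^{*}_{\theta} \Big\}

where the first term is the transition log-likelihood of the data collected so far, $V^{*}_{\theta}$ is the optimal value under model $\theta$, and $\alpha_t$ is a bias weight favouring models with higher optimal value; solving this single optimisation problem per time step is what the summary above refers to.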
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.