Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer
- URL: http://arxiv.org/abs/2302.09273v1
- Date: Sat, 18 Feb 2023 09:47:34 GMT
- Title: Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer
- Authors: Hannes Eriksson, Debabrota Basu, Tommy Tram, Mina Alibeigi, Christos Dimitrakakis
- Abstract summary: We study the problem of transferring the available Markov Decision Process (MDP) models to learn and plan efficiently in an unknown but similar MDP.
We propose a generic two-stage algorithm, MLEMTRL, to address the MTRL problem in discrete and continuous settings.
We empirically demonstrate that MLEMTRL allows faster learning in new MDPs than learning from scratch and achieves near-optimal performance.
- Score: 5.92353064090273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the problem of transferring the available Markov
Decision Process (MDP) models to learn and plan efficiently in an unknown but
similar MDP. We refer to it as the \textit{Model Transfer Reinforcement Learning
(MTRL)} problem. First, we formulate MTRL for discrete MDPs and Linear
Quadratic Regulators (LQRs) with continuous states and actions. Then, we propose a
generic two-stage algorithm, MLEMTRL, to address the MTRL problem in discrete
and continuous settings. In the first stage, MLEMTRL uses a \textit{constrained
Maximum Likelihood Estimation (MLE)}-based approach to estimate the target MDP
model using a set of known MDP models. In the second stage, using the estimated
target MDP model, MLEMTRL deploys a model-based planning algorithm appropriate
for the MDP class. Theoretically, we prove worst-case regret bounds for MLEMTRL
both in realisable and non-realisable settings. We empirically demonstrate that
MLEMTRL allows faster learning in new MDPs than learning from scratch and
achieves near-optimal performance depending on the similarity of the available
MDPs and the target MDP.
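As a rough illustration of the two-stage structure described in the abstract, the following is a minimal NumPy sketch for tabular MDPs: stage one estimates the target transition model by constrained MLE, here parameterised as a convex mixture of known source models, and stage two plans in the estimated model with value iteration. The mixture parameterisation, the projected-gradient MLE, and every name in the snippet (constrained_mle_weights, value_iteration, the toy three-state models) are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def constrained_mle_weights(source_models, transitions, n_iters=500, lr=0.05):
    # Stage 1 (illustrative): find simplex weights w so that the mixture
    # sum_k w_k * P_k best explains the observed (s, a, s') transitions,
    # via projected gradient ascent on the log-likelihood.
    K = len(source_models)
    P = np.stack(source_models)                     # shape (K, S, A, S)
    w = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        grad = np.zeros(K)
        for (s, a, s_next) in transitions:
            p_k = P[:, s, a, s_next]                # per-model likelihood of this transition
            grad += p_k / (float(w @ p_k) + 1e-12)  # d/dw_k of log(sum_k w_k p_k)
        w = np.clip(w + lr * grad / max(len(transitions), 1), 1e-12, None)
        w /= w.sum()                                # crude renormalisation back onto the simplex
    return w

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    # Stage 2: standard value iteration in the estimated tabular model.
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * np.einsum("sat,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new

# Toy usage with two hypothetical 3-state, 2-action source models and a
# handful of target-MDP transitions; sizes and rewards are made up.
rng = np.random.default_rng(0)
def random_model(S=3, A=2):
    P = rng.random((S, A, S))
    return P / P.sum(axis=-1, keepdims=True)

sources = [random_model(), random_model()]
target_data = [(0, 1, 2), (1, 0, 0), (2, 1, 1), (0, 0, 1)]  # observed (s, a, s') samples
w = constrained_mle_weights(sources, target_data)
P_hat = np.tensordot(w, np.stack(sources), axes=1)          # estimated target transition model
R = rng.random((3, 2))                                      # stand-in reward table, shape (S, A)
policy, V = value_iteration(P_hat, R)
print("mixture weights:", w, "greedy policy:", policy)

In the realisable case, where the target dynamics lie in the span of the known models, a mixture of this kind can recover the target exactly; otherwise it returns the closest mixture in the likelihood sense, which loosely mirrors the realisable versus non-realisable distinction in the paper's regret analysis.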
Related papers
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning [5.663006149337036]
Offline model-based reinforcement learning (MBRL) is a powerful approach for data-driven decision-making and control.
There could be various MDPs that behave identically on the offline dataset and so dealing with the uncertainty about the true MDP can be challenging.
We introduce a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving BAMDPs in continuous state and action spaces.
arXiv Detail & Related papers (2024-10-15T03:36:43Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs [16.006893624836554]
We propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE).
VBMLE is computationally more efficient as it only requires solving one optimization problem in each time step.
In our regret analysis, we offer a generic convergence result of MLE in linear MDPs through a novel supermartingale construct. (A schematic form of the value-biased objective is sketched after this list.)
arXiv Detail & Related papers (2023-10-17T18:27:27Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
arXiv Detail & Related papers (2022-03-17T14:51:21Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret problem for reinforcement learning in latent Markov Decision Processes (LMDPs).
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z)
- MOReL: Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)
- On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning [25.163423936635787]
We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems.
We propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL).
We derive the iteration and sample complexity of SG-MRL to find an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms.
arXiv Detail & Related papers (2020-02-12T18:29:09Z)
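For the value-biased MLE entry above, a schematic form of the generic value-biased objective, drawn from the classical VBMLE idea rather than the cited paper's exact per-step formulation, is:

\hat{\theta}_t \in \arg\max_{\theta} \Big\{ \sum_{i=1}^{t-1} \log p_{\theta}(s_{i+1} \mid s_i, a_i) \; + \; \alpha_t \, V^{*}_{\theta} \Big\}

where the first term is the transition log-likelihood of the data collected so far, $V^{*}_{\theta}$ is the optimal value under model $\theta$, and $\alpha_t$ is a bias weight favouring models with higher optimal value; solving this single optimisation problem per time step is what the summary above refers to.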
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.