Related papers: Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

URL: http://arxiv.org/abs/2110.05038v1
Date: Mon, 11 Oct 2021 07:09:14 GMT
Title: Recurrent Model-Free RL is a Strong Baseline for Many POMDPs
Authors: Tianwei Ni, Benjamin Eysenbach, Ruslan Salakhutdinov
Abstract summary: Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs. In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs. Prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs.
Score: 73.39666827525782
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs. In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs. However, prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs. This paper revisits this claim. We find that careful architecture and hyperparameter decisions yield a recurrent model-free implementation that performs on par with (and occasionally substantially better than) more sophisticated recent techniques in their respective domains. We also release a simple and efficient implementation of recurrent model-free RL for future work to use as a baseline for POMDPs. Code is available at https://github.com/twni2016/pomdp-baselines
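The abstract's premise is that memory lets a model-free agent act on information it is no longer observing. Below is a minimal sketch of that idea, not the paper's implementation. The toy POMDP, the function names, and the hand-picked weights are all hypothetical. A cue is shown only at the first step and is followed by blank observations, and the final action must match the cue. A memoryless policy sees only a blank at decision time, so it cannot exceed 50% success. A single Elman-style recurrent unit carries the cue forward in the sign of its hidden state.

```python
import math

# Hypothetical toy POMDP: the cue is visible only at t = 0, then blanks follow.
CUE_ENCODING = {0: -1.0, 1: 1.0}
BLANK = 0.0


def episode_observations(cue: int, horizon: int) -> list[float]:
    """Observation sequence: the encoded cue first, then horizon - 1 blanks."""
    return [CUE_ENCODING[cue]] + [BLANK] * (horizon - 1)


def reward(cue: int, action: int) -> float:
    """Reward 1 if the final action matches the cue, else 0."""
    return 1.0 if action == cue else 0.0


def memoryless_policy(obs: float) -> int:
    """Acts on the current observation only (a blank at the decision step)."""
    return 1 if obs > 0 else 0


def recurrent_policy(observations: list[float], w_h: float = 1.5, w_x: float = 2.0) -> int:
    """Single recurrent unit h_t = tanh(w_h * h_{t-1} + w_x * x_t).

    The weights are hand-picked (an assumption, not learned) so that the sign
    of the hidden state preserves the cue across blank steps.
    """
    h = 0.0
    for x in observations:
        h = math.tanh(w_h * h + w_x * x)
    return 1 if h > 0 else 0


def success_rate(policy_kind: str, horizon: int = 10) -> float:
    """Average reward over both possible cues for the chosen policy."""
    total = 0.0
    for cue in (0, 1):
        obs = episode_observations(cue, horizon)
        if policy_kind == "memoryless":
            action = memoryless_policy(obs[-1])
        else:
            action = recurrent_policy(obs)
        total += reward(cue, action)
    return total / 2


print(success_rate("memoryless"))  # 0.5
print(success_rate("recurrent"))   # 1.0
```

In an actual recurrent model-free method, the recurrence weights would be learned jointly with the policy and value function rather than fixed by hand. The toy only illustrates why the hidden state is needed.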

Related papers

Towards General-Purpose Model-Free Reinforcement Learning [40.973429772093155]
Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, RL algorithms are often tailored to specific benchmarks. We propose a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings.
arXiv (2025-01-27T15:36:37Z)
REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL. We find that REBEL provides a unified approach to language modeling and image generation, with performance stronger than or similar to that of PPO and DPO.
arXiv (2024-04-25T17:20:45Z)
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL. We show that RL$^3$ earns greater cumulative reward in the long term, compared to RL$^2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv (2023-06-28T04:16:16Z)
Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ. We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments. We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
arXiv (2022-03-17T14:51:21Z)
Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models. We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv (2021-06-03T17:58:51Z)
MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. We show that an existing model-based RL algorithm already produces significant gains in the offline setting. We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv (2020-05-27T08:46:41Z)
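The MOPO summary describes rewards penalized by dynamics uncertainty, which can be written as $\tilde r(s,a) = \hat r(s,a) - \lambda\, u(s,a)$. Here $\hat r$ is the learned model's reward and $\lambda \ge 0$ sets how strongly uncertainty is penalized. Below is a minimal sketch of that penalty. The choice of $u$ is an assumed heuristic for illustration: it takes the largest norm of any ensemble member's predicted per-dimension standard deviation. The function names and numbers are hypothetical.

```python
import math


def uncertainty_estimate(ensemble_stds: list[list[float]]) -> float:
    """Assumed heuristic u(s, a): the largest L2 norm of any ensemble member's
    predicted per-dimension standard deviation (diagonal Gaussian outputs)."""
    return max(math.hypot(*std) for std in ensemble_stds)


def penalized_reward(predicted_reward: float,
                     ensemble_stds: list[list[float]],
                     lam: float) -> float:
    """Uncertainty-penalized reward r_tilde = r_hat - lam * u(s, a)."""
    if lam < 0:
        raise ValueError("penalty coefficient lam must be non-negative")
    return predicted_reward - lam * uncertainty_estimate(ensemble_stds)


# Two hypothetical ensemble members predicting a 2-D next state at one (s, a).
stds = [[0.3, 0.4], [0.0, 0.1]]
print(penalized_reward(1.0, stds, lam=0.5))  # approximately 0.75
```

Taking the maximum over ensemble members makes the penalty conservative, because a single uncertain member is enough to lower the reward for that state-action pair.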
MOReL : Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. We present MOReL, an algorithmic framework for model-based offline RL. We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv (2020-05-12T17:52:43Z)
