Keep Doing What Worked: Behavioral Modelling Priors for Offline
Reinforcement Learning
- URL: http://arxiv.org/abs/2002.08396v3
- Date: Wed, 17 Jun 2020 10:12:44 GMT
- Title: Keep Doing What Worked: Behavioral Modelling Priors for Offline
Reinforcement Learning
- Authors: Noah Y. Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas
Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess,
Martin Riedmiller
- Abstract summary: Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set of environment interactions is available.
Standard off-policy algorithms fail in the batch setting for continuous control.
- Score: 25.099754758455415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-policy reinforcement learning algorithms promise to be applicable in
settings where only a fixed data-set (batch) of environment interactions is
available and no new experience can be acquired. This property makes these
algorithms appealing for real-world problems such as robot control. In
practice, however, standard off-policy algorithms fail in the batch setting for
continuous control. In this paper, we propose a simple solution to this
problem. It admits the use of data generated by arbitrary behavior policies and
uses a learned prior -- the advantage-weighted behavior model (ABM) -- to bias
the RL policy towards actions that have previously been executed and are likely
to be successful on the new task. Our method can be seen as an extension of
recent work on batch-RL that enables stable learning from conflicting
data-sources. We find improvements over competitive baselines in a variety of RL
tasks -- including standard continuous control benchmarks and multi-task
learning for simulated and real-world robots.
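The core of the method is a behavior prior fitted to the logged data with advantage-based weights, which the RL policy is then kept close to during policy improvement. Below is a minimal sketch of that prior-fitting step, assuming a PyTorch Gaussian policy and a batch with precomputed advantage estimates; the network architecture, the exponential weighting, and the clamping are illustrative assumptions rather than the paper's exact implementation.

```python
# Minimal sketch of fitting an advantage-weighted behavior model (ABM) prior.
# Assumptions: PyTorch, a diagonal-Gaussian policy head, and advantage
# estimates already computed for each logged (obs, act) pair.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def log_prob(self, obs, act):
        h = self.trunk(obs)
        dist = torch.distributions.Normal(self.mean(h), self.log_std.exp())
        return dist.log_prob(act).sum(-1)

def abm_prior_loss(prior, obs, act, advantage, temperature=1.0):
    # Up-weight logged actions whose estimated advantage on the current task
    # is high, so the prior models "what worked" rather than all behavior.
    # Exponential weighting is one choice; an indicator 1[A >= 0] is another.
    weights = torch.exp(advantage / temperature).clamp(max=20.0).detach()
    return -(weights * prior.log_prob(obs, act)).mean()

# The fitted prior is then used to constrain the RL policy update,
# e.g. by bounding KL(pi_RL || prior) during policy improvement.
```

Setting the weights to a constant recovers a plain behavior-cloning prior, which is the natural baseline this advantage-weighted variant extends.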
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Iteratively Refined Behavior Regularization for Offline Reinforcement
Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL, analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z) - Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces [14.029933823101084]
We propose a novel off-policy meta-RL method, embedding learning and evaluation of uncertainty (ELUE).
ELUE learns a belief model over the embedding space and a belief-conditional policy and Q-function.
We demonstrate that ELUE outperforms state-of-the-art meta RL methods through experiments on meta-RL benchmarks.
arXiv Detail & Related papers (2021-01-06T05:51:38Z) - Overcoming Model Bias for Robust Offline Deep Reinforcement Learning [3.1325640909772403]
MOOSE is an algorithm which ensures low model bias by keeping the policy within the support of the data.
We compare MOOSE with state-of-the-art model-free, offline RL algorithms BRAC, BEAR and BCQ on the Industrial Benchmark and MuJoCo continuous control tasks in terms of robust performance.
arXiv Detail & Related papers (2020-08-12T19:08:55Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics (see the sketch after this list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
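For the MOPO entry above, the uncertainty-penalized reward can be written in a few lines. The ensemble-disagreement estimate and the penalty coefficient in this sketch are illustrative assumptions, not the paper's exact estimator.

```python
# Sketch of an uncertainty-penalized reward in the spirit of MOPO.
# Assumption: a learned dynamics ensemble whose disagreement serves as
# the uncertainty estimate; the paper's own estimator may differ.
import numpy as np

def penalized_reward(reward, next_state_preds, penalty_coef=1.0):
    """reward: predicted reward for a single (s, a) pair.
    next_state_preds: (ensemble_size, state_dim) next-state predictions
    from each ensemble member for the same (s, a).
    Returns r - lambda * u(s, a), with u measured as ensemble disagreement."""
    disagreement = np.max(np.linalg.norm(
        next_state_preds - next_state_preds.mean(axis=0), axis=-1))
    return reward - penalty_coef * disagreement

# Example: three ensemble members that mildly disagree about s'.
preds = np.array([[0.10, 0.00], [0.12, 0.01], [0.09, -0.01]])
print(penalized_reward(1.0, preds, penalty_coef=0.5))
```

Penalizing rewards this way keeps the learned policy away from regions where the model is unreliable, which is the same motivation behind the behavior priors discussed above.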