Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts
- URL: http://arxiv.org/abs/2208.02434v1
- Date: Thu, 4 Aug 2022 04:04:05 GMT
- Title: Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts
- Authors: Yuxin Pan and Fangzhen Lin
- Abstract summary: Traditional model-based reinforcement learning (RL) methods generate forward rollout traces using the learnt dynamics model.
In this paper, we propose the backward imitation and forward reinforcement learning (BIFRL) framework.
BIFRL enables the agent both to reach high-value states and to explore from them more efficiently.
- Score: 11.4219428942199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional model-based reinforcement learning (RL) methods generate forward rollout traces with a learnt dynamics model to reduce interactions with the real environment. Recent model-based RL methods additionally learn a backward model, which specifies the conditional probability of the previous state given the previous action and the current state, so that backward rollout trajectories can be generated as well. However, in these methods the samples derived from backward rollouts and those from forward rollouts are simply aggregated and used to optimize the policy with a model-free RL algorithm, which may reduce both sample efficiency and the convergence rate. Such an approach ignores the fact that backward rollout traces are usually generated from high-value states and are therefore more instructive for improving the agent's behavior. In this paper, we propose the backward imitation and forward reinforcement learning (BIFRL) framework, in which the agent treats backward rollout traces as expert demonstrations to imitate good behaviors and then collects forward rollout transitions for policy reinforcement. Consequently, BIFRL enables the agent both to reach high-value states and to explore from them more efficiently, and it further reduces real interactions, making it potentially more suitable for real-robot learning. Moreover, a value-regularized generative adversarial network is introduced to augment high-value states that the agent rarely encounters. Theoretically, we provide the condition under which BIFRL is superior to the baseline methods. Experimentally, we demonstrate that BIFRL achieves better sample efficiency and competitive asymptotic performance on various MuJoCo locomotion tasks compared with state-of-the-art model-based methods.
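The abstract above outlines a loop of backward imitation followed by forward reinforcement. Below is a minimal sketch of one such iteration under several assumptions: the agent, forward model, and backward model are placeholder interfaces (the names `act`, `value`, `predict`, `imitation_update`, and `rl_update` are hypothetical, not the authors' code), a Gym-style 4-tuple `env.step` API is assumed, high-value-state selection is reduced to a simple critic-ranking heuristic, and the paper's value-regularized GAN for augmenting rare high-value states is omitted.

```python
# Minimal sketch of one BIFRL iteration as described in the abstract:
# backward rollouts from high-value states are imitated as expert
# demonstrations, then forward rollouts are used for ordinary RL updates.
# All interfaces are assumed; this is not the authors' implementation.
import numpy as np


class ReplayBuffer:
    def __init__(self):
        self.data = []

    def add(self, transition):
        self.data.append(transition)

    def sample(self, n):
        idx = np.random.randint(len(self.data), size=min(n, len(self.data)))
        return [self.data[i] for i in idx]


def select_high_value_states(buffer, value_fn, top_k=32):
    """Rank buffered states with the current critic and keep the best ones."""
    states = [t["state"] for t in buffer.data]
    values = [value_fn(s) for s in states]
    order = np.argsort(values)[::-1][:top_k]
    return [states[i] for i in order]


def bifrl_iteration(env, agent, forward_model, backward_model, buffer,
                    rollout_len=5, env_steps=100):
    # 1) Collect a small amount of real experience with the current policy.
    state = env.reset()
    for _ in range(env_steps):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        buffer.add({"state": state, "action": action, "reward": reward,
                    "next_state": next_state, "done": done})
        state = env.reset() if done else next_state

    # 2) Backward imitation: roll the backward model out from high-value
    #    states and treat the traces as expert demonstrations.
    demos = []
    for s in select_high_value_states(buffer, agent.value):
        for _ in range(rollout_len):
            a_prev = agent.act(s)                       # candidate action leading into s
            s_prev = backward_model.predict(s, a_prev)  # ~ p(s_{t-1} | a_{t-1}, s_t)
            demos.append((s_prev, a_prev))
            s = s_prev
    agent.imitation_update(demos)   # e.g. maximize log pi(a_prev | s_prev)

    # 3) Forward reinforcement: generate forward rollouts with the learnt
    #    dynamics model and apply a model-free RL update to them.
    imagined = []
    for t in buffer.sample(32):
        s = t["state"]
        for _ in range(rollout_len):
            a = agent.act(s)
            s_next, r = forward_model.predict(s, a)
            imagined.append({"state": s, "action": a, "reward": r,
                             "next_state": s_next, "done": False})
            s = s_next
    agent.rl_update(imagined)       # e.g. an off-policy actor-critic step
```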
Related papers
- Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control [1.5361702135159845]
This paper introduces a knowledge-informed model-based residual reinforcement learning framework.
It integrates traffic expert knowledge into a virtual environment model, employing the Intelligent Driver Model (IDM) for basic dynamics and neural networks for residual dynamics.
We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch.
arXiv Detail & Related papers (2024-08-30T16:16:57Z)
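The entry above combines the Intelligent Driver Model (IDM) for basic dynamics with a neural network for residual dynamics. The snippet below sketches that combination: the IDM acceleration equation is the standard car-following formula, while `residual_correction` is only a stub standing in for a trained network; the state layout and parameter values are assumptions, not the paper's architecture.

```python
# Sketch of expert-prior-plus-residual dynamics for CAV control:
# IDM base acceleration + a learned residual correction (stubbed here).
import math


def idm_acceleration(v, gap, v_lead, v0=30.0, T=1.5, a_max=1.0, b=1.5,
                     s0=2.0, delta=4.0):
    """Standard Intelligent Driver Model acceleration (SI units)."""
    dv = v - v_lead                                    # approach rate
    s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a_max * b))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / max(gap, 1e-3)) ** 2)


def residual_correction(state):
    """Placeholder for a neural network predicting residual acceleration."""
    return 0.0  # a trained residual model would return a learned correction


def predicted_acceleration(state):
    """Virtual-environment dynamics: expert prior + learned residual."""
    base = idm_acceleration(state["v"], state["gap"], state["v_lead"])
    return base + residual_correction(state)


if __name__ == "__main__":
    print(predicted_acceleration({"v": 25.0, "gap": 40.0, "v_lead": 22.0}))
```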
- SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning [9.88109749688605]
Model-based Offline Reinforcement Learning trains policies based on offline datasets and model dynamics.
This paper disentangles the problem into two key components: model bias and policy shift.
We introduce Shifts-aware Model-based Offline Reinforcement Learning (SAMBO-RL).
arXiv Detail & Related papers (2024-08-23T04:25:09Z)
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
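The QT entry above describes adding an action-value-maximizing term to the CSM training loss. The snippet below is only a loss-shape sketch: the MSE action-reconstruction term, the `q_net` interface, and the weight `alpha` are assumptions for illustration, not the paper's exact objective.

```python
# Loss-shape sketch for a Q-value regularized sequence-modeling objective:
# a CSM regression term plus a term that pushes predicted actions toward
# high Q-values (maximizing Q is expressed as minimizing -Q).
import torch.nn.functional as F


def qt_style_loss(policy_actions, target_actions, q_net, states, alpha=0.1):
    """policy_actions: actions predicted by the sequence model for `states`.
    target_actions: dataset actions used as CSM regression targets.
    q_net: learned action-value function Q(s, a), callable on batches."""
    csm_loss = F.mse_loss(policy_actions, target_actions)  # sequence-modeling term
    value_term = -q_net(states, policy_actions).mean()     # encourage high action-values
    return csm_loss + alpha * value_term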
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z)
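The REX entry above says it integrates concepts similar to Upper Confidence Bound (UCB) scores. For reference, the standard UCB score is sketched below; this is the textbook bandit formula, not REX's actual reward layering.

```python
# Textbook UCB-style action score: prefer actions with a high average
# reward or few visits (the second term encourages exploration).
import math


def ucb_score(mean_reward, visits, total_visits, c=1.4):
    if visits == 0:
        return float("inf")  # always try unvisited actions first
    return mean_reward + c * math.sqrt(math.log(total_visits) / visits)
```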
- Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems [1.8799681615947088]
We leverage ideas from model-based control to address the sample efficiency problem of reinforcement learning algorithms.
We demonstrate that our approach is more sample-efficient than fine-tuning with reinforcement learning alone.
arXiv Detail & Related papers (2023-05-20T10:11:09Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination [31.805991958408438]
We propose to augment the offline dataset by using trained bidirectional dynamics models and rollout policies with a double-check mechanism.
Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method.
arXiv Detail & Related papers (2022-06-16T08:00:44Z)
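The entry above keeps only imagined samples that pass a "double check" against bidirectional models. One plausible reading of that summary is to retain transitions on which the forward and backward models agree, sketched below with assumed model interfaces and a hypothetical tolerance `tol`; the paper's actual confidence criterion may differ.

```python
# Sketch of an agreement-based filter for bidirectional imagination:
# imagine forward, reconstruct the start state with the backward model,
# and keep the transition only if the two models roughly agree.
import numpy as np


def double_checked_transitions(states, actions, forward_model, backward_model,
                               tol=0.1):
    kept = []
    for s, a in zip(states, actions):
        s_next = forward_model.predict(s, a)        # forward imagination
        s_back = backward_model.predict(s_next, a)  # reconstruct the start state
        if np.linalg.norm(np.asarray(s) - np.asarray(s_back)) < tol:
            kept.append((s, a, s_next))             # models agree -> trust the sample
    return kept
```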
- Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions.
We adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
arXiv Detail & Related papers (2021-04-09T03:13:35Z)
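The reweighting entry above downweights poorly generated imaginary transitions during training. A minimal sketch of a per-sample weighted update is shown below; how the weights are produced (the paper learns them adaptively) is abstracted into a `weights` argument here.

```python
# Per-sample weighted TD loss: imagined transitions with low confidence
# receive small weights and therefore contribute less to the update.
def weighted_td_loss(q_pred, q_target, weights):
    per_sample = (q_pred - q_target) ** 2   # element-wise TD error (torch tensors)
    return (weights * per_sample).mean()
```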
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
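The COMBO entry above regularizes the value function on out-of-support state-actions. The sketch below adds a conservative penalty that pushes Q-values down on model-generated samples and up on dataset samples, which matches the summary in spirit; the batch layout, `q_net` interface, and coefficient `beta` are assumptions.

```python
# Sketch of a conservative critic loss for model-based offline RL:
# ordinary Bellman regression plus a term that lowers Q on model-generated
# (potentially out-of-support) state-actions relative to dataset ones.
import torch.nn.functional as F


def conservative_critic_loss(q_net, dataset_batch, model_batch, targets, beta=1.0):
    q_data = q_net(dataset_batch["states"], dataset_batch["actions"])
    q_model = q_net(model_batch["states"], model_batch["actions"])
    td_loss = F.mse_loss(q_data, targets)            # standard TD regression
    conservative = q_model.mean() - q_data.mean()    # penalize model Q, reward data Q
    return td_loss + beta * conservative
```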
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.