Training and Evaluation of Deep Policies using Reinforcement Learning
and Generative Models
- URL: http://arxiv.org/abs/2204.08573v1
- Date: Mon, 18 Apr 2022 22:02:32 GMT
- Title: Training and Evaluation of Deep Policies using Reinforcement Learning
and Generative Models
- Authors: Ali Ghadirzadeh, Petra Poklukar, Karol Arndt, Chelsea Finn, Ville
Kyrki, Danica Kragic and Mårten Björkman
- Abstract summary: GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine which characteristics of generative models have the most influence on the performance of the final policy.
- Score: 67.78935378952146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a data-efficient framework for solving sequential decision-making
problems which exploits the combination of reinforcement learning (RL) and
latent variable generative models. The framework, called GenRL, trains deep
policies by introducing an action latent variable such that the feed-forward
policy search can be divided into two parts: (i) training a sub-policy that
outputs a distribution over the action latent variable given a state of the
system, and (ii) unsupervised training of a generative model that outputs a
sequence of motor actions conditioned on the latent action variable. GenRL
enables safe exploration and alleviates the data-inefficiency problem as it
exploits prior knowledge about valid sequences of motor actions. Moreover, we
provide a set of measures for evaluation of generative models such that we are
able to predict the performance of the RL policy training prior to the actual
training on a physical robot. We experimentally determine the characteristics
of generative models that have the most influence on the performance of the final
policy training on two robotics tasks: shooting a hockey puck and throwing a
basketball. Furthermore, we empirically demonstrate that, compared with two
state-of-the-art RL methods, GenRL is the only one that can solve these
robotics tasks both safely and efficiently.
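To make the decomposition concrete, here is a minimal sketch of the two-part policy in PyTorch. It is illustrative only, not the authors' implementation: the module names (SubPolicy, ActionDecoder), network sizes, and the choice of a Gaussian latent with an MLP decoder are all assumptions.

```python
# Illustrative sketch of the GenRL two-part policy (all names, sizes and
# architectures are assumptions, not the authors' implementation).
import torch
import torch.nn as nn

class SubPolicy(nn.Module):
    """Part (i): maps a state to a distribution over the action latent z."""
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.log_std = nn.Linear(128, latent_dim)

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Normal(self.mu(h), self.log_std(h).exp())

class ActionDecoder(nn.Module):
    """Part (ii): generative model decoding z into a sequence of T motor
    actions; trained unsupervised (e.g. as a VAE decoder) on valid motions."""
    def __init__(self, latent_dim, action_dim, horizon):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.body = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, horizon * action_dim))

    def forward(self, z):
        return self.body(z).view(-1, self.horizon, self.action_dim)

# Acting: sample z given the state, decode it into a full motor trajectory,
# and execute the sequence on the robot.
sub_policy = SubPolicy(state_dim=16, latent_dim=4)
decoder = ActionDecoder(latent_dim=4, action_dim=7, horizon=50)
state = torch.randn(1, 16)                    # placeholder state
z = sub_policy(state).rsample()               # (1, 4)
actions = decoder(z)                          # (1, 50, 7) motor commands
```

Because the decoder is trained only on valid motion data, exploration happens in the low-dimensional latent space rather than over raw motor sequences, which is what the abstract credits for GenRL's safe exploration and data efficiency.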
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value function to learn the consequence of executing a series of current and future actions, our algorithm can learn useful value functions from noisy trajectories (see the sketch below).
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
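The action-sequence critic idea from the entry above can be sketched as follows; this is a hedged illustration, with the architecture and names (SequenceCritic, seq_len) assumed rather than taken from the paper.

```python
# Hedged sketch of a critic over action sequences (architecture and names
# are assumptions, not the paper's code).
import torch
import torch.nn as nn

class SequenceCritic(nn.Module):
    """Q(s, a_t, ..., a_{t+k-1}): scores a window of current and future
    actions instead of a single action."""
    def __init__(self, state_dim, action_dim, seq_len):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + seq_len * action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, state, action_seq):
        # state: (batch, state_dim); action_seq: (batch, seq_len, action_dim)
        x = torch.cat([state, action_seq.flatten(start_dim=1)], dim=-1)
        return self.net(x)

critic = SequenceCritic(state_dim=16, action_dim=4, seq_len=5)
q = critic(torch.randn(8, 16), torch.randn(8, 5, 4))   # -> (8, 1)
```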
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies (see the distillation sketch below).
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
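At its core, the distillation step described above reduces to supervised regression from states to expert actions. A minimal sketch with scikit-learn, assuming a stand-in expert_policy and toy data (the paper's tasks, models, and hyperparameters differ):

```python
# Hedged sketch of policy distillation into gradient boosting machines:
# supervised regression from rollout states to expert actions. The expert
# here is a stand-in function, not the paper's trained policy.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 2))

def expert_policy(states):
    # Placeholder for a trained neural-network policy (8-D state, 2-D action).
    return np.tanh(states @ w)

states = rng.normal(size=(5000, 8))        # states visited by the expert
actions = expert_policy(states)            # expert action labels

# One boosted regressor per action dimension.
student = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=200))
student.fit(states, actions)
mse = np.mean((student.predict(states) - actions) ** 2)
print(f"distillation MSE: {mse:.4f}")
```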
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmark.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- PASTA: Pretrained Action-State Transformer Agents [10.654719072766495]
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains.
Recent approaches involve pre-training transformer models on vast amounts of unlabeled data.
In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories.
arXiv Detail & Related papers (2023-07-20T15:09:06Z)
- Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study the Decision-Pretrained Transformer (DPT), a supervised pretraining method in which the transformer predicts the optimal action given a query state and an in-context dataset of interactions.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both online exploration and offline conservatism (see the sketch below).
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
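The DPT pretraining objective above can be sketched as ordinary supervised learning: embed a context of interactions plus a query state, and train the transformer to predict the optimal action with cross-entropy. Everything below (token layout, embedding, model size) is an assumption, not the paper's code.

```python
# Hedged sketch of a DPT-style pretraining step: cross-entropy on the
# optimal action, predicted from a context of interactions plus a query
# state. Token layout and model size are assumptions.
import torch
import torch.nn as nn

n_actions, d_model = 5, 32
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2)
head = nn.Linear(d_model, n_actions)

# tokens: embedded (state, action, reward) interactions followed by one
# query-state token; labels: the optimal action at the query state, which
# is known at pretraining time because the tasks are constructed.
tokens = torch.randn(64, 17, d_model)        # 16-step context + query token
labels = torch.randint(0, n_actions, (64,))
logits = head(encoder(tokens)[:, -1])        # read out the query position
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                              # one supervised pretraining step
```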
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling [34.88897402357158]
We show that due to the limited distributional expressivity of policy models, previous methods might still select unseen actions during training.
We adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model.
Our proposed method achieves competitive or superior performance compared with state-of-the-art offline RL methods (see the sketch below).
arXiv Detail & Related papers (2022-09-29T04:36:23Z)
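The decoupled policy from the entry above admits a simple sample-and-score reading: draw candidate actions from the behavior model, rank them with the action evaluation model, and execute the best candidate. A hedged sketch, with both models as stand-ins:

```python
# Hedged sketch of the decoupled policy: sample candidates from a generative
# behavior model, score them with the action evaluation model, act greedily.
# Both models are stand-ins, not the paper's architecture.
import torch

def behavior_model_sample(state, n):
    # Stand-in for an expressive generative model trained on the offline
    # dataset's actions (keeps candidates close to the data distribution).
    return torch.randn(n, 4)

def action_value(state, actions):
    # Stand-in for the learned action evaluation model Q(s, a).
    return -(actions ** 2).sum(dim=-1)

state = torch.randn(16)
candidates = behavior_model_sample(state, n=32)
scores = action_value(state, candidates)          # (32,)
action = candidates[scores.argmax()]              # best in-distribution action
```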
- Data-efficient visuomotor policy training using reinforcement learning and generative models [27.994338318811952]
We present a data-efficient framework for solving visuomotor sequential decision-making problems.
We exploit the combination of reinforcement learning and latent variable generative models.
arXiv Detail & Related papers (2020-07-26T14:19:00Z)
- Stealing Deep Reinforcement Learning Models for Fun and Profit [33.64948529132546]
This paper presents the first model extraction attack against Deep Reinforcement Learning (DRL) models.
It enables an external adversary to precisely recover a black-box DRL model purely from its interaction with the environment (see the sketch below).
arXiv Detail & Related papers (2020-06-09T03:24:35Z)
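At a high level, such an extraction attack amounts to imitation learning against the black-box agent: log its (observation, action) pairs from environment interaction and fit a surrogate. A hedged sketch (names and the data source are placeholders; the paper's actual attack is more involved than plain imitation):

```python
# Hedged sketch of the extraction idea as imitation: log the black-box
# agent's (observation, action) pairs and behavior-clone a surrogate.
import torch
import torch.nn as nn

def observe_victim_rollouts(steps=1000):
    # Stand-in for watching the deployed black-box policy interact with
    # the environment (no access to its weights).
    obs = torch.randn(steps, 8)
    acts = torch.randint(0, 4, (steps,))      # victim's discrete actions
    return obs, acts

surrogate = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
obs, acts = observe_victim_rollouts()
for _ in range(100):                          # behavior cloning loop
    opt.zero_grad()
    loss = nn.functional.cross_entropy(surrogate(obs), acts)
    loss.backward()
    opt.step()
```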
This list is automatically generated from the titles and abstracts of the papers listed on this site.