Data-efficient visuomotor policy training using reinforcement learning
and generative models
- URL: http://arxiv.org/abs/2007.13134v2
- Date: Fri, 6 Nov 2020 17:04:12 GMT
- Title: Data-efficient visuomotor policy training using reinforcement learning
and generative models
- Authors: Ali Ghadirzadeh, Petra Poklukar, Ville Kyrki, Danica Kragic and Mårten Björkman
- Abstract summary: We present a data-efficient framework for solving visuomotor sequential decision-making problems.
We exploit the combination of reinforcement learning and latent variable generative models.
- Score: 27.994338318811952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a data-efficient framework for solving visuomotor sequential
decision-making problems, which exploits the combination of reinforcement
learning (RL) and latent variable generative models. Our framework trains deep
visuomotor policies by introducing an action latent variable such that the
feed-forward policy search can be divided into three parts: (i) training a
sub-policy that outputs a distribution over the action latent variable given a
state of the system, (ii) unsupervised training of a generative model that
outputs a sequence of motor actions conditioned on the latent action variable,
and (iii) supervised training of the deep visuomotor policy in an end-to-end
fashion. Our approach enables safe exploration and alleviates the
data-inefficiency problem as it exploits prior knowledge about valid sequences
of motor actions. Moreover, we provide a set of measures for evaluating
generative models that allow us to predict the performance of the RL policy
training before the actual training on a physical robot. We define two novel
measures of disentanglement and local linearity for assessing the quality of
latent representations, and complement them with existing measures for
assessing the learned distribution. We experimentally determine the
characteristics of different generative models that have the most influence on
the performance of the final policy training on a robotic picking task.
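The three-part decomposition described in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration in PyTorch, not the paper's implementation: all module names, layer sizes, the input representation, and the distillation loss are hypothetical stand-ins for the three stages (sub-policy over the action latent, generative action decoder, end-to-end visuomotor policy).

```python
import torch
import torch.nn as nn

# Illustrative dimensions only; the paper does not specify these values.
STATE_DIM, LATENT_DIM, ACTION_DIM, HORIZON = 16, 4, 7, 10

# (ii) Generative model: decodes an action latent into a full motor-action
# sequence; assumed to be trained unsupervised (e.g. as a VAE decoder) on
# valid trajectories beforehand, so sampled latents decode to safe actions.
action_decoder = nn.Sequential(
    nn.Linear(LATENT_DIM, 128), nn.ReLU(),
    nn.Linear(128, HORIZON * ACTION_DIM),
)

# (i) Sub-policy: maps a state to a distribution over the action latent;
# this part is trained with RL (the RL loop is omitted in this sketch).
class SubPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(STATE_DIM, 2 * LATENT_DIM)

    def forward(self, state):
        mean, log_std = self.net(state).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.exp())

sub_policy = SubPolicy()

# (iii) Deep visuomotor policy: trained with supervision to reproduce the
# composed behaviour sub_policy -> action_decoder end to end from raw input.
visuomotor_policy = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, HORIZON * ACTION_DIM),
)

states = torch.randn(32, STATE_DIM)            # stand-in for visual observations
with torch.no_grad():
    z = sub_policy(states).sample()            # latent action plan per state
    target_actions = action_decoder(z)         # decoded motor-action sequences

pred_actions = visuomotor_policy(states)
loss = nn.functional.mse_loss(pred_actions, target_actions)
loss.backward()                                # one supervised end-to-end step
print(f"end-to-end distillation loss: {loss.item():.4f}")
```

In this reading, safe exploration and data efficiency come from searching in the low-dimensional action latent space: every sampled latent decodes into an action sequence drawn from the distribution of valid trajectories the generative model was trained on.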
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) GBMs, (ii) EBMs, and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal [17.467998596393116]
It remains unclear if pre-trained vision models are consistent in their effectiveness under different control policies.
Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm.
arXiv Detail & Related papers (2023-04-10T13:52:19Z)
- Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling [34.88897402357158]
We show that due to the limited distributional expressivity of policy models, previous methods might still select unseen actions during training.
We adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model.
Our proposed method achieves competitive or superior performance compared with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2022-09-29T04:36:23Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- Reinforcement Learning Under Algorithmic Triage [33.80293624975863]
We develop a two-stage actor-critic method to learn reinforcement learning models under triage.
The first stage performs offline, off-policy training using human data gathered in an environment where the human has operated on their own.
The second stage performs on-policy training to account for the impact that switching may have on the human policy.
arXiv Detail & Related papers (2021-09-23T12:21:26Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.