Training and Evaluation of Deep Policies using Reinforcement Learning
and Generative Models
- URL: http://arxiv.org/abs/2204.08573v1
- Date: Mon, 18 Apr 2022 22:02:32 GMT
- Title: Training and Evaluation of Deep Policies using Reinforcement Learning
and Generative Models
- Authors: Ali Ghadirzadeh, Petra Poklukar, Karol Arndt, Chelsea Finn, Ville Kyrki, Danica Kragic and Mårten Björkman
- Abstract summary: GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine which characteristics of generative models most influence the performance of the final policy training.
- Score: 67.78935378952146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a data-efficient framework for solving sequential decision-making
problems which exploits the combination of reinforcement learning (RL) and
latent variable generative models. The framework, called GenRL, trains deep
policies by introducing an action latent variable such that the feed-forward
policy search can be divided into two parts: (i) training a sub-policy that
outputs a distribution over the action latent variable given a state of the
system, and (ii) unsupervised training of a generative model that outputs a
sequence of motor actions conditioned on the latent action variable. GenRL
enables safe exploration and alleviates the data-inefficiency problem as it
exploits prior knowledge about valid sequences of motor actions. Moreover, we
provide a set of measures for evaluating generative models that allow us to
predict the performance of RL policy training before any training on a
physical robot takes place. We experimentally determine which characteristics
of generative models have the most influence on the performance of the final
policy training on two robotics tasks: shooting a hockey puck and throwing a
basketball. Furthermore, we empirically demonstrate that, compared with two
state-of-the-art RL methods, GenRL is the only method that solves these
robotics tasks safely and efficiently.
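As a concrete illustration of the two-part decomposition described in the abstract, the following is a minimal PyTorch sketch: a Gaussian sub-policy over an action latent z, and a decoder mapping z to a motor-action sequence. All class names, network sizes, and dimensions are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class SubPolicy(nn.Module):
    """(i) Maps a state to a distribution over the action latent variable z."""
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.log_std = nn.Linear(128, latent_dim)

    def forward(self, state):
        h = self.net(state)
        return torch.distributions.Normal(self.mu(h), self.log_std(h).exp())

class ActionDecoder(nn.Module):
    """(ii) Generative model decoding z into a sequence of motor actions;
    trained in advance, without supervision, on valid action sequences."""
    def __init__(self, latent_dim, action_dim, horizon):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, horizon * action_dim))

    def forward(self, z):
        return self.net(z).view(-1, self.horizon, self.action_dim)

# RL searches only the low-dimensional latent space; every executed
# trajectory is decoded from z, so exploration stays on the manifold of
# action sequences the generative model has learned (safe exploration).
state = torch.randn(1, 16)          # illustrative state_dim = 16
sub_policy = SubPolicy(16, 4)       # illustrative latent_dim = 4
decoder = ActionDecoder(4, 7, 50)   # e.g. a 7-DoF arm, 50-step sequence
z = sub_policy(state).rsample()     # (1, 4) latent action sample
actions = decoder(z)                # (1, 50, 7) motor-action sequence
```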
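The abstract also mentions a set of measures for evaluating generative models before policy training, but does not list them. The two proxies below, latent-space smoothness and action-space coverage, are hypothetical examples of such pre-training diagnostics (reusing the ActionDecoder interface sketched above); they are not the paper's actual measures.

```python
import torch

def latent_smoothness(decoder, latent_dim, n_pairs=1024, eps=1e-2):
    """Proxy metric: how much decoded action sequences change under small
    latent perturbations. A smoother latent-to-action map tends to give the
    downstream policy search a better-conditioned objective."""
    z = torch.randn(n_pairs, latent_dim)
    dz = eps * torch.randn_like(z)
    with torch.no_grad():
        da = decoder(z + dz) - decoder(z)
    return (da.flatten(1).norm(dim=1) / dz.norm(dim=1)).mean().item()

def action_coverage(decoder, latent_dim, reference_actions, n_samples=4096):
    """Proxy metric: distance from each reference (valid) action sequence to
    its nearest decoded sample. Low values suggest the generative model can
    reproduce the useful part of the action space."""
    with torch.no_grad():
        samples = decoder(torch.randn(n_samples, latent_dim)).flatten(1)
    ref = reference_actions.flatten(1)
    d = torch.cdist(ref, samples)      # (n_ref, n_samples) pairwise distances
    return d.min(dim=1).values.mean().item()
```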
Related papers
- Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation [8.940998315746684]
We propose a model-based reinforcement learning (RL) approach for robotic arm end-tasks.
We employ Bayesian neural network models to represent, in a probabilistic way, both the belief and information encoded in the dynamic model during exploration.
Our experiments show the advantages of our Bayesian model-based RL approach, achieving results of similar quality to relevant alternatives.
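The summary does not specify the acquisition rule; a common practical stand-in for a Bayesian dynamics model is an ensemble whose prediction disagreement serves as the epistemic-uncertainty signal driving exploration. The sketch below uses that approximation, with all names and sizes assumed.

```python
import torch
import torch.nn as nn

class DynamicsEnsemble(nn.Module):
    """Ensemble of next-state predictors standing in for a Bayesian model."""
    def __init__(self, state_dim, action_dim, n_members=5):
        super().__init__()
        self.members = nn.ModuleList(
            nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                          nn.Linear(64, state_dim))
            for _ in range(n_members))

    def disagreement(self, state, actions):
        """Epistemic-uncertainty proxy: variance of next-state predictions
        across ensemble members, one score per candidate action."""
        x = torch.cat([state.expand(actions.shape[0], -1), actions], dim=-1)
        preds = torch.stack([m(x) for m in self.members])  # (M, N, state_dim)
        return preds.var(dim=0).mean(dim=-1)               # (N,)

model = DynamicsEnsemble(state_dim=8, action_dim=2)
candidates = torch.rand(64, 2)              # candidate actions to score
state = torch.randn(1, 8)
# Active exploration: execute the action the model is most uncertain about.
best = candidates[model.disagreement(state, candidates).argmax()]
```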
arXiv Detail & Related papers (2024-04-02T11:44:37Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
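Distillation here reduces to supervised regression from states to expert actions. The sketch below fits a scikit-learn gradient boosting model to a placeholder expert; in the paper the labels would come from a trained RL policy, and the data and hyperparameters shown are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Label states visited by the expert with the expert's own actions, then fit
# the interpretable student model on those (state, action) pairs.
rng = np.random.default_rng(0)
states = rng.normal(size=(5000, 8))       # states gathered from rollouts
expert_actions = np.tanh(states[:, :2])   # placeholder for the RL expert

gbm_policy = MultiOutputRegressor(GradientBoostingRegressor())
gbm_policy.fit(states, expert_actions)
print(gbm_policy.predict(states[:1]))     # distilled action for one state
```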
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to utilize offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmark.
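The paper's exact Double Policy Estimation estimator is not given in this summary. As a representative member of the variance-reduced off-policy evaluation family it builds on, here is a minimal per-decision doubly robust estimator (Jiang & Li, 2016); q_hat, v_hat, and pi_e are assumed callables.

```python
import numpy as np

def doubly_robust_ope(trajectories, q_hat, v_hat, pi_e, gamma=0.99):
    """Per-decision doubly robust off-policy value estimate.

    Each trajectory is a list of (state, action, reward, behavior_prob)
    tuples; pi_e(a, s) returns the evaluation policy's probability of a.
    The model terms q_hat/v_hat cut variance; the importance weight keeps
    the estimate unbiased even when the models are wrong."""
    values = []
    for traj in trajectories:
        v_dr = 0.0
        # Backward recursion: V_t = v(s) + rho * (r + gamma * V_{t+1} - q(s, a))
        for s, a, r, b_prob in reversed(traj):
            rho = pi_e(a, s) / b_prob
            v_dr = v_hat(s) + rho * (r + gamma * v_dr - q_hat(s, a))
        values.append(v_dr)
    return float(np.mean(values))
```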
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- PASTA: Pretrained Action-State Transformer Agents [10.654719072766495]
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains.
Recent approaches involve pre-training transformer models on vast amounts of unlabeled data.
In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories.
arXiv Detail & Related papers (2023-07-20T15:09:06Z)
- Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method in which a transformer predicts the optimal action given a query state and an in-context dataset of interactions.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
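A minimal sketch of the DPT-style supervised objective, assuming a transformer encoder over context transitions plus a query-state token, trained with cross-entropy against the optimal action; the architecture details and shapes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class DPTSketch(nn.Module):
    """Context transitions and the query state are embedded as tokens, a
    transformer encoder mixes them, and the output at the query token is
    decoded into action logits."""
    def __init__(self, state_dim, n_actions, d_model=64):
        super().__init__()
        token_dim = state_dim + n_actions + 1 + state_dim  # (s, a_onehot, r, s')
        self.ctx_embed = nn.Linear(token_dim, d_model)
        self.query_embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, context, query_state):
        tokens = torch.cat(
            [self.ctx_embed(context),
             self.query_embed(query_state).unsqueeze(1)], dim=1)
        return self.head(self.encoder(tokens)[:, -1])  # logits at query token

# Supervised pretraining step: the optimal action is available at
# pretraining time (e.g. from a planner or an oracle policy).
model = DPTSketch(state_dim=4, n_actions=3)
context = torch.randn(8, 20, 4 + 3 + 1 + 4)  # batch of 20-transition contexts
query = torch.randn(8, 4)
optimal_action = torch.randint(0, 3, (8,))
loss = nn.functional.cross_entropy(model(context, query), optimal_action)
loss.backward()
```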
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
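The summary does not say how task relevance is measured; one natural proxy, sketched below purely as an assumption, is the pretrained world model's one-step prediction error on transitions from the target task.

```python
import torch

def task_relevance(world_model, transitions):
    """Hypothetical relevance proxy: a world model pretrained on offline data
    from related tasks should predict target-task transitions accurately, so
    a lower one-step prediction error is read as higher task relevance."""
    s, a, s_next = transitions      # tensors of shape (N, ds), (N, da), (N, ds)
    with torch.no_grad():
        err = (world_model(s, a) - s_next).pow(2).mean()
    return -err.item()              # higher means more relevant
```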
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling [34.88897402357158]
We show that due to the limited distributional expressivity of policy models, previous methods might still select unseen actions during training.
We adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model.
Our proposed method achieves competitive or superior performance compared with state-of-the-art offline RL methods.
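A minimal sketch of the decoupled design, assuming a behavior model with a sample method and a critic q_net; both interfaces are hypothetical stand-ins for the paper's components.

```python
import torch

def select_action(behavior_model, q_net, state, n_candidates=32):
    """The generative behavior model proposes only in-distribution actions;
    the action evaluation model (a critic) ranks the proposals. Unseen
    actions are never selected, since every candidate is a model sample."""
    states = state.expand(n_candidates, -1)             # state: (1, state_dim)
    with torch.no_grad():
        candidates = behavior_model.sample(states)      # (n_candidates, act_dim)
        scores = q_net(states, candidates).squeeze(-1)  # (n_candidates,)
    return candidates[scores.argmax()]
```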
arXiv Detail & Related papers (2022-09-29T04:36:23Z)
- Data-efficient visuomotor policy training using reinforcement learning and generative models [27.994338318811952]
We present a data-efficient framework for solving visuomotor sequential decision-making problems.
We exploit the combination of reinforcement learning and latent variable generative models.
arXiv Detail & Related papers (2020-07-26T14:19:00Z)
- Stealing Deep Reinforcement Learning Models for Fun and Profit [33.64948529132546]
This paper presents the first model extraction attack against Deep Reinforcement Learning (DRL).
It enables an external adversary to precisely recover a black-box DRL model only from its interaction with the environment.
arXiv Detail & Related papers (2020-06-09T03:24:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.