Semi-supervised reward learning for offline reinforcement learning
- URL: http://arxiv.org/abs/2012.06899v1
- Date: Sat, 12 Dec 2020 20:06:15 GMT
- Title: Semi-supervised reward learning for offline reinforcement learning
- Authors: Ksenia Konyushkova, Konrad Zolna, Yusuf Aytar, Alexander Novikov,
Scott Reed, Serkan Cabi, Nando de Freitas
- Abstract summary: Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
- Score: 71.6909757718301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In offline reinforcement learning (RL), agents are trained using a
logged dataset. This appears to be the most natural route to tackle real-life
applications, because in domains such as healthcare and robotics, interactions
with the environment are either expensive or unethical. Training agents usually
requires reward functions, but unfortunately, rewards are seldom available in
practice and their engineering is challenging and laborious. To overcome this,
we investigate reward learning under the constraint of minimizing human reward
annotations. We consider two types of supervision: timestep annotations and
demonstrations. We propose semi-supervised learning algorithms that learn from
limited annotations and incorporate unlabelled data. In our experiments with a
simulated robotic arm, we greatly improve upon behavioural cloning and closely
approach the performance achieved with ground truth rewards. We further
investigate the relationship between the quality of the reward model and the
final policies. We notice, for example, that the reward models do not need to
be perfect to result in useful policies.
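To make the approach concrete, below is a minimal sketch of one semi-supervised
recipe consistent with the abstract: fit a reward model on the few annotated
timesteps, pseudo-label the unlabelled transitions with its confident
predictions, and refit on the combined set. This is an illustration under our
own assumptions (the network, the binary 0/1 annotations, and the confidence
threshold are not taken from the paper), not the authors' implementation.

# Minimal semi-supervised reward-learning sketch (illustrative only).
# Assumes binary timestep annotations (reward 0/1) on a small labelled set
# and a large pool of unlabelled observations, as described in the abstract.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Small MLP mapping an observation to a predicted reward logit."""
    def __init__(self, obs_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)  # logits; apply sigmoid for reward

def train_reward_model(model, obs, labels, epochs=50, lr=1e-3):
    """Supervised phase: fit the reward model on annotated timesteps."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(obs), labels)
        loss.backward()
        opt.step()
    return model

def pseudo_label(model, unlabelled_obs, threshold=0.95):
    """Self-training phase: keep only confident predictions as pseudo-labels."""
    with torch.no_grad():
        p = torch.sigmoid(model(unlabelled_obs))
    confident = (p > threshold) | (p < 1.0 - threshold)
    return unlabelled_obs[confident], (p[confident] > 0.5).float()

# Toy usage with random data standing in for a logged robot dataset.
obs_dim = 16
labelled_obs = torch.randn(64, obs_dim)       # few annotated timesteps
labels = (labelled_obs[:, 0] > 0).float()     # stand-in annotations
unlabelled_obs = torch.randn(4096, obs_dim)   # large unlabelled pool

model = RewardNet(obs_dim)
model = train_reward_model(model, labelled_obs, labels)
extra_obs, extra_labels = pseudo_label(model, unlabelled_obs)
model = train_reward_model(
    model,
    torch.cat([labelled_obs, extra_obs]),
    torch.cat([labels, extra_labels]),
)
# The refitted model can then relabel the whole logged dataset so that a
# standard offline RL algorithm can be run on top of the imputed rewards.

As the abstract notes, the learned reward model does not have to be perfect: it
only needs to relabel the logged dataset well enough for the downstream offline
RL step to recover a useful policy.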
Related papers
- Offline Reinforcement Learning with Imputed Rewards [8.856568375969848]
We propose a Reward Model that can estimate the reward signal from a very limited sample of environment transitions annotated with rewards.
Our results show that, using only 1% of reward-labeled transitions from the original datasets, our learned reward model is able to impute rewards for the remaining 99% of the transitions.
arXiv Detail & Related papers (2024-07-15T15:53:13Z)
- Accelerating Exploration with Unlabeled Prior Data [66.43995032226466]
We study how prior data without reward labels may be used to guide and accelerate exploration for an agent solving a new sparse reward task.
We propose a simple approach that learns a reward model from online experience, labels the unlabeled prior data with optimistic rewards, and then uses it concurrently alongside the online data for downstream policy and critic optimization.
arXiv Detail & Related papers (2023-11-09T00:05:17Z)
- Reward Shaping for Happier Autonomous Cyber Security Agents [0.276240219662896]
One of the most promising directions uses deep reinforcement learning to train autonomous agents in computer network defense tasks.
This work studies the impact of the reward signal that is provided to the agents when training for this task.
arXiv Detail & Related papers (2023-10-20T15:04:42Z)
- Handling Sparse Rewards in Reinforcement Learning Using Model Predictive Control [9.118706387430883]
Reinforcement learning (RL) has recently achieved great success in various domains.
Yet, the design of the reward function requires detailed domain expertise and tedious fine-tuning to ensure that agents are able to learn the desired behaviour.
We propose to use model predictive control (MPC) as an experience source for training RL agents in sparse reward environments.
arXiv Detail & Related papers (2022-10-04T11:06:38Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward (see the sketch after this list).
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks [70.56451186797436]
We study how to use meta-reinforcement learning to solve the bulk of the problem in simulation.
We demonstrate our approach by training an agent to successfully perform challenging real-world insertion tasks.
arXiv Detail & Related papers (2020-04-29T18:00:22Z)
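For the reward-uncertainty exploration entry above, the sketch below
illustrates the general idea of a disagreement-based bonus: an ensemble of
reward heads is queried on each observation, and the standard deviation of
their predictions is added to the learned reward as an intrinsic bonus. The
ensemble size, network shapes, and bonus scale are assumptions made for
illustration, not details taken from that paper.

# Illustrative sketch of a disagreement-based exploration bonus, in the
# spirit of the "Reward Uncertainty for Exploration" entry above.
import torch
import torch.nn as nn

class RewardEnsemble(nn.Module):
    """An ensemble of small reward heads; their disagreement measures novelty."""
    def __init__(self, obs_dim: int, n_heads: int = 5, hidden: int = 64):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
            )
            for _ in range(n_heads)
        ])

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Stack per-head predictions into shape (n_heads, batch).
        return torch.stack([h(obs).squeeze(-1) for h in self.heads])

def intrinsic_bonus(ensemble: RewardEnsemble, obs: torch.Tensor,
                    scale: float = 0.1) -> torch.Tensor:
    """Exploration bonus = scaled standard deviation across reward heads."""
    with torch.no_grad():
        preds = ensemble(obs)
    return scale * preds.std(dim=0)

# Toy usage: states the ensemble disagrees on receive a larger bonus, which
# is added to the learned reward during policy optimisation.
ensemble = RewardEnsemble(obs_dim=16)
bonus = intrinsic_bonus(ensemble, torch.randn(8, 16))
print(bonus.shape)  # torch.Size([8])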
This list is automatically generated from the titles and abstracts of the papers on this site.