Boosting Feedback Efficiency of Interactive Reinforcement Learning by
Adaptive Learning from Scores
- URL: http://arxiv.org/abs/2307.05405v2
- Date: Sun, 6 Aug 2023 08:33:51 GMT
- Title: Boosting Feedback Efficiency of Interactive Reinforcement Learning by
Adaptive Learning from Scores
- Authors: Shukai Liu, Chenming Wu, Ying Li, Liangjun Zhang
- Abstract summary: This paper presents a new method that uses scores provided by humans instead of pairwise preferences to improve the feedback efficiency of interactive reinforcement learning.
We show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback compared to pairwise preference learning methods.
- Score: 11.702616722462139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive reinforcement learning has shown promise in learning complex
robotic tasks. However, the process can be labor-intensive for humans because it
requires a large amount of interactive feedback. This paper presents a
new method that uses scores provided by humans instead of pairwise preferences
to improve the feedback efficiency of interactive reinforcement learning. Our
key insight is that scores can yield significantly more data than pairwise
preferences. Specifically, we require a teacher to interactively score the full
trajectories of an agent to train a behavioral policy in a sparse reward
environment. To prevent unstable human-given scores from negatively impacting the
training process, we propose an adaptive learning scheme that makes the learning
paradigm insensitive to imperfect or unreliable scores. We
extensively evaluate our method for robotic locomotion and manipulation tasks.
The results show that the proposed method can efficiently learn near-optimal
policies by adaptive learning from scores while requiring less feedback
compared to pairwise preference learning methods. The source code is publicly
available at https://github.com/SSKKai/Interactive-Scoring-IRL.
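The abstract does not spell out the training objective, but one common way to realize score-based reward learning is to regress a reward model's predicted trajectory return onto each human score, so that every scored rollout contributes a full supervised target, while a pairwise query yields only a single Bradley-Terry comparison. The PyTorch sketch below contrasts the two losses under that assumption; the exponential down-weighting in adaptive_score_loss (and its tau temperature) is a hypothetical stand-in for the paper's adaptive learning scheme, not its published formulation, and all names are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps per-step observation features to a scalar reward."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def trajectory_return(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (T, obs_dim) -> predicted return, the sum of per-step rewards
        return self.net(traj).sum()

def adaptive_score_loss(model, trajs, scores, tau=2.0):
    # Regress predicted returns onto human scores, down-weighting the
    # trajectories whose scores disagree most with the current model.
    # This exponential weighting is an assumed simplification of the
    # paper's adaptive scheme, which the abstract does not specify.
    preds = torch.stack([model.trajectory_return(t) for t in trajs])
    weights = torch.exp(-(preds - scores).abs().detach() / tau)
    return (weights * (preds - scores) ** 2).mean()

def preference_loss(model, traj_a, traj_b, a_preferred=True):
    # Bradley-Terry objective used by pairwise-preference baselines
    # (e.g., PEBBLE, cited below): each human query yields one bit.
    ra = model.trajectory_return(traj_a)
    rb = model.trajectory_return(traj_b)
    logit = ra - rb if a_preferred else rb - ra
    return -F.logsigmoid(logit)

if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardModel(obs_dim=8)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    trajs = [torch.randn(50, 8) for _ in range(16)]  # 16 scored rollouts
    scores = torch.randn(16)                         # toy human scores
    for _ in range(200):
        loss = adaptive_score_loss(model, trajs, scores)
        opt.zero_grad()
        loss.backward()
        opt.step()

In this toy run, sixteen scored rollouts supply sixteen regression targets per update, which is the intuition behind the claim that scores yield significantly more data than pairwise preferences.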
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards (a toy relabeling sketch of this idea appears after this list).
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Multi-trainer Interactive Reinforcement Learning System [7.3072544716528345]
We propose a more effective interactive reinforcement learning system by introducing multiple trainers.
In particular, our trainer-feedback aggregation experiments show that our aggregation method achieves the highest accuracy among those compared.
Finally, we conduct a grid-world experiment to show that the policy trained by MTIRL with the review model is closer to the optimal policy than that trained without a review model.
arXiv Detail & Related papers (2022-10-14T18:32:59Z)
- Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [52.07473934146584]
We guide curriculum reinforcement learning toward a preferred performance level that is neither too hard nor too easy by learning from the human decision process.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
It shows that reinforcement learning performance can successfully adjust in sync with the human-desired difficulty level.
arXiv Detail & Related papers (2022-08-04T23:53:51Z)
- Sample Efficient Social Navigation Using Inverse Reinforcement Learning [11.764601181046498]
We describe an inverse-reinforcement-learning-based algorithm that learns from observations of human trajectories without knowing their specific actions.
We show that our approach yields better performance while also decreasing training time and sample complexity.
arXiv Detail & Related papers (2021-06-18T19:07:41Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Generative Inverse Deep Reinforcement Learning for Online Recommendation [62.09946317831129]
We propose InvRec, a novel inverse reinforcement learning approach for online recommendation.
InvRec automatically extracts the reward function from users' behaviors.
arXiv Detail & Related papers (2020-11-04T12:12:25Z)
- Empowering Active Learning to Jointly Optimize System and User Demands [70.66168547821019]
We propose a new active learning approach that jointly optimizes for the active learning system (efficient training) and the user (receiving useful instances).
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives, whereas alternative methods lead to many unsuitable exercises for end users.
arXiv Detail & Related papers (2020-05-09T16:02:52Z)
- Let Me At Least Learn What You Really Like: Dealing With Noisy Humans When Learning Preferences [0.76146285961466]
We propose a modification to uncertainty sampling that uses the expected output value to speed up the learning of preferences.
We compare our approach with the uncertainty sampling baseline, as well as conduct an ablation study to test the validity of each component of our approach.
arXiv Detail & Related papers (2020-02-15T00:36:23Z)
- On the interaction between supervision and self-play in emergent communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)
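As noted in the RLIF entry above, user intervention signals themselves can serve as rewards. A minimal sketch of that reading, assuming a simple transition record and a -1-on-intervention / 0-otherwise relabeling (illustrative choices, not RLIF's exact construction); the relabeled tuples could feed any off-policy reinforcement learning algorithm.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Transition:
    obs: Tuple[float, ...]       # observation features
    action: int
    next_obs: Tuple[float, ...]
    intervened: bool             # did the human take over at this step?

def relabel_with_intervention_reward(
    batch: List[Transition],
) -> List[Tuple[Tuple[float, ...], int, float, Tuple[float, ...]]]:
    # Treat an intervention as implicit negative feedback: the step that
    # provoked the takeover gets reward -1, all other steps get 0.
    return [
        (t.obs, t.action, -1.0 if t.intervened else 0.0, t.next_obs)
        for t in batch
    ]

if __name__ == "__main__":
    demo = [
        Transition((0.0, 1.0), 1, (0.5, 1.0), intervened=False),
        Transition((0.5, 1.0), 0, (0.5, 0.5), intervened=True),
    ]
    print(relabel_with_intervention_reward(demo))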