Multi-trainer Interactive Reinforcement Learning System
- URL: http://arxiv.org/abs/2210.08050v1
- Date: Fri, 14 Oct 2022 18:32:59 GMT
- Title: Multi-trainer Interactive Reinforcement Learning System
- Authors: Zhaori Guo, Timothy J. Norman, and Enrico H. Gerding
- Abstract summary: We propose a more effective interactive reinforcement learning system by introducing multiple trainers.
In particular, our trainer feedback aggregation experiments show that our aggregation method achieves the highest accuracy among the compared methods.
Finally, we conduct a grid-world experiment to show that the policy trained by the MTIRL with the review model is closer to the optimal policy than that without a review model.
- Score: 7.3072544716528345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive reinforcement learning can effectively facilitate agent training via human feedback. However, such methods often require the human teacher to know the correct action the agent should take; in other words, if the human teacher is not always reliable, they cannot consistently guide the agent through its training. In this paper, we propose a more effective interactive reinforcement learning system by introducing multiple trainers, namely Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates binary feedback from multiple imperfect trainers into a more reliable reward for training an agent in a reward-sparse environment. In particular, our trainer feedback aggregation experiments show that our aggregation method has the best accuracy when compared with majority voting, weighted voting, and the Bayesian method. Finally, we conduct a grid-world experiment to show that the policy trained by MTIRL with the review model is closer to the optimal policy than the one trained without a review model.
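The abstract does not spell out MTIRL's aggregation rule itself, but the baselines it is compared against (majority voting, weighted voting, and a Bayesian method) indicate the underlying problem: combining binary votes from trainers of unknown reliability into a single reward signal. Below is a minimal sketch of two such baseline aggregators in Python; it illustrates the general idea only, not the paper's method, and the function names, trainer reliabilities, and vote encoding (+1/-1) are assumptions made for the example.

```python
import math

def majority_vote(feedback):
    """Baseline: aggregate binary feedback (+1 / -1) by simple majority."""
    return 1 if sum(feedback) >= 0 else -1

def reliability_weighted_vote(feedback, reliabilities):
    """Combine binary votes weighted by each trainer's estimated reliability.

    A trainer believed to be correct with probability p contributes
    log(p / (1 - p)) evidence toward the label they voted for -- the
    standard log-odds combination for independent noisy voters.
    """
    log_odds = 0.0
    for vote, p in zip(feedback, reliabilities):
        p = min(max(p, 1e-3), 1 - 1e-3)  # clamp to keep the log-odds finite
        log_odds += vote * math.log(p / (1 - p))
    return 1 if log_odds >= 0 else -1

# One highly reliable trainer disagrees with two unreliable ones.
votes = [1, -1, -1]
reliabilities = [0.95, 0.55, 0.55]
print(majority_vote(votes))                             # -> -1
print(reliability_weighted_vote(votes, reliabilities))  # -> 1
```

In this toy case the reliability-weighted vote lets one trustworthy trainer outweigh two noisy ones, which is precisely the failure mode of plain majority voting that motivates reliability-aware aggregation.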
Related papers
- Multi-Agent Training for Pommerman: Curriculum Learning and Population-based Self-Play Approach [11.740631954398292]
Pommerman is an ideal benchmark for multi-agent training, providing a battleground for two teams with communication capabilities among allied agents.
This study introduces a system designed to train multi-agent systems to play Pommerman using a combination of curriculum learning and population-based self-play.
arXiv Detail & Related papers (2024-06-30T11:14:29Z)
- Direct Language Model Alignment from Online AI Feedback [78.40436231613754]
Direct alignment from preferences (DAP) methods have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF).
In this study, we posit that online feedback is key and improves DAP methods.
Our method, online AI feedback (OAIF), uses an LLM as annotator: on each training iteration, we sample two responses from the current model and prompt the LLM annotator to choose which one is preferred, thus providing online feedback.
arXiv Detail & Related papers (2024-02-07T12:31:13Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores [11.702616722462139]
This paper presents a new method that uses scores provided by humans instead of pairwise preferences to improve the feedback efficiency of interactive reinforcement learning.
We show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback compared to pairwise preference learning methods.
arXiv Detail & Related papers (2023-07-11T16:12:15Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills [1.433758865948252]
A promising approach to improve the robustness and exploration in Reinforcement Learning is collecting human feedback.
It is, however, often too expensive to obtain enough feedback of good quality.
We aim to rely on a group of multiple experts with different skill levels to generate enough feedback.
arXiv Detail & Related papers (2021-11-16T16:19:19Z)
- A Broad-persistent Advising Approach for Deep Interactive Reinforcement Learning in Robotic Environments [0.3683202928838613]
Deep Interactive Reinforcement Learning (DeepIRL) includes interactive feedback from an external trainer or expert who gives advice to help learners choose actions and thereby speed up the learning process.
In this paper, we present Broad-persistent Advising (BPA), a broad-persistent advising approach that retains and reuses the processed information.
It not only helps trainers give more general advice that applies to similar states rather than only the current state, but also allows the agent to speed up the learning process.
arXiv Detail & Related papers (2021-10-15T10:56:00Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
- Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework [51.237191651923666]
We investigate the potential of agents learning from trainers' facial expressions by interpreting them as evaluative feedback.
Using a purpose-built CNN-RNN model, our analysis shows that telling trainers to use facial expressions and competition can improve the accuracy of estimating positive and negative feedback.
Our results with a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible.
arXiv Detail & Related papers (2020-01-23T17:50:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.