Widening the Pipeline in Human-Guided Reinforcement Learning with
Explanation and Context-Aware Data Augmentation
- URL: http://arxiv.org/abs/2006.14804v5
- Date: Tue, 26 Oct 2021 19:16:10 GMT
- Title: Widening the Pipeline in Human-Guided Reinforcement Learning with
Explanation and Context-Aware Data Augmentation
- Authors: Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati
- Abstract summary: We present the first study of using human visual explanations in human-in-the-loop reinforcement learning.
We propose EXPAND to encourage the model to encode task-relevant features through a context-aware data augmentation.
- Score: 20.837228359591663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human explanation (e.g., in terms of feature importance) has been recently
used to extend the communication channel between human and agent in interactive
machine learning. Under this setting, human trainers provide not only the
ground truth but also some form of explanation. However, this kind of human
guidance was only investigated in supervised learning tasks, and it remains
unclear how to best incorporate this type of human knowledge into deep
reinforcement learning. In this paper, we present the first study of using
human visual explanations in human-in-the-loop reinforcement learning (HRL). We
focus on the task of learning from feedback, in which the human trainer not
only gives binary evaluative "good" or "bad" feedback for queried state-action
pairs, but also provides a visual explanation by annotating relevant features
in images. We propose EXPAND (EXPlanation AugmeNted feeDback) to encourage the
model to encode task-relevant features through a context-aware data
augmentation that only perturbs irrelevant features in human salient
information. We choose five tasks, namely Pixel-Taxi and four Atari games, to
evaluate the performance and sample efficiency of this approach. We show that
our method significantly outperforms methods leveraging human explanation that
are adapted from supervised learning, and Human-in-the-loop RL baselines that
only utilize evaluative feedback.
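
To make the "context-aware" part concrete, below is a minimal sketch of the kind of augmentation the abstract describes: it perturbs (here, Gaussian-blurs) only the pixels outside a human-annotated saliency mask, leaving the marked regions untouched. The function name, the blur-based perturbation, and all parameters are assumptions for illustration, not the paper's actual code.

```python
# A minimal sketch of context-aware data augmentation, assuming the human
# explanation is given as a binary saliency mask over the image observation.
# The blur-based perturbation and all names/parameters are illustrative
# assumptions, not EXPAND's actual implementation.
import numpy as np
from scipy.ndimage import gaussian_filter

def context_aware_augment(obs, saliency_mask, n_copies=4, sigma=2.0, rng=None):
    """Perturb only the features the human did NOT mark as relevant.

    obs:           (H, W, C) float image observation.
    saliency_mask: (H, W) binary array, 1 on human-annotated relevant pixels.
    Returns n_copies augmented observations whose relevant pixels are
    identical to the original and whose irrelevant pixels are blurred.
    """
    if rng is None:
        rng = np.random.default_rng()
    mask = saliency_mask[..., None].astype(obs.dtype)  # broadcast over channels
    augmented = []
    for _ in range(n_copies):
        # Blur the full frame with a randomly jittered strength (spatial axes only) ...
        s = sigma * rng.uniform(0.5, 1.5)
        blurred = gaussian_filter(obs, sigma=(s, s, 0))
        # ... then paste the untouched human-relevant pixels back on top.
        augmented.append(mask * obs + (1.0 - mask) * blurred)
    return augmented
```

Because each augmented copy preserves the annotated evidence behind the trainer's "good"/"bad" label, one natural use is to let the copies share the original state-action pair's binary label, or to penalize the model for predicting differently on the original and its perturbed copies; either way, the agent is encouraged to base its feedback predictions on the human-marked features.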
Related papers
- GUIDE: Real-Time Human-Shaped Agents [4.676987516944155]
We introduce GUIDE, a framework for real-time human-guided reinforcement learning.
With only 10 minutes of human feedback, our algorithm achieves up to a 30% increase in success rate compared to its RL baseline.
arXiv Detail & Related papers (2024-10-19T18:59:39Z)
- Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning [21.707688492630304]
HERO is an online training method that captures human feedback and provides informative learning signals for fine-tuning.
HERO can effectively handle tasks like reasoning, counting, personalization, and reducing NSFW content with only 0.5K online feedback.
arXiv Detail & Related papers (2024-10-07T15:12:01Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Accelerating Reinforcement Learning Agent with EEG-based Implicit Human Feedback [10.138798960466222]
Providing Reinforcement Learning (RL) agents with human feedback can dramatically improve various aspects of learning.
Previous methods require a human observer to give inputs explicitly, burdening the human in the loop of the RL agent's learning process.
We investigate capturing humans' intrinsic reactions as implicit (and natural) feedback through EEG, in the form of error-related potentials (ErrP).
arXiv Detail & Related papers (2020-06-30T03:13:37Z)
- On the interaction between supervision and self-play in emergent communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)
- Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework [51.237191651923666]
We investigate the potential of agents learning from trainers' facial expressions by interpreting them as evaluative feedback.
With the designed CNN-RNN model, our analysis shows that telling trainers to use facial expressions and competition can improve the accuracy of estimating positive and negative feedback.
Our results with a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible.
arXiv Detail & Related papers (2020-01-23T17:50:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.