Facial Feedback for Reinforcement Learning: A Case Study and Offline
Analysis Using the TAMER Framework
- URL: http://arxiv.org/abs/2001.08703v1
- Date: Thu, 23 Jan 2020 17:50:57 GMT
- Title: Facial Feedback for Reinforcement Learning: A Case Study and Offline
Analysis Using the TAMER Framework
- Authors: Guangliang Li, Hamdi Dibeklioğlu, Shimon Whiteson and Hayley Hung
- Abstract summary: We investigate the potential of agents learning from trainers' facial expressions by interpreting them as evaluative feedback.
Using a CNN-RNN model designed for this task, our analysis shows that telling trainers to use facial expressions, together with competition, can improve the accuracy of estimating positive and negative feedback.
Our results with a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible.
- Score: 51.237191651923666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive reinforcement learning provides a way for agents to learn to
solve tasks from evaluative feedback provided by a human user. Previous
research showed that humans give copious feedback early in training but very
sparsely thereafter. In this article, we investigate the potential of agents
learning from trainers' facial expressions by interpreting them as evaluative
feedback. To do so, we implemented TAMER, a popular interactive reinforcement
learning method, in a reinforcement learning benchmark problem, Infinite Mario,
and conducted the first large-scale study of TAMER, involving 561 participants.
Using a CNN-RNN model designed for this task, our analysis shows that telling
trainers to use facial expressions, together with competition, can improve the
accuracy of estimating positive and negative feedback from facial expressions.
In addition, our results from a simulation experiment show that learning solely
from predicted feedback based on facial expressions is possible and that, with
strong/effective prediction models or a regression method, facial responses
could significantly improve agent performance. Furthermore, our
experiment supports previous studies demonstrating the importance of
bi-directional feedback and competitive elements in the training interface.
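To make the learning loop concrete, below is a minimal, hypothetical sketch of TAMER-style learning from feedback predicted from facial expressions: a linear model of the human reward H(s, a) is regressed toward the predicted feedback signal, and the agent acts greedily with respect to it. The environment transition, the featurizer, and the predicted_facial_feedback stub are illustrative assumptions; the paper's actual setup (Infinite Mario, a CNN-RNN predictor) is not reproduced here.

```python
# Minimal, hypothetical sketch of TAMER-style learning from predicted facial feedback.
# The "environment", feature map, and feedback predictor below are stand-in assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES, N_ACTIONS = 8, 4
weights = np.zeros((N_ACTIONS, N_FEATURES))  # linear model of the human reward H(s, a)
ALPHA = 0.05                                  # learning rate for H updates

def features(state):
    """Placeholder state featurizer (assumed, not from the paper)."""
    return state

def predicted_facial_feedback(frame):
    """Stand-in for the CNN-RNN predictor: returns +1, -1, or 0 (no feedback)."""
    return rng.choice([1.0, -1.0, 0.0], p=[0.2, 0.2, 0.6])

def select_action(state, eps=0.1):
    """Act mostly greedily with respect to the learned human-reward model H."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(weights @ features(state)))

def tamer_update(state, action, h_hat):
    """Move H(s, a) toward the (predicted) human feedback signal."""
    phi = features(state)
    error = h_hat - weights[action] @ phi
    weights[action] += ALPHA * error * phi

# Toy interaction loop with a random stub "environment" (purely illustrative).
state = rng.normal(size=N_FEATURES)
for step in range(1000):
    action = select_action(state)
    next_state = rng.normal(size=N_FEATURES)   # stub transition
    frame = None                               # would be the trainer's camera frame
    h_hat = predicted_facial_feedback(frame)
    if h_hat != 0.0:                           # only update when feedback is predicted
        tamer_update(state, action, h_hat)
    state = next_state
```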
Related papers
- Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models [8.025808955214957]
This paper studies the advantages and limitations of reinforcement learning from large language model feedback.
We propose a simple yet effective method for soliciting and applying feedback as a potential-based shaping function; a generic sketch of potential-based shaping appears after this list.
arXiv Detail & Related papers (2024-10-22T19:52:08Z)
- GUIDE: Real-Time Human-Shaped Agents [4.676987516944155]
We introduce GUIDE, a framework for real-time human-guided reinforcement learning.
With only 10 minutes of human feedback, our algorithm achieves up to a 30% increase in success rate compared to its RL baseline.
arXiv Detail & Related papers (2024-10-19T18:59:39Z)
- Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning [21.707688492630304]
HERO is an online training method that captures human feedback and provides informative learning signals for fine-tuning.
HERO can effectively handle tasks like reasoning, counting, personalization, and reducing NSFW content with only 0.5K online feedback samples.
arXiv Detail & Related papers (2024-10-07T15:12:01Z)
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
- Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance large language models alignment.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems.
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z)
- Using Large Language Models to Provide Explanatory Feedback to Human Tutors [3.2507682694499582]
We present two approaches for supplying tutors real-time feedback within an online lesson on how to give students effective praise.
This work-in-progress demonstrates considerable accuracy in the binary classification of corrective feedback as effective or effort-based.
More notably, we introduce progress towards an enhanced approach of providing explanatory feedback using large language model-facilitated named entity recognition.
arXiv Detail & Related papers (2023-06-27T14:19:12Z)
- Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for
arXiv Detail & Related papers (2023-05-01T17:36:06Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation [20.837228359591663]
We present the first study of using human visual explanations in human-in-the-loop reinforcement learning.
We propose EXPAND to encourage the model to encode task-relevant features through a context-aware data augmentation.
arXiv Detail & Related papers (2020-06-26T05:40:05Z)
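For context on the Navigating Noisy Feedback entry above, potential-based reward shaping adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward, which leaves the optimal policy unchanged. The sketch below illustrates the general construction with a placeholder potential; it is not that paper's LLM-derived shaping function.

```python
# Generic sketch of potential-based reward shaping (Ng et al., 1999).
# The potential function here is a placeholder assumption; the referenced paper
# derives its potential from (possibly error-prone) language model feedback.
GAMMA = 0.99

def potential(state):
    """Hypothetical potential Phi(s), e.g. a heuristic score of the state."""
    return float(sum(state))  # stand-in heuristic

def shaped_reward(reward, state, next_state, done):
    """r' = r + gamma * Phi(s') - Phi(s); preserves the optimal policy."""
    next_phi = 0.0 if done else potential(next_state)
    return reward + GAMMA * next_phi - potential(state)

# Example: wrap an environment step's reward before handing it to the learner.
r_shaped = shaped_reward(reward=0.0, state=[0.1, 0.2], next_state=[0.3, 0.1], done=False)
```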
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.