Continual Learning for Instruction Following from Realtime Feedback
- URL: http://arxiv.org/abs/2212.09710v2
- Date: Tue, 5 Dec 2023 21:22:11 GMT
- Title: Continual Learning for Instruction Following from Realtime Feedback
- Authors: Alane Suhr, Yoav Artzi
- Abstract summary: We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions.
During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent following their instructions.
We design a contextual bandit learning approach, converting user feedback to immediate reward.
We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time.
- Score: 23.078048024461264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and deploy an approach to continually train an
instruction-following agent from feedback provided by users during
collaborative interactions. During interaction, human users instruct an agent
using natural language, and provide realtime binary feedback as they observe
the agent following their instructions. We design a contextual bandit learning
approach, converting user feedback to immediate reward. We evaluate through
thousands of human-agent interactions, demonstrating 15.4% absolute improvement
in instruction execution accuracy over time. We also show our approach is
robust to several design variations, and that the feedback signal is roughly
equivalent to the learning signal of supervised demonstration data.
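The core mechanism described in the abstract is a contextual bandit formulation: each instruction-following step is treated as a single bandit round, and the user's realtime binary feedback is converted into an immediate reward used for a policy update. Below is a minimal, hypothetical sketch of that idea only; the linear softmax policy, feature dimensions, feedback-to-reward mapping, and simulated user are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the paper's code): one contextual-bandit round per action, with
# binary user feedback mapped to an immediate reward and a policy-gradient step.
import numpy as np

rng = np.random.default_rng(0)
NUM_ACTIONS, CONTEXT_DIM, LEARNING_RATE = 4, 8, 0.1  # illustrative sizes

# Assumed linear softmax policy over actions given an instruction/observation context.
weights = np.zeros((NUM_ACTIONS, CONTEXT_DIM))

def action_probs(context):
    logits = weights @ context
    logits = logits - logits.max()          # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def feedback_to_reward(feedback):
    # Binary realtime feedback (thumbs up / down) mapped to an immediate reward.
    return 1.0 if feedback > 0 else -1.0

def bandit_update(context, action, feedback):
    """One bandit round: observe feedback, take a REINFORCE-style gradient step."""
    global weights
    probs = action_probs(context)
    reward = feedback_to_reward(feedback)
    # Gradient of log pi(action | context) for a linear softmax policy.
    grad = -np.outer(probs, context)
    grad[action] += context
    weights += LEARNING_RATE * reward * grad

# Simulated interaction loop: a toy "user" gives +1 when the sampled action
# matches a hidden correct action for the context, -1 otherwise.
for step in range(2000):
    context = rng.normal(size=CONTEXT_DIM)
    correct = int(np.argmax(context[:NUM_ACTIONS]))
    action = rng.choice(NUM_ACTIONS, p=action_probs(context))
    bandit_update(context, action, +1 if action == correct else -1)
```

In this toy setup the policy's accuracy on the simulated feedback improves over rounds, mirroring (in spirit only) the paper's reported improvement in instruction execution accuracy over deployment time.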
Related papers
- Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of the turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z) - Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning [82.91837418721182]
Adaptive interfaces can help users perform sequential decision-making tasks.
Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users.
We propose a reinforcement learning algorithm to train an interface to map raw command signals to actions.
arXiv Detail & Related papers (2023-09-07T16:52:27Z) - Using Large Language Models to Provide Explanatory Feedback to Human Tutors [3.2507682694499582]
We present two approaches for supplying tutors with real-time feedback within an online lesson on how to give students effective praise.
This work-in-progress demonstrates considerable accuracy in the binary classification of corrective feedback on effective, effort-based praise.
More notably, we report progress towards an enhanced approach to providing explanatory feedback using large language model-facilitated named entity recognition.
arXiv Detail & Related papers (2023-06-27T14:19:12Z) - Continually Improving Extractive QA via Human Feedback [59.49549491725224]
We study continually improving an extractive question answering (QA) system via human user feedback.
We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time.
arXiv Detail & Related papers (2023-05-21T14:35:32Z) - Multi-trainer Interactive Reinforcement Learning System [7.3072544716528345]
We propose a more effective interactive reinforcement learning system by introducing multiple trainers.
In particular, our trainer-feedback aggregation experiments show that our aggregation method achieves the best accuracy.
Finally, we conduct a grid-world experiment to show that the policy trained by MTIRL with the review model is closer to the optimal policy than the one trained without a review model.
arXiv Detail & Related papers (2022-10-14T18:32:59Z) - Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills [1.433758865948252]
A promising approach to improving robustness and exploration in Reinforcement Learning is collecting human feedback.
It is, however, often too expensive to obtain enough feedback of good quality.
We aim to rely on a group of multiple experts with different skill levels to generate enough feedback.
arXiv Detail & Related papers (2021-11-16T16:19:19Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Assisted Perception: Optimizing Observations to Communicate State [112.40598205054994]
We aim to help users estimate the state of the world in tasks like robotic teleoperation and navigation with visual impairments.
We synthesize new observations that lead to more accurate internal state estimates when processed by the user.
arXiv Detail & Related papers (2020-08-06T19:08:05Z) - Interactive Imitation Learning in State-Space [5.672132510411464]
We propose a novel Interactive Learning technique that uses human feedback in state-space to train and improve agent behavior.
Our method, titled Teaching Imitative Policies in State-space (TIPS), enables providing guidance to the agent in terms of changing its state.
arXiv Detail & Related papers (2020-08-02T17:23:54Z) - Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework [51.237191651923666]
We investigate the potential of agent learning from trainers' facial expressions via interpreting them as evaluative feedback.
With a designed CNN-RNN model, our analysis shows that instructing trainers to use facial expressions and introducing competition can improve the accuracy of estimating positive and negative feedback.
Our results with a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible.
arXiv Detail & Related papers (2020-01-23T17:50:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.