Learning Rewards from Linguistic Feedback
- URL: http://arxiv.org/abs/2009.14715v3
- Date: Sat, 3 Jul 2021 19:03:12 GMT
- Title: Learning Rewards from Linguistic Feedback
- Authors: Theodore R. Sumers, Mark K. Ho, Robert D. Hawkins, Karthik Narasimhan,
Thomas L. Griffiths
- Abstract summary: We explore unconstrained natural language feedback as a learning signal for artificial agents.
We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict latent rewards.
- Score: 30.30912759796109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore unconstrained natural language feedback as a learning signal for
artificial agents. Humans use rich and varied language to teach, yet most prior
work on interactive learning from language assumes a particular form of input
(e.g., commands). We propose a general framework which does not make this
assumption, using aspect-based sentiment analysis to decompose feedback into
sentiment about the features of a Markov decision process. We then perform an
analogue of inverse reinforcement learning, regressing the sentiment on the
features to infer the teacher's latent reward function. To evaluate our
approach, we first collect a corpus of teaching behavior in a cooperative task
where both teacher and learner are human. We implement three artificial
learners: sentiment-based "literal" and "pragmatic" models, and an inference
network trained end-to-end to predict latent rewards. We then repeat our
initial experiment and pair them with human teachers. All three successfully
learn from interactive human feedback. The sentiment models outperform the
inference network, with the "pragmatic" model approaching human performance.
Our work thus provides insight into the information structure of naturalistic
linguistic feedback as well as methods to leverage it for reinforcement
learning.
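To make the regression step concrete, below is a minimal sketch of the "literal" learner's core idea, assuming the feedback has already been decomposed by an aspect-based sentiment model into per-utterance sentiment scores over MDP features. Ordinary least squares stands in for the paper's inference procedure; the function and toy data are illustrative, not the authors' code.

```python
# Minimal sketch of the sentiment-regression analogue of IRL described in
# the abstract. Assumption: an aspect-based sentiment model has already
# mapped each utterance to (features mentioned, signed sentiment score).
import numpy as np

def infer_reward_weights(feature_matrix: np.ndarray,
                         sentiment: np.ndarray) -> np.ndarray:
    """Regress observed sentiment on MDP features to estimate the
    teacher's latent reward weights.

    feature_matrix: (n_utterances, n_features) -- which features each
        piece of feedback was about (e.g., one-hot aspect indicators).
    sentiment: (n_utterances,) -- signed sentiment score per utterance.
    """
    weights, *_ = np.linalg.lstsq(feature_matrix, sentiment, rcond=None)
    return weights

# Toy example: three utterances about two features. Positive feedback on
# feature 0 and negative on feature 1 yield matching-sign weights.
X = np.array([[1.0, 0.0],   # "great job collecting those"
              [0.0, 1.0],   # "stop picking up the other ones"
              [1.0, 1.0]])  # mixed feedback about both aspects
s = np.array([0.9, -0.8, 0.1])
print(infer_reward_weights(X, s))  # -> [ 0.9 -0.8]
```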
Related papers
- Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning [54.31495290436766]
We extend BabyAI to automatically generate language feedback from environment dynamics and goal-condition success.
We modify the Decision Transformer architecture to take advantage of this additional signal.
We find that training with language feedback either in place of or in addition to the return-to-go or goal descriptions improves agents' generalisation performance.
arXiv Detail & Related papers (2023-12-07T22:33:34Z)
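How the modified Decision Transformer ingests the feedback is not spelled out in the summary above. A minimal sketch, assuming the encoded language feedback is simply interleaved as one more per-timestep token alongside (or in place of) the return-to-go token; the function name and signature are illustrative, not the paper's code.

```python
# Hedged sketch: build a Decision-Transformer-style input sequence with an
# extra language-feedback token per timestep. Assumed interface, not the
# paper's implementation.
import torch

def build_dt_sequence(states, actions, feedback_emb, rtg=None):
    """Interleave per-timestep tokens for a Decision-Transformer-style model.

    states:       (T, d) tensor of state embeddings
    actions:      (T, d) tensor of action embeddings
    feedback_emb: (T, d) tensor of encoded language feedback
    rtg:          optional (T, d) return-to-go embeddings
    Returns a (T * k, d) sequence: [feedback, (rtg,) state, action] per step.
    """
    per_step = ([feedback_emb, states, actions] if rtg is None
                else [feedback_emb, rtg, states, actions])
    # Stack to (T, k, d), then flatten time and slot dims into one sequence.
    return torch.stack(per_step, dim=1).reshape(-1, states.shape[-1])
```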
- Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers [15.211628096103475]
We present an initial study that evaluates intra-episodic feedback given in a collaborative setting.
Our results show that intra-episodic feedback allows the follower to generalize on aspects of scene complexity.
arXiv Detail & Related papers (2023-05-22T10:01:15Z)
- Training Language Models with Language Feedback at Scale [50.70091340506957]
We introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback.
ILF consists of three steps that are applied iteratively; the first conditions the language model on the input, an initial LM output, and the feedback to generate refinements.
We show theoretically that ILF can be viewed as Bayesian inference, similar to Reinforcement Learning from Human Feedback (RLHF).
arXiv Detail & Related papers (2023-03-28T17:04:15Z)
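Only the first ILF step is spelled out in the summary above; the pseudocode-level sketch below fills in the remaining two based on one reading of the paper, so the selection and fine-tuning steps should be treated as assumptions, and all interfaces (`lm.generate`, `select_best`, `finetune`) are hypothetical.

```python
# Sketch of one ILF round. Steps 2 and 3 are assumptions inferred from the
# summary, not the paper's verbatim algorithm.
def ilf_round(lm, dataset, select_best, finetune, k=4):
    refinement_data = []
    for x, y0, feedback in dataset:
        # Step 1: condition the LM on the input, an initial LM output,
        # and the feedback to generate candidate refinements.
        candidates = [lm.generate(x, y0, feedback) for _ in range(k)]
        # Step 2 (assumed): keep the refinement that best incorporates
        # the feedback, e.g. as judged by a scoring model.
        refinement_data.append((x, select_best(candidates, feedback)))
    # Step 3 (assumed): fine-tune the LM to imitate the selected
    # refinements, then repeat the round with the improved model.
    return finetune(lm, refinement_data)
```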
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
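A minimal illustration of the feedback-to-sequence conversion described above; the tags and template are assumptions, not the paper's verbatim prompts.

```python
def to_hindsight_sequence(prompt: str, good_output: str, bad_output: str) -> str:
    """Convert paired feedback into one fine-tuning sequence. The model
    sees both outputs tagged with their hindsight quality, so it learns
    to produce text that matches a requested polarity."""
    return (f"{prompt}\n"
            f"A good response: {good_output}\n"
            f"A bad response: {bad_output}")

# At inference time, one conditions on the positive tag only, e.g.
# model.generate(f"{question}\nA good response:")  -- `model` is hypothetical.
```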
- Communication Drives the Emergence of Language Universals in Neural Agents: Evidence from the Word-order/Case-marking Trade-off [3.631024220680066]
We propose a new Neural-agent Language Learning and Communication framework (NeLLCom) where pairs of speaking and listening agents first learn a miniature language.
We succeed in replicating the word-order/case-marking trade-off with the new framework, without hard-coding specific biases into the agents.
arXiv Detail & Related papers (2023-01-30T17:22:33Z)
- How to talk so your robot will learn: Instructions, descriptions, and pragmatics [14.289220844201695]
We study how a human might communicate preferences over behaviors.
We show that in traditional reinforcement learning settings, pragmatic social learning can integrate with and accelerate individual learning.
Our findings suggest that social learning from a wider range of language is a promising approach for value alignment and reinforcement learning more broadly.
arXiv Detail & Related papers (2022-06-16T01:33:38Z)
- Training Language Models with Natural Language Feedback [51.36137482891037]
We learn from language feedback on model outputs using a three-step learning algorithm.
In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements.
Using only 100 samples of human-written feedback, our learning algorithm fine-tunes a GPT-3 model to roughly human-level summarization ability.
arXiv Detail & Related papers (2022-04-29T15:06:58Z)
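One plausible reading of the selection step in the three-step algorithm above: keep the candidate refinement closest to the feedback in embedding space, on the assumption that a refinement which incorporates the feedback should be semantically similar to it. This is a hedged sketch; `embed` stands in for any sentence-embedding function and is not from the paper's code.

```python
import numpy as np

def pick_refinement(refinements, feedback, embed):
    """Return the refinement with the highest cosine similarity to the
    feedback text (assumed selection criterion, for illustration only)."""
    f = embed(feedback)
    f = f / np.linalg.norm(f)
    scores = []
    for r in refinements:
        e = embed(r)
        scores.append(float(np.dot(e / np.linalg.norm(e), f)))
    return refinements[int(np.argmax(scores))]
```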
- Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics [14.863872352905629]
This paper focuses on robotic reinforcement learning with sparse rewards for natural language goal representations.
We first present a mechanism for hindsight instruction replay utilizing expert feedback.
Second, we propose a seq2seq model to generate linguistic hindsight instructions.
arXiv Detail & Related papers (2022-04-08T22:01:36Z)
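A schematic sketch of the hindsight instruction replay idea summarized above: a failed episode is relabeled with an instruction describing what the agent actually achieved, turning a sparse-reward failure into a usable success example. `instruction_model` stands in for the proposed seq2seq generator, and the replay-buffer interface is hypothetical.

```python
def hindsight_relabel(replay_buffer, states, actions, instruction_model):
    # Describe in language the outcome the agent actually reached.
    achieved_instruction = instruction_model.describe(states[-1])
    # Re-store the episode under the achieved goal, rewarding completion
    # only at the final step (sparse reward).
    last = len(actions) - 1
    for t in range(len(actions)):
        replay_buffer.add(state=states[t], action=actions[t],
                          goal=achieved_instruction,
                          reward=1.0 if t == last else 0.0)
```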
- Unsupervised Domain Adaptive Person Re-Identification via Human Learning Imitation [67.52229938775294]
In recent years, researchers have proposed using the teacher-student framework to reduce the domain gap between different person re-identification datasets.
Inspired by recent teacher-student-based methods, we propose to further imitate the human learning process from different aspects.
arXiv Detail & Related papers (2021-11-28T01:14:29Z)
- Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning [78.13740873213223]
Bongard problems (BPs) were introduced as an inspirational challenge for visual cognition in intelligent systems.
We propose a new benchmark Bongard-LOGO for human-level concept learning and reasoning.
arXiv Detail & Related papers (2020-10-02T03:19:46Z)