How to talk so your robot will learn: Instructions, descriptions, and
pragmatics
- URL: http://arxiv.org/abs/2206.07870v1
- Date: Thu, 16 Jun 2022 01:33:38 GMT
- Title: How to talk so your robot will learn: Instructions, descriptions, and
pragmatics
- Authors: Theodore R Sumers, Robert D Hawkins, Mark K Ho, Thomas L Griffiths,
Dylan Hadfield-Menell
- Abstract summary: We study how a human might communicate preferences over behaviors.
We show that in traditional reinforcement learning settings, pragmatic social learning can integrate with and accelerate individual learning.
Our findings suggest that social learning from a wider range of language is a promising approach for value alignment and reinforcement learning more broadly.
- Score: 14.289220844201695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: From the earliest years of our lives, humans use language to express our
beliefs and desires. Being able to talk to artificial agents about our
preferences would thus fulfill a central goal of value alignment. Yet today, we
lack computational models explaining such flexible and abstract language use.
To address this challenge, we consider social learning in a linear bandit
setting and ask how a human might communicate preferences over behaviors (i.e.
the reward function). We study two distinct types of language: instructions,
which provide information about the desired policy, and descriptions, which
provide information about the reward function. To explain how humans use these
forms of language, we suggest they reason about both known present and unknown
future states: instructions optimize for the present, while descriptions
generalize to the future. We formalize this choice by extending reward design
to consider a distribution over states. We then define a pragmatic listener
agent that infers the speaker's reward function by reasoning about how the
speaker expresses themselves. We validate our models with a behavioral
experiment, demonstrating that (1) our speaker model predicts spontaneous human
behavior, and (2) our pragmatic listener is able to recover their reward
functions. Finally, we show that in traditional reinforcement learning
settings, pragmatic social learning can integrate with and accelerate
individual learning. Our findings suggest that social learning from a wider
range of language -- in particular, expanding the field's present focus on
instructions to include learning from descriptions -- is a promising approach
for value alignment and reinforcement learning more broadly.
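To make the setup concrete, here is a minimal, hypothetical sketch of the kind of pragmatic listener the abstract describes, in a small linear bandit where arms are feature vectors, reward is linear in those features, instructions name an arm, and descriptions name the sign of a single reward weight. The candidate weight grid, the softmax speaker with rationality parameter `beta`, and the single sampled future context are illustrative assumptions rather than the authors' implementation (the paper instead extends reward design to a distribution over future states).

```python
# Illustrative sketch only: a pragmatic listener inferring a linear reward
# function from an instruction ("pick arm a") or a description ("feature j
# is good/bad"). Names and parameters are assumptions, not the paper's code.
import itertools
import numpy as np

rng = np.random.default_rng(0)

D = 3                                                    # number of reward features
ARMS = rng.integers(0, 2, size=(4, D)).astype(float)     # current context: 4 arms with binary features

# Candidate reward weights the listener entertains (coarse grid over {-1, 0, 1}^D).
CANDIDATE_W = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=D)))

# Utterances: instructions name an arm; descriptions name a feature and a sign.
UTTERANCES = [("instruct", a) for a in range(len(ARMS))] + \
             [("describe", j, s) for j in range(D) for s in (1.0, -1.0)]

def literal_consistent(utt, w, arms):
    """Literal semantics: is the utterance true under reward weights w?"""
    if utt[0] == "instruct":
        return np.argmax(arms @ w) == utt[1]   # "pick arm a": arm a is optimal here
    _, j, s = utt
    return np.sign(w[j]) == s                  # "feature j is good/bad": sign matches

def speaker_logits(w, arms, future_arms, beta=4.0):
    """Softmax speaker: prefers utterances that earn a literal listener more
    reward in both the present context and a sampled future context."""
    scores = []
    for utt in UTTERANCES:
        if not literal_consistent(utt, w, arms):
            scores.append(-np.inf)
            continue
        # Literal listener's posterior over candidate weights given the utterance.
        mask = np.array([literal_consistent(utt, wc, arms) for wc in CANDIDATE_W], float)
        post = mask / mask.sum()
        w_hat = post @ CANDIDATE_W             # listener's posterior-mean weight estimate
        # True reward of the listener's greedy arm choice, now and in the future.
        value = sum(ctx[np.argmax(ctx @ w_hat)] @ w for ctx in (arms, future_arms))
        scores.append(beta * value)
    scores = np.array(scores)
    return scores - np.logaddexp.reduce(scores)

def pragmatic_listener(utt, arms, future_arms):
    """Bayesian inversion of the speaker model: P(w | utterance)."""
    idx = UTTERANCES.index(utt)
    log_post = np.array([speaker_logits(w, arms, future_arms)[idx] for w in CANDIDATE_W])
    log_post -= np.logaddexp.reduce(log_post)  # uniform prior over candidate weights
    return np.exp(log_post)

if __name__ == "__main__":
    future = rng.integers(0, 2, size=(4, D)).astype(float)  # one possible future context
    utt = ("describe", 0, 1.0)                               # "feature 0 is good"
    post = pragmatic_listener(utt, ARMS, future)
    print("MAP weight estimate:", CANDIDATE_W[np.argmax(post)])
```

Inverting a speaker who also cares about future contexts is what lets such a listener recover more of the reward function from a description than from an instruction tied to the present context alone.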
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- Situated Instruction Following [87.37244711380411]
We propose situated instruction following, which embraces the inherent underspecification and ambiguity of real-world communication.
The meaning of situated instructions naturally unfolds through the past actions and the expected future behaviors of the human involved.
Our experiments indicate that state-of-the-art Embodied Instruction Following (EIF) models lack a holistic understanding of situated human intention.
arXiv Detail & Related papers (2024-07-15T19:32:30Z)
- Learning to Model the World with Language [100.76069091703505]
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world.
Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future.
We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations.
arXiv Detail & Related papers (2023-07-31T17:57:49Z)
- The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs [50.32802502923367]
We study how language drives and influences social reasoning in a probabilistic goal inference domain.
We propose a neuro-symbolic model that carries out goal inference from linguistic inputs of agent scenarios.
Our model closely matches human response patterns and better predicts human judgements than using an LLM alone.
arXiv Detail & Related papers (2023-06-25T19:38:01Z)
- Speaking the Language of Your Listener: Audience-Aware Adaptation via Plug-and-Play Theory of Mind [4.052000839878213]
We model a visually grounded referential game between a knowledgeable speaker and a listener with more limited visual and linguistic experience.
We endow our speaker with the ability to adapt its referring expressions via a simulation module that monitors the effectiveness of planned utterances from the listener's perspective.
arXiv Detail & Related papers (2023-05-31T15:17:28Z)
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
- Linguistic communication as (inverse) reward design [14.289220844201695]
This paper proposes a generalization of reward design as a unifying principle to ground linguistic communication.
We first extend reward design to incorporate reasoning about unknown future states in a linear bandit setting.
We then define a pragmatic listener which performs inverse reward design by jointly inferring the speaker's latent horizon and rewards.
arXiv Detail & Related papers (2022-04-11T13:50:34Z)
- Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics [14.863872352905629]
This paper focuses on robotic reinforcement learning with sparse rewards for natural language goal representations.
We first present a mechanism for hindsight instruction replay utilizing expert feedback.
Second, we propose a seq2seq model to generate linguistic hindsight instructions.
arXiv Detail & Related papers (2022-04-08T22:01:36Z)
- Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching [27.68274308680201]
We show that adding sociolinguistically-grounded speaker features as prepended prompts significantly improves accuracy.
We are the first to incorporate speaker characteristics in a neural model for code-switching.
arXiv Detail & Related papers (2022-03-16T22:56:58Z)
- Learning Rewards from Linguistic Feedback [30.30912759796109]
We explore unconstrained natural language feedback as a learning signal for artificial agents.
We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict latent rewards.
arXiv Detail & Related papers (2020-09-30T14:51:00Z)
- I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents [69.68400056148336]
We train a goal-oriented model with reinforcement learning against an imitation-learned "chit-chat" model.
We show that both models outperform an inverse model baseline and can converse naturally with their dialogue partner in order to achieve goals.
arXiv Detail & Related papers (2020-02-07T16:22:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.