Linguistic communication as (inverse) reward design
- URL: http://arxiv.org/abs/2204.05091v1
- Date: Mon, 11 Apr 2022 13:50:34 GMT
- Title: Linguistic communication as (inverse) reward design
- Authors: Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L.
Griffiths, Dylan Hadfield-Menell
- Abstract summary: This paper proposes a generalization of reward design as a unifying principle to ground linguistic communication.
We first extend reward design to incorporate reasoning about unknown future states in a linear bandit setting.
We then define a pragmatic listener which performs inverse reward design by jointly inferring the speaker's latent horizon and rewards.
- Score: 14.289220844201695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language is an intuitive and expressive way to communicate reward
information to autonomous agents. It encompasses everything from concrete
instructions to abstract descriptions of the world. Despite this, natural
language is often challenging to learn from: it is difficult for machine
learning methods to make appropriate inferences from such a wide range of
input. This paper proposes a generalization of reward design as a unifying
principle to ground linguistic communication: speakers choose utterances to
maximize expected rewards from the listener's future behaviors. We first extend
reward design to incorporate reasoning about unknown future states in a linear
bandit setting. We then define a speaker model which chooses utterances
according to this objective. Simulations show that short-horizon speakers
(reasoning primarily about a single, known state) tend to use instructions,
while long-horizon speakers (reasoning primarily about unknown, future states)
tend to describe the reward function. We then define a pragmatic listener which
performs inverse reward design by jointly inferring the speaker's latent
horizon and rewards. Our findings suggest that this extension of reward design
to linguistic communication, including the notion of a latent speaker horizon,
is a promising direction for achieving more robust alignment outcomes from
natural language supervision.
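To make the two components in the abstract concrete, below is a minimal sketch of a reward-design speaker and an inverse-reward-design listener in a small linear bandit. This is not the authors' implementation: the binary arm features, the instruction/description utterance space, the softmax rationality parameter, and the horizon grid are all illustrative assumptions layered on top of the abstract's description.

```python
# Illustrative sketch only, not the paper's code: arm features, the utterance
# space, and all hyperparameters below are assumptions for demonstration.
import itertools
import numpy as np

rng = np.random.default_rng(0)

N_FEAT = 3
ARMS = [np.array(v, dtype=float) for v in itertools.product([0, 1], repeat=N_FEAT)]
W_GRID = [np.array(v, dtype=float) for v in itertools.product([-1, 0, 1], repeat=N_FEAT)]
HORIZONS = [0, 3]   # 0 = reason only about the known state; 3 = also about future states

# Utterances: "instructions" name an option in the current context,
# "descriptions" assert the sign of one reward weight.
UTTS = [("instruct", i) for i in range(3)] + \
       [("describe", j, s) for j in range(N_FEAT) for s in (1, -1)]

def listener_choice(utt, context):
    """Index of the option a literal listener takes in `context` (a list of arms)."""
    if utt[0] == "instruct":
        i = utt[1]
        return i if i < len(context) else int(rng.integers(len(context)))
    _, j, s = utt                      # description: filter reward hypotheses
    consistent = [w for w in W_GRID if np.sign(w[j]) == s]
    w_hat = np.mean(consistent, axis=0)
    return int(np.argmax([a @ w_hat for a in context]))

def sample_context(k=3):
    idx = rng.choice(len(ARMS), size=k, replace=False)
    return [ARMS[i] for i in idx]

def speaker_value(utt, w, known_context, horizon, n_samples=20):
    """Expected reward of listener behavior in the known context plus `horizon`
    unknown future contexts (Monte Carlo). Instructions only bind in the known
    context, so in future contexts an instructed listener acts at random."""
    def value(ctx, informative):
        if not informative:
            return float(np.mean([a @ w for a in ctx]))
        return float(ctx[listener_choice(utt, ctx)] @ w)
    v = value(known_context, True)
    for _ in range(horizon):
        futures = [sample_context() for _ in range(n_samples)]
        v += float(np.mean([value(ctx, utt[0] == "describe") for ctx in futures]))
    return v

def speaker(w, known_context, horizon, beta=5.0):
    """Soft-optimal speaker: P(utt | w, horizon) proportional to exp(beta * value)."""
    vals = np.array([speaker_value(u, w, known_context, horizon) for u in UTTS])
    p = np.exp(beta * (vals - vals.max()))
    return p / p.sum()

def pragmatic_listener(utt, known_context):
    """Inverse reward design: joint posterior over (reward, horizon) given an
    utterance, with uniform priors over W_GRID and HORIZONS."""
    post = np.zeros((len(W_GRID), len(HORIZONS)))
    for wi, w in enumerate(W_GRID):
        for hi, h in enumerate(HORIZONS):
            post[wi, hi] = speaker(w, known_context, h)[UTTS.index(utt)]
    return post / post.sum()

if __name__ == "__main__":
    ctx = sample_context()
    true_w = np.array([1.0, -1.0, 0.0])
    for h in HORIZONS:
        probs = speaker(true_w, ctx, h)
        print(f"horizon={h}: speaker's most likely utterance is {UTTS[int(np.argmax(probs))]}")
    post = pragmatic_listener(("describe", 0, 1), ctx)
    print("pragmatic listener's posterior mass on the long horizon:", post[:, 1].sum())
```

Under this toy model one would expect the qualitative pattern the abstract reports: at horizon 0 the speaker favors instructions, since naming the best option in the known context is maximally effective there, while at longer horizons it shifts toward descriptions, which remain useful in unseen future contexts; the pragmatic listener then exploits that dependence to infer both the reward weights and the horizon the speaker had in mind.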
Related papers
- Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation [50.35367785674921]
Listener head generation centers on generating non-verbal behaviors of a listener in reference to the information delivered by a speaker.
A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation.
We propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords.
Our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate controllable responses with a predetermined attitude.
arXiv Detail & Related papers (2023-09-29T18:18:32Z)
- Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z)
- Speaking the Language of Your Listener: Audience-Aware Adaptation via Plug-and-Play Theory of Mind [4.052000839878213]
We model a visually grounded referential game between a knowledgeable speaker and a listener with more limited visual and linguistic experience.
We endow our speaker with the ability to adapt its referring expressions via a simulation module that monitors the effectiveness of planned utterances from the listener's perspective.
arXiv Detail & Related papers (2023-05-31T15:17:28Z)
- A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units [94.64927912924087]
Existing systems ignore the correlation between prosody and language content, leading to degradation of naturalness in converted speech.
We devise a cascaded modular system leveraging self-supervised discrete speech units as language representation.
Experiments show that our system outperforms previous approaches in naturalness, intelligibility, speaker transferability, and prosody transferability.
arXiv Detail & Related papers (2022-11-12T00:54:09Z)
- Know your audience: specializing grounded language models with listener subtraction [20.857795779760917]
We take inspiration from Dixit to formulate a multi-agent image reference game.
We show that finetuning an attention-based adapter between a CLIP vision encoder and a large language model in this contrastive, multi-agent setting gives rise to context-dependent natural language specialization.
arXiv Detail & Related papers (2022-06-16T17:52:08Z)
- How to talk so your robot will learn: Instructions, descriptions, and pragmatics [14.289220844201695]
We study how a human might communicate preferences over behaviors.
We show that in traditional reinforcement learning settings, pragmatic social learning can integrate with and accelerate individual learning.
Our findings suggest that social learning from a wider range of language is a promising approach for value alignment and reinforcement learning more broadly.
arXiv Detail & Related papers (2022-06-16T01:33:38Z)
- Color Overmodification Emerges from Data-Driven Learning and Pragmatic Reasoning [53.088796874029974]
We show that speakers' referential expressions depart from communicative ideals in ways that help illuminate the nature of pragmatic language use.
By adopting neural networks as learning agents, we show that overmodification is more likely with environmental features that are infrequent or salient.
arXiv Detail & Related papers (2022-05-18T18:42:43Z)
- Curriculum Learning for Goal-Oriented Semantic Communications with a Common Language [60.85719227557608]
A holistic goal-oriented semantic communication framework is proposed to enable a speaker and a listener to cooperatively execute a set of sequential tasks.
A common language based on a hierarchical belief set is proposed to enable semantic communications between speaker and listener.
An optimization problem is defined to determine the perfect and abstract description of the events.
arXiv Detail & Related papers (2022-04-21T22:36:06Z)
- Speaker Normalization for Self-supervised Speech Emotion Recognition [16.044405846513495]
We propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation.
We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
arXiv Detail & Related papers (2022-02-02T19:30:47Z)
- Extending rational models of communication from beliefs to actions [10.169856458866088]
Speakers communicate to influence their partner's beliefs and shape their actions.
We develop three speaker models: a belief-oriented speaker with a purely informative objective; an action-oriented speaker with an instrumental objective; and a combined speaker which integrates the two.
We show that grounding production choices in future listener actions results in relevance effects and flexible uses of nonliteral language.
arXiv Detail & Related papers (2021-05-25T13:58:01Z)
- Disentangled Speech Embeddings using Cross-modal Self-supervision [119.94362407747437]
We develop a self-supervised learning objective that exploits the natural cross-modal synchrony between faces and audio in video.
We construct a two-stream architecture which: (1) shares low-level features common to both representations; and (2) provides a natural mechanism for explicitly disentangling these factors.
arXiv Detail & Related papers (2020-02-20T14:13:12Z)