Speaking the Language of Your Listener: Audience-Aware Adaptation via
Plug-and-Play Theory of Mind
- URL: http://arxiv.org/abs/2305.19933v1
- Date: Wed, 31 May 2023 15:17:28 GMT
- Authors: Ece Takmaz, Nicolò Brandizzi, Mario Giulianelli, Sandro Pezzelle,
Raquel Fernández
- Abstract summary: We model a visually grounded referential game between a knowledgeable speaker and a listener with more limited visual and linguistic experience.
We endow our speaker with the ability to adapt its referring expressions via a simulation module that monitors the effectiveness of planned utterances from the listener's perspective.
- Score: 4.052000839878213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue participants may have varying levels of knowledge about the topic
under discussion. In such cases, it is essential for speakers to adapt their
utterances by taking their audience into account. Yet, it is an open question
how such adaptation can be modelled in computational agents. In this paper, we
model a visually grounded referential game between a knowledgeable speaker and
a listener with more limited visual and linguistic experience. Inspired by
psycholinguistic theories, we endow our speaker with the ability to adapt its
referring expressions via a simulation module that monitors the effectiveness
of planned utterances from the listener's perspective. We propose an adaptation
mechanism building on plug-and-play approaches to controlled language
generation, where utterance generation is steered on the fly by the simulator
without finetuning the speaker's underlying language model. Our results and
analyses show that our approach is effective: the speaker's utterances become
closer to the listener's domain of expertise, which leads to higher
communicative success.
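The plug-and-play idea in the abstract (steering generation with a simulator while leaving the speaker's language model untouched) can be illustrated with a toy sketch. This is not the authors' implementation: the linear "speaker" and "simulator", the dimensions, the step-size rule, and every name below (`speaker_out`, `sim_proj`, `candidates`, `adapt_hidden_state`) are assumptions made purely for illustration. The sketch nudges a speaker hidden state by gradient descent on the simulator's cross-entropy loss for the target image, then decodes with the unchanged speaker weights.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def simulate_listener(h, sim_proj, candidates):
    """Simulator's estimate of the listener's choice distribution over candidates."""
    return softmax(candidates @ (sim_proj @ h))

def adapt_hidden_state(h, sim_proj, candidates, target, steps=10):
    """Nudge the hidden state so the simulated listener favours the target.

    Only a copy of h is updated; no model weights are changed.
    """
    A = candidates @ sim_proj                      # maps hidden state to candidate scores
    step_size = 1.0 / np.linalg.norm(A, 2) ** 2    # conservative step for this toy loss
    h = h.copy()
    for _ in range(steps):
        grad_scores = softmax(A @ h)
        grad_scores[target] -= 1.0                 # d(cross-entropy)/d(scores)
        h -= step_size * (A.T @ grad_scores)       # chain rule back to the hidden state
    return h

# Toy setup (all sizes and random weights are hypothetical).
rng = np.random.default_rng(0)
d, m, k, vocab = 16, 8, 4, 20
speaker_out = rng.normal(size=(vocab, d))   # frozen speaker output layer
sim_proj = rng.normal(size=(m, d))          # simulator's projection of the speaker state
candidates = rng.normal(size=(k, m))        # simulator's view of the candidate images
h0 = rng.normal(size=d)                     # speaker hidden state before adaptation
target = 2

speaker_out_before = speaker_out.copy()
p_before = simulate_listener(h0, sim_proj, candidates)[target]
h1 = adapt_hidden_state(h0, sim_proj, candidates, target)
p_after = simulate_listener(h1, sim_proj, candidates)[target]
next_token = int(np.argmax(speaker_out @ h1))  # decode with the unchanged speaker weights

print(f"simulated P(target): {p_before:.3f} -> {p_after:.3f}")
```

In this toy setting the simulator's predicted probability for the target rises after adaptation while `speaker_out` stays fixed, which is the sense in which the steering happens "on the fly" rather than by finetuning.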
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction [23.115506530649988]
PerceptiveAgent is an empathetic multi-modal dialogue system designed to discern deeper or more subtle meanings.
PerceptiveAgent perceives acoustic information from input speech and generates empathetic responses based on speaking styles described in natural language.
arXiv Detail & Related papers (2024-06-18T15:19:51Z)
- Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z)
- Can Language Models Learn to Listen? [96.01685069483025]
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
arXiv Detail & Related papers (2023-08-21T17:59:02Z)
- Joining the Conversation: Towards Language Acquisition for Ad Hoc Team Play [1.370633147306388]
We propose and consider the problem of cooperative language acquisition as a particular form of the ad hoc team play problem.
We present a probabilistic model for inferring a speaker's intentions and a listener's semantics from observing communications between a team of language-users.
arXiv Detail & Related papers (2023-05-20T16:59:27Z)
- Computational Language Acquisition with Theory of Mind [84.2267302901888]
We build language-learning agents equipped with Theory of Mind (ToM) and measure its effects on the learning process.
We find that training speakers with a highly weighted ToM listener component leads to performance gains in our image referential game setting.
arXiv Detail & Related papers (2023-03-02T18:59:46Z)
- Know your audience: specializing grounded language models with listener subtraction [20.857795779760917]
We take inspiration from Dixit to formulate a multi-agent image reference game.
We show that finetuning an attention-based adapter between a CLIP vision encoder and a large language model in this contrastive, multi-agent setting gives rise to context-dependent natural language specialization.
arXiv Detail & Related papers (2022-06-16T17:52:08Z)
- Learning to Mediate Disparities Towards Pragmatic Communication [9.321336642983875]
We propose Pragmatic Rational Speaker (PRS) as a framework for building AI agents with similar abilities in language communication.
The PRS attempts to learn the speaker-listener disparity and adjust the speech accordingly, by adding a light-weighted disparity adjustment layer into working memory.
By fixing the long-term memory, the PRS only needs to update its working memory to learn and adapt to different types of listeners.
arXiv Detail & Related papers (2022-03-25T14:46:43Z)
- Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching [27.68274308680201]
We show that adding sociolinguistically-grounded speaker features as prepended prompts significantly improves accuracy.
We are the first to incorporate speaker characteristics in a neural model for code-switching.
arXiv Detail & Related papers (2022-03-16T22:56:58Z)
- Few-shot Language Coordination by Modeling Theory of Mind [95.54446989205117]
We study the task of few-shot language coordination.
We require the lead agent to coordinate with a population of agents with different linguistic abilities.
This requires the ability to model the partner's beliefs, a vital component of human communication.
arXiv Detail & Related papers (2021-07-12T19:26:11Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)