Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents
- URL: http://arxiv.org/abs/2407.01824v1
- Date: Mon, 1 Jul 2024 21:46:30 GMT
- Title: Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents
- Authors: Mehdi Arjmand, Farnaz Nouraei, Ian Steenstra, Timothy Bickmore
- Abstract summary: Empathic grounding is required whenever the speaker's emotions are foregrounded.
We describe a model that takes as input user speech and facial expression to generate multimodal grounding moves for a listening agent.
Our work highlights the role of emotion awareness and multimodality in generating appropriate grounding moves for conversational agents.
- Score: 0.6990493129893112
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the concept of "empathic grounding" in conversational agents as an extension of Clark's conceptualization of grounding in conversation in which the grounding criterion includes listener empathy for the speaker's affective state. Empathic grounding is generally required whenever the speaker's emotions are foregrounded and can make the grounding process more efficient and reliable by communicating both propositional and affective understanding. Both speaker expressions of affect and listener empathic grounding can be multimodal, including facial expressions and other nonverbal displays. Thus, models of empathic grounding for embodied agents should be multimodal to facilitate natural and efficient communication. We describe a multimodal model that takes as input user speech and facial expression to generate multimodal grounding moves for a listening agent using a large language model. We also describe a testbed to evaluate approaches to empathic grounding, in which a humanoid robot interviews a user about a past episode of pain and then has the user rate their perception of the robot's empathy. We compare our proposed model to one that only generates non-affective grounding cues in a between-subjects experiment. Findings demonstrate that empathic grounding increases user perceptions of empathy, understanding, emotional intelligence, and trust. Our work highlights the role of emotion awareness and multimodality in generating appropriate grounding moves for conversational agents.
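The abstract describes a model that fuses user speech and facial expression into a single LLM-driven grounding-move generator. A minimal sketch of that pipeline is below; the class names, prompt wording, and fallback template are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the empathic-grounding pipeline: the listener agent
# conditions its grounding move on both what the user said and what their
# face showed. All names here are illustrative, not from the paper's code.
from dataclasses import dataclass


@dataclass
class GroundingMove:
    utterance: str     # verbal backchannel, e.g. "That sounds painful."
    expression: str    # nonverbal display for the embodied agent
    head_gesture: str  # e.g. "nod"


def build_prompt(user_speech: str, facial_expression: str) -> str:
    """Fold both input modalities into a single LLM prompt."""
    return (
        "You are an empathic listening agent.\n"
        f'The user said: "{user_speech}"\n'
        f"Their facial expression shows: {facial_expression}.\n"
        "Produce a short grounding move that acknowledges both the "
        "content and the emotion."
    )


def generate_grounding_move(user_speech: str, facial_expression: str,
                            llm=None) -> GroundingMove:
    prompt = build_prompt(user_speech, facial_expression)
    if llm is not None:
        reply = llm(prompt)  # a real LLM call would go here
    else:
        # Offline fallback: mirror the detected affect with a template.
        reply = f"I see. That must have felt {facial_expression}."
    return GroundingMove(utterance=reply,
                         expression=facial_expression,
                         head_gesture="nod")


move = generate_grounding_move("My back pain kept me up all night",
                               "distressed")
print(move.utterance)
```

In the paper's testbed the move would drive a humanoid robot's speech and facial display; here the output is just returned as structured text.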
Related papers
- Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction [23.115506530649988]
PerceptiveAgent is an empathetic multi-modal dialogue system designed to discern deeper or more subtle meanings.
PerceptiveAgent perceives acoustic information from input speech and generates empathetic responses based on speaking styles described in natural language.
arXiv Detail & Related papers (2024-06-18T15:19:51Z)
- Grounding Gaps in Language Model Generations [67.79817087930678]
We study whether large language models generate text that reflects human grounding.
We find that -- compared to humans -- LLMs generate language with less conversational grounding.
To understand the roots of the identified grounding gap, we examine the role of instruction tuning and preference optimization.
arXiv Detail & Related papers (2023-11-15T17:40:27Z)
- Deep Learning of Segment-level Feature Representation for Speech Emotion Recognition in Conversations [9.432208348863336]
We propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances.
Second, an attentive bidirectional gated recurrent unit (GRU) models context-sensitive information and explores intra- and inter-speaker dependencies jointly.
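The two-stage architecture described above (VGGish segment embeddings, then an attentive bidirectional GRU) can be sketched in PyTorch. The dimensions (VGGish emits 128-d vectors), layer sizes, and the simple additive-attention pooling are assumptions for brevity; the paper's exact model is not reproduced here.

```python
# Illustrative sketch: segment-level audio embeddings pooled by an
# attentive bidirectional GRU into utterance-level emotion logits.
import torch
import torch.nn as nn


class AttentiveBiGRU(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, n_emotions=4):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # additive attention scores
        self.clf = nn.Linear(2 * hidden, n_emotions)

    def forward(self, segments):                # (batch, n_segments, feat_dim)
        h, _ = self.gru(segments)               # (batch, n_segments, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention over segments
        pooled = (w * h).sum(dim=1)             # weighted context vector
        return self.clf(pooled)                 # emotion logits


# Stand-in for VGGish output: 10 segments of 128-d features per utterance.
feats = torch.randn(2, 10, 128)
logits = AttentiveBiGRU()(feats)
print(logits.shape)  # torch.Size([2, 4])
```

In practice the random `feats` tensor would be replaced by embeddings from a pretrained VGGish model applied to each audio segment of an utterance.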
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- Know your audience: specializing grounded language models with listener subtraction [20.857795779760917]
We take inspiration from Dixit to formulate a multi-agent image reference game.
We show that finetuning an attention-based adapter between a CLIP vision encoder and a large language model in this contrastive, multi-agent setting gives rise to context-dependent natural language specialization.
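The adapter idea above (a small trainable bridge between a frozen CLIP vision encoder and a frozen LLM) can be sketched as follows. The feature dimensions (512-d CLIP output, 768-d LLM embeddings), prefix-token count, and MLP form are assumptions for illustration, not the paper's attention-based adapter.

```python
# Minimal sketch: only the adapter is trainable; it maps frozen CLIP image
# features into prefix tokens in the LLM's embedding space.
import torch
import torch.nn as nn


class VisionToLLMAdapter(nn.Module):
    def __init__(self, clip_dim=512, llm_dim=768, n_prefix_tokens=4):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, n_prefix_tokens * llm_dim),
        )
        self.n = n_prefix_tokens
        self.d = llm_dim

    def forward(self, clip_feats):          # (batch, clip_dim)
        x = self.proj(clip_feats)
        return x.view(-1, self.n, self.d)   # prefix tokens for the LLM


image_emb = torch.randn(2, 512)  # stand-in for CLIP image-encoder output
prefix = VisionToLLMAdapter()(image_emb)
print(prefix.shape)  # torch.Size([2, 4, 768])
```

During the multi-agent reference game, only these adapter weights would receive gradients from the contrastive objective; the vision encoder and LLM stay frozen.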
arXiv Detail & Related papers (2022-06-16T17:52:08Z)
- CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI [48.67259855309959]
Most existing datasets for conversational AI ignore human personalities and emotions.
We propose CPED, a large-scale Chinese personalized and emotional dialogue dataset.
CPED contains more than 12K dialogues of 392 speakers from 40 TV shows.
arXiv Detail & Related papers (2022-05-29T17:45:12Z)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline [58.168958284290156]
We define the responsive listening head generation task as the synthesis of non-verbal head motions and expressions that react to multiple inputs.
Unlike speech-driven gesture or talking head generation, we introduce more modalities in this task, hoping to benefit several research fields.
arXiv Detail & Related papers (2021-12-27T07:18:50Z)
- Constructing Emotion Consensus and Utilizing Unpaired Data for Empathetic Dialogue Generation [22.2430593119389]
We propose a dual-generative model, Dual-Emp, to simultaneously construct the emotion consensus and utilize external unpaired data.
Our method outperforms competitive baselines in producing coherent and empathetic responses.
arXiv Detail & Related papers (2021-09-16T07:57:01Z)
- Few-shot Language Coordination by Modeling Theory of Mind [95.54446989205117]
We study the task of few-shot language coordination.
We require the lead agent to coordinate with a population of agents with different linguistic abilities.
This requires the ability to model the partner's beliefs, a vital component of human communication.
arXiv Detail & Related papers (2021-07-12T19:26:11Z)
- Exemplars-guided Empathetic Response Generation Controlled by the Elements of Human Communication [88.52901763928045]
We propose an approach that relies on exemplars to cue the generative model on fine stylistic properties that signal empathy to the interlocutor.
We empirically show that these approaches yield significant improvements in empathetic response quality in terms of both automated and human-evaluated metrics.
arXiv Detail & Related papers (2021-06-22T14:02:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.