InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation
- URL: http://arxiv.org/abs/2212.06373v7
- Date: Sun, 26 Nov 2023 17:24:19 GMT
- Title: InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation
- Authors: Guoqing Lv, Jiang Li, Xiaoping Wang, Zhigang Zeng
- Abstract summary: Current approaches to empathetic response generation typically encode the entire dialogue history directly.
We argue that the last utterance in the dialogue empirically conveys the intention of the speaker.
We propose a novel model named InferEM for empathetic response generation.
- Score: 37.12407597998884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current approaches to empathetic response generation typically encode the
entire dialogue history directly and put the output into a decoder to generate
friendly feedback. These methods focus on modelling contextual information but
neglect capturing the direct intention of the speaker. We argue that the last
utterance in the dialogue empirically conveys the intention of the speaker.
Consequently, we propose a novel model named InferEM for empathetic response
generation. We separately encode the last utterance and fuse it with the entire
dialogue through a multi-head-attention-based intention fusion module to
capture the speaker's intention. In addition, we use the previous utterances to
predict the last utterance, which simulates the human tendency to guess in
advance what the interlocutor may say. To balance the optimization rates of
utterance prediction and response generation, a multi-task learning strategy is
designed for InferEM. Experimental results demonstrate the plausibility and
validity of InferEM in improving empathetic expression.
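
The intention fusion step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the choice of dialogue-history states as queries and last-utterance states as keys and values, the residual connection, the random inputs, and every name below (`multi_head_attention`, `init_params`, `history`, `last_utt`) are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d_model, rng):
    # Hypothetical projection matrices; a trained model would learn these.
    scale = 1.0 / np.sqrt(d_model)
    names = ("Wq", "Wk", "Wv", "Wo")
    return {n: rng.normal(0.0, scale, (d_model, d_model)) for n in names}


def multi_head_attention(query, key, value, num_heads, params):
    """Scaled dot-product attention with several heads.

    query: (Lq, d_model); key and value: (Lk, d_model).
    Returns the output (Lq, d_model) and the weights (heads, Lq, Lk).
    """
    d_model = query.shape[-1]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split(x):
        # (L, d_model) -> (heads, L, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split(query @ params["Wq"])
    k = split(key @ params["Wk"])
    v = split(value @ params["Wv"])

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, Lq, Lk)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                  # (heads, Lq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(query.shape[0], d_model)
    return concat @ params["Wo"], weights


rng = np.random.default_rng(0)
d_model, num_heads = 16, 4

# Stand-ins for encoder outputs (assumed shapes, random values).
history = rng.normal(size=(12, d_model))   # whole dialogue, 12 tokens
last_utt = rng.normal(size=(5, d_model))   # last utterance, encoded separately

params = init_params(d_model, rng)

# Assumption: history tokens attend over the last utterance, and the result
# is added back to the history representation (residual connection).
attended, attn = multi_head_attention(history, last_utt, last_utt,
                                      num_heads, params)
fused = history + attended

print(fused.shape, attn.shape)
```

Here `fused` keeps the shape of the dialogue-history representation, so under these assumptions it could be passed to a decoder in place of the plain history encoding. The utterance prediction task and the multi-task learning strategy mentioned in the abstract are not sketched, since the abstract does not specify them.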
Related papers
- SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization [48.284512017469524]
Multi-turn dialogues are characterized by their extended length and the presence of turn-taking conversations.
Traditional language models often overlook the distinct features of these dialogues by treating them as regular text.
We propose a speaker-enhanced pre-training method for long dialogue summarization.
arXiv Detail & Related papers (2024-01-31T04:50:00Z)
- Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation [50.35367785674921]
Listener head generation centers on generating non-verbal behaviors of a listener in reference to the information delivered by a speaker.
A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation.
We propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords.
Our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate controllable responses with a predetermined attitude.
arXiv Detail & Related papers (2023-09-29T18:18:32Z)
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- EM Pre-training for Multi-party Dialogue Response Generation [86.25289241604199]
In multi-party dialogues, the addressee of a response utterance should be specified before it is generated.
We propose an Expectation-Maximization (EM) approach that iteratively performs the expectation steps to generate addressee labels.
arXiv Detail & Related papers (2023-05-21T09:22:41Z)
- Conversational speech recognition leveraging effective fusion methods for cross-utterance language modeling [12.153618111267514]
We put forward disparate conversation history fusion methods for language modeling in automatic speech recognition.
A novel audio-fusion mechanism is introduced, which manages to fuse and utilize the acoustic embeddings of a current utterance and the semantic content of its corresponding conversation history.
To flesh out our ideas, we frame the ASR N-best hypothesis rescoring task as a prediction problem, leveraging BERT, an iconic pre-trained LM.
arXiv Detail & Related papers (2021-11-05T09:07:23Z)
- A Speaker-aware Parallel Hierarchical Attentive Encoder-Decoder Model for Multi-turn Dialogue Generation [13.820298189734686]
This paper presents a novel open-domain dialogue generation model emphasizing the differentiation of speakers in multi-turn conversations.
Our empirical results show that PHAED outperforms the state-of-the-art in both automatic and human evaluations.
arXiv Detail & Related papers (2021-10-13T16:08:29Z)
- Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
In existing retrieval-based multi-turn dialogue modeling, pre-trained language models (PrLMs) used as encoders represent dialogues only coarsely.
We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
arXiv Detail & Related papers (2020-09-14T15:07:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.