InferEM: Inferring the Speaker's Intention for Empathetic Dialogue
Generation
- URL: http://arxiv.org/abs/2212.06373v7
- Date: Sun, 26 Nov 2023 17:24:19 GMT
- Title: InferEM: Inferring the Speaker's Intention for Empathetic Dialogue
Generation
- Authors: Guoqing Lv, Jiang Li, Xiaoping Wang, Zhigang Zeng
- Abstract summary: Current approaches to empathetic response generation typically encode the entire dialogue history directly.
We argue that the last utterance in the dialogue empirically conveys the intention of the speaker.
We propose a novel model named InferEM for empathetic response generation.
- Score: 37.12407597998884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current approaches to empathetic response generation typically encode the
entire dialogue history directly and put the output into a decoder to generate
friendly feedback. These methods focus on modelling contextual information but
neglect capturing the direct intention of the speaker. We argue that the last
utterance in the dialogue empirically conveys the intention of the speaker.
Consequently, we propose a novel model named InferEM for empathetic response
generation. We separately encode the last utterance and fuse it with the entire
dialogue through the multi-head attention based intention fusion module to
capture the speaker's intention. Besides, we utilize previous utterances to
predict the last utterance, which simulates human's psychology to guess what
the interlocutor may speak in advance. To balance the optimizing rates of the
utterance prediction and response generation, a multi-task learning strategy is
designed for InferEM. Experimental results demonstrate the plausibility and
validity of InferEM in improving empathetic expression.
Related papers
- Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection [24.71649541757314]
Short backchannel utterances such as "yeah" and "oh" play a crucial role in facilitating smooth and engaging dialogue.
This paper proposes a novel method for real-time, continuous backchannel prediction using a fine-tuned Voice Activity Projection model.
arXiv Detail & Related papers (2024-10-21T11:57:56Z) - Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z) - SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization [48.284512017469524]
Multi-turn dialogues are characterized by their extended length and the presence of turn-taking conversations.
Traditional language models often overlook the distinct features of these dialogues by treating them as regular text.
We propose a speaker-enhanced pre-training method for long dialogue summarization.
arXiv Detail & Related papers (2024-01-31T04:50:00Z) - Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z) - EM Pre-training for Multi-party Dialogue Response Generation [86.25289241604199]
In multi-party dialogues, the addressee of a response utterance should be specified before it is generated.
We propose an Expectation-Maximization (EM) approach that iteratively performs the expectation steps to generate addressee labels.
arXiv Detail & Related papers (2023-05-21T09:22:41Z) - A Speaker-aware Parallel Hierarchical Attentive Encoder-Decoder Model
for Multi-turn Dialogue Generation [13.820298189734686]
This paper presents a novel open-domain dialogue generation model emphasizing the differentiation of speakers in multi-turn conversations.
Our empirical results show that PHAED outperforms the state-of-the-art in both automatic and human evaluations.
arXiv Detail & Related papers (2021-10-13T16:08:29Z) - Filling the Gap of Utterance-aware and Speaker-aware Representation for
Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
In the existing retrieval-based multi-turn dialogue modeling, the pre-trained language models (PrLMs) as encoder represent the dialogues coarsely.
We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
arXiv Detail & Related papers (2020-09-14T15:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.