Related papers: InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation

URL: http://arxiv.org/abs/2212.06373v7
Date: Sun, 26 Nov 2023 17:24:19 GMT
Title: InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation
Authors: Guoqing Lv, Jiang Li, Xiaoping Wang, Zhigang Zeng
Abstract summary: Current approaches to empathetic response generation typically encode the entire dialogue history directly. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. We propose a novel model named InferEM for empathetic response generation.
Score: 37.12407597998884
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current approaches to empathetic response generation typically encode the entire dialogue history directly and feed the output into a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect to capture the direct intention of the speaker. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. Consequently, we propose a novel model named InferEM for empathetic response generation. We separately encode the last utterance and fuse it with the entire dialogue through a multi-head attention based intention fusion module to capture the speaker's intention. In addition, we use the previous utterances to predict the last utterance, which simulates the human tendency to guess in advance what the interlocutor may say. To balance the optimization rates of utterance prediction and response generation, a multi-task learning strategy is designed for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.
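
The fusion step described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say which side supplies the queries, so the sketch assumes that context tokens act as queries and last-utterance tokens act as keys and values. It also assumes heads are plain slices of the feature dimension with no learned projections, and that the attended result is added back to the context as a residual. The function names (`softmax`, `attend`, `fuse_intention`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention for one head (lists of equal-width vectors)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        )
        out.append(
            [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(len(values[0]))]
        )
    return out


def fuse_intention(context, last_utterance, num_heads):
    """Fuse last-utterance features into every context position.

    Assumptions (not from the paper): context tokens are the queries,
    last-utterance tokens are keys and values, heads are slices of the
    feature dimension, and the output is added to the context residually.
    """
    dim = len(context[0])
    if dim % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    head = dim // num_heads
    fused = [list(row) for row in context]  # residual copy of the context
    for h in range(num_heads):
        lo, hi = h * head, (h + 1) * head
        q = [row[lo:hi] for row in context]
        kv = [row[lo:hi] for row in last_utterance]
        for t, vec in enumerate(attend(q, kv, kv)):
            for i, x in enumerate(vec):
                fused[t][lo + i] += x
    return fused


# Toy features: 3 context tokens and 2 last-utterance tokens, 4 dimensions each.
context = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 0.0]]
last_utterance = [[0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 1.0]]
fused = fuse_intention(context, last_utterance, num_heads=2)
print(len(fused), len(fused[0]))
```

The fused output keeps the shape of the context, so under these assumptions it could be passed to a decoder in place of the plain context encoding. A real model would normally add learned query, key, and value projections per head.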

Related papers

Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection [24.71649541757314]
Short backchannel utterances such as "yeah" and "oh" play a crucial role in facilitating smooth and engaging dialogue. This paper proposes a novel method for real-time, continuous backchannel prediction using a fine-tuned Voice Activity Projection model.
arXiv Detail & Related papers (2024-10-21T11:57:56Z)
Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance. We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information. Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z)
SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization [48.284512017469524]
Multi-turn dialogues are characterized by their extended length and the presence of turn-taking conversations. Traditional language models often overlook the distinct features of these dialogues by treating them as regular text. We propose a speaker-enhanced pre-training method for long dialogue summarization.
arXiv Detail & Related papers (2024-01-31T04:50:00Z)
Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying. To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
EM Pre-training for Multi-party Dialogue Response Generation [86.25289241604199]
In multi-party dialogues, the addressee of a response utterance should be specified before it is generated. We propose an Expectation-Maximization (EM) approach that iteratively performs the expectation steps to generate addressee labels.
arXiv Detail & Related papers (2023-05-21T09:22:41Z)
A Speaker-aware Parallel Hierarchical Attentive Encoder-Decoder Model for Multi-turn Dialogue Generation [13.820298189734686]
This paper presents a novel open-domain dialogue generation model emphasizing the differentiation of speakers in multi-turn conversations. Our empirical results show that PHAED outperforms the state-of-the-art in both automatic and human evaluations.
arXiv Detail & Related papers (2021-10-13T16:08:29Z)
Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles. In the existing retrieval-based multi-turn dialogue modeling, the pre-trained language models (PrLMs) as encoder represent the dialogues coarsely. We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
arXiv Detail & Related papers (2020-09-14T15:07:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.