A Speaker-aware Parallel Hierarchical Attentive Encoder-Decoder Model
for Multi-turn Dialogue Generation
- URL: http://arxiv.org/abs/2110.06823v2
- Date: Thu, 14 Oct 2021 20:29:10 GMT
- Authors: Zihao Wang, Ming Jiang, Junli Wang
- Abstract summary: This paper presents a novel open-domain dialogue generation model emphasizing the differentiation of speakers in multi-turn conversations.
Our empirical results show that PHAED outperforms the state-of-the-art in both automatic and human evaluations.
- Score: 13.820298189734686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel open-domain dialogue generation model emphasizing
the differentiation of speakers in multi-turn conversations. Unlike prior work
that relies solely on the content of the conversation history to generate a
response, we argue that capturing relative social relations among utterances
(i.e., whether they were generated by the same speaker or by different speakers)
helps the model capture fine-grained context information from the conversation
history and improves context coherence in the generated response.
Given that, we propose a speaker-aware Parallel Hierarchical Attentive
Encoder-Decoder (PHAED) model that aims to model each utterance with the
awareness of its speaker and contextual associations with the same speaker's
previous messages. Specifically, in a conversation involving two speakers, we
regard the utterances from one speaker as responses and those from the other as
queries. After understanding queries via our encoder with inner-query and
inter-query encodings, our decoder reuses the hidden states of previously
generated responses, rather than re-encoding them with the encoder, to
generate a new response. Our empirical results show that PHAED outperforms the
state-of-the-art in both automatic and human evaluations. Furthermore, our
ablation study shows that dialogue models with speaker tokens can generally
reduce the likelihood of generating responses that are incoherent with the
conversation context.
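The query/response split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_by_speaker` and the speaker tokens `<spk_q>` and `<spk_r>` are hypothetical, and the sketch covers only the data preparation step (separating a two-speaker dialogue by role and marking each utterance with a speaker token), not the PHAED encoder or decoder.

```python
from typing import List, Tuple

# Hypothetical speaker tokens: the abstract mentions speaker tokens
# but does not name them.
QUERY_TOKEN = "<spk_q>"
RESPONSE_TOKEN = "<spk_r>"


def split_by_speaker(
    turns: List[Tuple[str, str]], responder: str
) -> Tuple[List[str], List[str]]:
    """Split a two-speaker dialogue into queries and responses.

    Utterances by `responder` are treated as responses and all other
    utterances as queries. Each utterance is prefixed with a speaker token.
    """
    queries: List[str] = []
    responses: List[str] = []
    for speaker, text in turns:
        if speaker == responder:
            responses.append(f"{RESPONSE_TOKEN} {text}")
        else:
            queries.append(f"{QUERY_TOKEN} {text}")
    return queries, responses


# Toy dialogue: speaker "B" is the one whose next response would be generated.
dialogue = [
    ("A", "Any plans for the weekend?"),
    ("B", "I might go hiking."),
    ("A", "Which trail?"),
]
queries, responses = split_by_speaker(dialogue, responder="B")
```

In a model along these lines, the `queries` list would feed the encoder, while the `responses` list corresponds to the previously generated outputs whose decoder states are reused.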
Related papers
- Multi-party Response Generation with Relation Disentanglement [8.478506896774137]
Existing neural response generation models have achieved impressive improvements for two-party conversations.
However, many real-world dialogues involve multiple interlocutors and the structure of conversational context is much more complex.
We propose to automatically infer the relations via relational thinking on subtle clues inside the conversation context without any human label.
arXiv Detail & Related papers (2024-03-16T06:33:44Z)
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- Revisiting Conversation Discourse for Dialogue Disentanglement [88.3386821205896]
We propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics.
We develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context.
Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
arXiv Detail & Related papers (2023-06-06T19:17:47Z)
- Modeling Speaker-Listener Interaction for Backchannel Prediction [24.52345279975304]
Backchanneling theories emphasize the active and continuous role of the listener in the course of a conversation.
We propose a neural acoustic backchannel classifier for minimal responses that processes acoustic features from the speaker's speech.
Our experimental results on the Switchboard and GECO datasets reveal that in almost all tested scenarios the speaker or listener behavior embeddings help the model make more accurate backchannel predictions.
arXiv Detail & Related papers (2023-04-10T09:22:06Z)
- Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from PrLMs.
We employ domain-adaptive training strategies to help the model adapt to the dialogue domains.
Experimental results show that our method substantially boosts the strong PrLM baselines in four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z)
- InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation [37.12407597998884]
Current approaches to empathetic response generation typically encode the entire dialogue history directly.
We argue that the last utterance in the dialogue empirically conveys the intention of the speaker.
We propose a novel model named InferEM for empathetic response generation.
arXiv Detail & Related papers (2022-12-13T05:12:40Z)
- Question-Interlocutor Scope Realized Graph Modeling over Key Utterances for Dialogue Reading Comprehension [61.55950233402972]
We propose a new key utterances extracting method for dialogue reading comprehension.
It performs prediction on units formed by several contiguous utterances, which can capture more answer-containing utterances.
We then propose Question-Interlocutor Scope Realized Graph (QuISG) modeling, a graph constructed on the text of the utterances.
arXiv Detail & Related papers (2022-10-26T04:00:42Z)
- Enhanced Speaker-aware Multi-party Multi-turn Dialogue Comprehension [43.352833140317486]
Multi-party multi-turn dialogue comprehension brings unprecedented challenges.
Most existing methods deal with dialogue contexts as plain texts.
We propose an enhanced speaker-aware model with masking attention and heterogeneous graph networks.
arXiv Detail & Related papers (2021-09-09T07:12:22Z)
- Streaming Multi-talker Speech Recognition with Joint Speaker Identification [77.46617674133556]
SURIT employs the recurrent neural network transducer (RNN-T) as the backbone for both speech recognition and speaker identification.
We validate our idea on the LibrispeechMix dataset -- a multi-talker dataset derived from Librispeech, and present encouraging results.
arXiv Detail & Related papers (2021-04-05T18:37:33Z)
- Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
In existing retrieval-based multi-turn dialogue modeling, pre-trained language models (PrLMs) used as encoders represent the dialogues coarsely.
We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
arXiv Detail & Related papers (2020-09-14T15:07:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.