TANet: Thread-Aware Pretraining for Abstractive Conversational
Summarization
- URL: http://arxiv.org/abs/2204.04504v1
- Date: Sat, 9 Apr 2022 16:08:46 GMT
- Title: TANet: Thread-Aware Pretraining for Abstractive Conversational
Summarization
- Authors: Ze Yang, Liran Wang, Zhoujin Tian, Wei Wu, Zhoujun Li
- Abstract summary: We build a large-scale (11M) pretraining dataset called RCS based on the multi-person discussions in the Reddit community.
We then present TANet, a thread-aware Transformer-based network.
Unlike the existing pre-trained models that treat a conversation as a sequence of sentences, we argue that the inherent contextual dependency plays an essential role in understanding the entire conversation.
- Score: 27.185068253347257
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although pre-trained language models (PLMs) have achieved great success and
become a milestone in NLP, abstractive conversational summarization remains a
challenging but less studied task. The difficulty lies in two aspects. One is
the lack of large-scale conversational summary data. Another is that applying
existing pre-trained models to this task is difficult because of the structural
dependencies within conversations and their informal style of expression.
In this work, we first build a large-scale (11M) pretraining dataset called
RCS, based on the multi-person discussions in the Reddit community. We then
present TANet, a thread-aware Transformer-based network. Unlike the existing
pre-trained models that treat a conversation as a sequence of sentences, we
argue that the inherent contextual dependency among the utterances plays an
essential role in understanding the entire conversation and thus propose two
new techniques to incorporate the structural information into our model. The
first is thread-aware attention, which is computed by taking the contextual
dependency among utterances into account. The second is a thread prediction loss
used to predict the relations between utterances. We evaluate our model on four
datasets of real conversations, covering meeting transcripts, customer-service
records, and forum threads. Experimental results demonstrate that TANet achieves
a new state-of-the-art in terms of both automatic evaluation and human judgment.
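The abstract describes the two thread-aware techniques only at a high level. As a reading aid, the sketch below illustrates one plausible form they could take: an utterance-level attention mask restricted to each utterance's reply chain, and a bilinear head trained with binary cross-entropy to predict reply links. The function names, the bilinear scorer, and the exact loss form are assumptions for illustration, not TANet's published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def thread_attention_mask(reply_to):
    """Build an (n, n) boolean mask where entry (i, j) is True when utterance j
    lies on the reply chain of utterance i (including i itself).
    reply_to[i] is the index of the utterance that i replies to, or -1 for a root."""
    n = len(reply_to)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        j = i
        while j != -1:                      # walk up the reply chain to the thread root
            mask[i, j] = True
            j = reply_to[j]
    return mask


class ThreadPredictionHead(nn.Module):
    """Bilinear scorer over utterance pairs: does utterance i reply to utterance j?"""

    def __init__(self, hidden_size):
        super().__init__()
        self.bilinear = nn.Bilinear(hidden_size, hidden_size, 1)

    def forward(self, utt_repr):
        # utt_repr: (n, hidden) pooled representation of each utterance
        n, h = utt_repr.shape
        left = utt_repr.unsqueeze(1).expand(n, n, h).reshape(n * n, h)
        right = utt_repr.unsqueeze(0).expand(n, n, h).reshape(n * n, h)
        return self.bilinear(left, right).reshape(n, n)    # (n, n) reply-link logits


def thread_prediction_loss(logits, reply_to):
    """Binary cross-entropy between predicted reply links and the gold thread."""
    n = logits.size(0)
    target = torch.zeros(n, n)
    for i, parent in enumerate(reply_to):
        if parent != -1:
            target[i, parent] = 1.0
    return F.binary_cross_entropy_with_logits(logits, target)


# Toy thread: utterances 1 and 2 reply to 0; utterance 3 replies to 2.
reply_to = [-1, 0, 0, 2]
mask = thread_attention_mask(reply_to)      # to be combined with token-level attention
head = ThreadPredictionHead(hidden_size=16)
loss = thread_prediction_loss(head(torch.randn(4, 16)), reply_to)
```

In the actual model, such an utterance-level mask would presumably be broadcast to token level and combined with ordinary self-attention, and the thread prediction loss would be added to the pretraining objective; the paper should be consulted for the exact formulation.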
Related papers
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z)
- Conversation Disentanglement with Bi-Level Contrastive Learning [26.707584899718288]
Existing methods have two main drawbacks. First, they overemphasize pairwise utterance relations while paying inadequate attention to utterance-to-context relation modeling.
We propose a general disentanglement model based on bi-level contrastive learning. It brings utterances in the same session closer together while encouraging each utterance to be near its clustered session prototypes in the representation space (an illustrative sketch of this objective follows the entry).
arXiv Detail & Related papers (2022-10-27T08:41:46Z)
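The bi-level objective in the entry above is described in a single sentence; the snippet below is a rough illustration of what such a loss could look like, assuming utterance embeddings and session labels are given. The temperature value, mean-pooled prototypes, and function name are illustrative assumptions rather than the paper's actual formulation.

```python
import torch
import torch.nn.functional as F


def bi_level_contrastive_loss(utt_emb, session_ids, temperature=0.1):
    """Utterance level: same-session utterances act as positives for one another.
    Prototype level: each utterance is pulled toward its session's mean embedding.
    utt_emb: (N, d) utterance embeddings; session_ids: (N,) integer session labels."""
    emb = F.normalize(utt_emb, dim=-1)
    n = emb.size(0)
    sim = emb @ emb.t() / temperature                          # (N, N) scaled cosine similarities
    eye = torch.eye(n, dtype=torch.bool)
    pos = (session_ids.unsqueeze(0) == session_ids.unsqueeze(1)) & ~eye

    # Utterance-level InfoNCE: average log-probability of same-session utterances
    logits = sim.masked_fill(eye, float("-inf"))
    log_prob = logits - logits.logsumexp(dim=-1, keepdim=True)
    utt_loss = -(log_prob.masked_fill(~pos, 0.0)).sum(dim=-1) / pos.sum(dim=-1).clamp(min=1)

    # Prototype-level term: classify each utterance into its own session prototype
    sessions = session_ids.unique()
    protos = F.normalize(
        torch.stack([emb[session_ids == s].mean(dim=0) for s in sessions]), dim=-1
    )
    sess_index = {int(s): i for i, s in enumerate(sessions)}
    target = torch.tensor([sess_index[int(s)] for s in session_ids])
    proto_loss = F.cross_entropy(emb @ protos.t() / temperature, target)

    return utt_loss.mean() + proto_loss


# Toy example: six utterances from two sessions.
loss = bi_level_contrastive_loss(torch.randn(6, 32), torch.tensor([0, 0, 0, 1, 1, 1]))
```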
- OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue [40.62090743056549]
This paper presents an ontology-aware pretrained language model (OPAL) for end-to-end task-oriented dialogue (TOD).
Unlike chit-chat dialogue models, task-oriented dialogue models rely on at least two task-specific modules: a dialogue state tracker (DST) and a response generator (RG).
arXiv Detail & Related papers (2022-09-10T04:38:27Z)
- Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations [22.894541507068933]
This paper presents our approach to building generalized models for the Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations Challenge of DSTC-10.
We employ extensive data augmentation strategies on written data, including artificial error injection and round-trip text-speech transformation (a toy error-injection sketch follows this entry).
Our approach ranks third on the objective evaluation and second on the final official human evaluation.
arXiv Detail & Related papers (2022-03-08T12:26:57Z)
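Artificial error injection is mentioned in the entry above without detail; as a rough illustration, the toy function below corrupts written dialogue text with character-level drops, duplications, and swaps. The operation set and error rate are assumptions for illustration, not the DSTC-10 team's actual augmentation pipeline.

```python
import random


def inject_errors(text, error_rate=0.05, seed=None):
    """Randomly drop, duplicate, or swap characters in written dialogue text to
    mimic ASR-style noise. A toy stand-in for a more realistic error model."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c != " " and rng.random() < error_rate:
            op = rng.choice(["drop", "dup", "swap"])
            if op == "drop":
                pass                                    # delete this character
            elif op == "dup":
                out.extend([c, c])                      # duplicate it
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])           # swap with the next character
                i += 1
            else:
                out.append(c)                           # swap at end of string: keep as-is
        else:
            out.append(c)
        i += 1
    return "".join(out)


# Example: generate a noisy copy of a written utterance for training-time augmentation.
print(inject_errors("could you book a table for two at seven", error_rate=0.1, seed=0))
```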
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they suffer from discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z)
- Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP.
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations.
Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
arXiv Detail & Related papers (2020-10-04T20:12:44Z)
- A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining [52.11221075687124]
We propose a novel abstractive summary network that adapts to the meeting scenario.
We design a hierarchical structure to accommodate long meeting transcripts and a role vector to depict the difference among speakers.
Our model outperforms previous approaches in both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-04-04T21:00:41Z)