PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization
- URL: http://arxiv.org/abs/2311.09836v1
- Date: Thu, 16 Nov 2023 12:05:23 GMT
- Title: PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization
- Authors: Joseph J. Peper, Wenzhao Qiu, Lu Wang
- Abstract summary: We present PELMS, a pre-trained model that generates concise, fluent, and faithful summaries.
We compile MultiPT, a multi-document pre-training corpus of over 93 million documents grouped into more than 3 million unlabeled topic-centric document clusters.
Our approach consistently outperforms competitive comparisons with respect to overall informativeness, abstractiveness, coherence, and faithfulness.
- Score: 4.6493060043204535
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We investigate pre-training techniques for abstractive multi-document
summarization (MDS), which is much less studied than summarizing single
documents. Though recent work has demonstrated the effectiveness of
highlighting information salience for pre-training strategy design, it
struggles to generate abstractive and reflective summaries, which are critical
properties for MDS. To this end, we present PELMS, a pre-trained model that
uses objectives based on semantic coherence heuristics and faithfulness
constraints with unlabeled multi-document inputs, to promote the generation of
concise, fluent, and faithful summaries. To support the training of PELMS, we
compile MultiPT, a multi-document pre-training corpus containing over 93
million documents to form more than 3 million unlabeled topic-centric document
clusters, covering diverse genres such as product reviews, news, and general
knowledge. We perform extensive evaluation of PELMS in low-shot settings on a
wide range of MDS datasets. Our approach consistently outperforms competitive
comparisons with respect to overall informativeness, abstractiveness,
coherence, and faithfulness.
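
To make the pre-training setup more concrete, here is a minimal sketch of how unlabeled topic-centric clusters can be turned into self-supervised input/target pairs. This is an illustrative assumption, not PELMS's actual objective: it uses a PEGASUS-style gap-sentence setup with TF-IDF similarity to the cluster centroid as a crude stand-in for the semantic coherence heuristics described in the abstract, and the function name `build_pseudo_target` is hypothetical.

```python
# Illustrative sketch (not the paper's algorithm): derive a pseudo-summary
# target from an unlabeled topic-centric document cluster by selecting the
# sentences most representative of the whole cluster.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_pseudo_target(cluster_docs, num_target_sentences=3):
    """cluster_docs: list of documents, each a list of sentence strings.

    Returns (input_sentences, target_sentences); selected sentences are
    removed from the input so the model must generate them.
    """
    sentences = [s for doc in cluster_docs for s in doc]
    vectorizer = TfidfVectorizer()
    sent_vecs = vectorizer.fit_transform(sentences)           # (n_sents, vocab)
    centroid = np.asarray(sent_vecs.mean(axis=0))              # cluster centroid
    scores = cosine_similarity(sent_vecs, centroid).ravel()    # salience proxy
    top_idx = set(np.argsort(scores)[-num_target_sentences:])
    target = [sentences[i] for i in sorted(top_idx)]
    inputs = [s for i, s in enumerate(sentences) if i not in top_idx]
    return inputs, target

# Example: two short "documents" about the same topic.
docs = [
    ["The new phone has a bright display.", "Battery life lasts two days."],
    ["Reviewers praise the display brightness.", "Charging is fast."],
]
inp, tgt = build_pseudo_target(docs, num_target_sentences=2)
print("pre-training target:", tgt)
```

In practice, PELMS additionally applies faithfulness constraints when forming targets; this sketch only illustrates the general idea of self-supervised target construction from unlabeled clusters.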