PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document
Summarization
- URL: http://arxiv.org/abs/2110.08499v1
- Date: Sat, 16 Oct 2021 07:22:24 GMT
- Title: PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document
Summarization
- Authors: Wen Xiao, Iz Beltagy, Giuseppe Carenini, Arman Cohan
- Abstract summary: We propose PRIMER, a pre-trained model for multi-document representation with a focus on summarization.
Specifically, we adopt the Longformer architecture with proper input transformation and global attention to fit multi-document inputs.
Our model, PRIMER, outperforms current state-of-the-art models by large margins in most settings.
- Score: 16.830963601598242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently proposed pre-trained generation models achieve strong performance on
single-document summarization benchmarks. However, most of them are pre-trained
with general-purpose objectives and mainly aim to process single document
inputs. In this paper, we propose PRIMER, a pre-trained model for
multi-document representation with a focus on summarization that reduces the
need for dataset-specific architectures and large amounts of labeled
fine-tuning data. Specifically, we adopt the Longformer architecture with
proper input transformation and global attention to fit multi-document inputs,
and we use the Gap Sentence Generation objective with a new strategy, called
Entity Pyramid, to select salient sentences for the whole cluster, teaching the
model to select and aggregate information across a cluster of related
documents. With extensive experiments on 6 multi-document summarization
datasets from 3 different domains in zero-shot, few-shot, and fully supervised
settings, our model, PRIMER, outperforms current state-of-the-art models in
most of these settings by large margins. Code and pre-trained models are released at
https://github.com/allenai/PRIMER
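As a rough illustration of the input side only, the sketch below shows one way a cluster of documents could be packed into a single long-context sequence: documents are concatenated with a separator token, and the separator positions are flagged for global attention while ordinary tokens use local windowed attention. The separator name "<doc-sep>", the whitespace tokenization, and the 4096-token limit are illustrative assumptions rather than the released PRIMER implementation, and the Gap Sentence Generation / Entity Pyramid masking step is not shown.

```python
# Minimal sketch (not the released PRIMER code) of packing a document cluster
# into one long input. Documents are joined with an assumed "<doc-sep>" token;
# separator positions get global attention (1), ordinary tokens stay local (0).
from typing import List, Tuple

DOC_SEP = "<doc-sep>"  # assumed separator token between documents

def pack_cluster(documents: List[str], max_len: int = 4096) -> Tuple[List[str], List[int]]:
    """Concatenate a cluster of documents and build a global-attention mask."""
    tokens: List[str] = []
    global_mask: List[int] = []
    for doc in documents:
        for tok in doc.split():       # stand-in for a real subword tokenizer
            tokens.append(tok)
            global_mask.append(0)     # local windowed attention for ordinary tokens
        tokens.append(DOC_SEP)
        global_mask.append(1)         # global attention on separators
    return tokens[:max_len], global_mask[:max_len]

if __name__ == "__main__":
    toks, mask = pack_cluster(["first article about the event.",
                               "second article about the same event."])
    print(list(zip(toks, mask)))
```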
Related papers
- Peek Across: Improving Multi-Document Modeling via Cross-Document
Question-Answering [49.85790367128085]
We pre-train a generic multi-document model with a novel cross-document question-answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z) - LMGQS: A Large-scale Dataset for Query-focused Summarization [77.6179359525065]
We convert four generic summarization benchmarks into a new QFS benchmark dataset, LMGQS.
We establish baselines with state-of-the-art summarization models.
We achieve state-of-the-art zero-shot and supervised performance on multiple existing QFS benchmarks.
arXiv Detail & Related papers (2023-05-22T14:53:45Z) - Multi-Document Summarization with Centroid-Based Pretraining [35.8335939654861]
In Multi-Document Summarization (MDS), the input can be modeled as a set of documents, and the output is its summary.
We introduce a novel pretraining objective, which involves selecting the ROUGE-based centroid of each document cluster as a proxy for its summary.
Our objective thus does not require human-written summaries and can be used for pretraining on a dataset consisting solely of document sets (a simplified centroid-selection sketch appears after this list).
arXiv Detail & Related papers (2022-08-01T17:28:02Z) - Large-Scale Multi-Document Summarization with Information Extraction and
Compression [31.601707033466766]
We develop an abstractive summarization framework independent of labeled data for multiple heterogeneous documents.
Our framework processes documents that tell different stories rather than documents on the same topic.
Our experiments demonstrate that our framework outperforms current state-of-the-art methods in this more generic setting.
arXiv Detail & Related papers (2022-05-01T19:49:15Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document, where the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - One-shot Key Information Extraction from Document with Deep Partial
Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z) - WSL-DS: Weakly Supervised Learning with Distant Supervision for Query
Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given, and the goal is to generate a summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach that utilizes distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z) - KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under the zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z) - Pre-training for Abstractive Document Summarization by Reinstating
Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives improve performance over baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
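Referenced from the Centroid-Based Pretraining entry above, the sketch below illustrates one plausible reading of selecting a "ROUGE-based centroid" as a proxy summary: pick the document with the highest average overlap against the other documents in its cluster. The unigram-F1 scorer and the selection rule are simplifying assumptions, not that paper's exact procedure.

```python
# Minimal sketch of centroid-style proxy-summary selection, assuming the
# "ROUGE-based centroid" is the document with the highest average overlap
# against the rest of its cluster. A simple unigram F1 stands in for ROUGE.
from collections import Counter
from typing import List

def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 between two texts."""
    c, r = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def select_centroid(cluster: List[str]) -> str:
    """Pick the document that best covers the rest of its cluster."""
    def avg_score(i: int) -> float:
        others = [d for j, d in enumerate(cluster) if j != i]
        return sum(unigram_f1(cluster[i], o) for o in others) / max(len(others), 1)
    return cluster[max(range(len(cluster)), key=avg_score)]

if __name__ == "__main__":
    docs = ["the team won the final match on sunday",
            "on sunday the final match was won by the team",
            "weather stayed sunny across the region"]
    print(select_centroid(docs))  # expected: one of the two match reports
```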
This list is automatically generated from the titles and abstracts of the papers on this site.