Long Document Summarization with Top-down and Bottom-up Inference
- URL: http://arxiv.org/abs/2203.07586v1
- Date: Tue, 15 Mar 2022 01:24:51 GMT
- Title: Long Document Summarization with Top-down and Bottom-up Inference
- Authors: Bo Pang, Erik Nijkamp, Wojciech Kryściński, Silvio Savarese, Yingbo Zhou, Caiming Xiong
- Abstract summary: We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
- Score: 113.29319668246407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text summarization aims to condense long documents and retain key
information. Critical to the success of a summarization model is the faithful
inference of latent representations of words or tokens in the source documents.
Most recent models infer the latent representations with a transformer encoder,
which is purely bottom-up. Also, self-attention-based inference models face the
challenge of quadratic complexity with respect to sequence length. We propose a
principled inference framework to improve summarization models on these two
aspects. Our framework assumes a hierarchical latent structure of a document
where the top level captures long-range dependencies at a coarser time scale
and the bottom token level preserves the details. Critically, this hierarchical
structure enables token representations to be updated in both a bottom-up and
top-down manner. In the bottom-up pass, token representations are inferred with
local self-attention to leverage its efficiency. Top-down correction is then
applied to allow tokens to capture long-range dependency. We demonstrate the
effectiveness of the proposed framework on a diverse set of summarization
datasets, including narrative, conversational, scientific documents and news.
Our model achieves (1) competitive or better performance on short documents
with higher memory and compute efficiency, compared to full attention
transformers, and (2) state-of-the-art performance on a wide range of long
document summarization benchmarks, compared to recent efficient transformers.
We also show that our model can summarize an entire book and achieve
competitive performance using 0.27% of the parameters (464M vs. 175B) and much less
training data, compared to a recent GPT-3-based model. These results indicate
the general applicability and benefits of the proposed framework.
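The structure the abstract describes can be captured in a minimal sketch, assuming windowed self-attention for the bottom-up pass, average pooling to form top-level units, full self-attention among those units, and token-to-unit cross-attention for the top-down correction. Module names and shapes are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class TopDownBottomUpBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, window=64, pool=16):
        super().__init__()
        self.window, self.pool = window, pool
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.seg_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model); seq_len divisible by window and pool
        b, n, d = x.shape
        # bottom-up pass: self-attention restricted to non-overlapping local windows
        w = x.reshape(b * (n // self.window), self.window, d)
        local, _ = self.local_attn(w, w, w)
        x = self.norm1(x + local.reshape(b, n, d))
        # pool token states into coarser top-level units (one per `pool` tokens)
        segs = x.reshape(b, n // self.pool, self.pool, d).mean(dim=2)
        # full self-attention among the few top-level units captures long range
        segs, _ = self.seg_attn(segs, segs, segs)
        # top-down correction: every token attends to all top-level units
        top_down, _ = self.cross_attn(x, segs, segs)
        return self.norm2(x + top_down)

block = TopDownBottomUpBlock()
tokens = torch.randn(2, 1024, 512)   # batch of token representations
print(block(tokens).shape)           # torch.Size([2, 1024, 512])
```

With window size w and pooling ratio p, this costs roughly O(nw) for the local pass, O((n/p)^2) for the top level, and O(n^2/p) for the correction, instead of O(n^2) for full attention.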
Related papers
- Efficient Document Ranking with Learnable Late Interactions [73.41976017860006]
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval.
To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings.
Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer.
arXiv Detail & Related papers (2024-06-25T22:50:48Z)
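As a rough illustration of the late-interaction idea in the entry above, a sketch follows: query and document are encoded independently (DE-style), then a lightweight scorer combines per-token similarities. MaxSim aggregation, popularized by ColBERT, is assumed here; the paper's learnable scorer may differ.

```python
import torch

def late_interaction_score(q_emb: torch.Tensor, d_emb: torch.Tensor) -> torch.Tensor:
    """q_emb: (q_tokens, dim), d_emb: (d_tokens, dim), both L2-normalized."""
    sim = q_emb @ d_emb.T                # all pairwise token similarities
    return sim.max(dim=1).values.sum()   # best doc token per query token, summed

q = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
d = torch.nn.functional.normalize(torch.randn(200, 128), dim=-1)
print(late_interaction_score(q, d))
```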
- Document-Level Abstractive Summarization [0.0]
We study how efficient Transformer techniques can be used to improve the automatic summarization of very long texts.
We propose a novel retrieval-enhanced approach which reduces the cost of generating a summary of the entire document by processing smaller chunks.
arXiv Detail & Related papers (2022-12-06T14:39:09Z)
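A minimal sketch of the chunked strategy described above: split a very long document into smaller pieces, summarize each piece independently, then summarize the concatenated piece summaries. The `summarize` callable stands in for any seq2seq summarizer and is hypothetical; the paper's retrieval-enhanced method may select chunks differently.

```python
from typing import Callable

def hierarchical_summary(doc: str, summarize: Callable[[str], str],
                         chunk_words: int = 512) -> str:
    words = doc.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    partials = [summarize(chunk) for chunk in chunks]  # cheap: short inputs only
    return summarize(" ".join(partials))               # reduce step over partials
```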
- Evaluating and Improving Factuality in Multimodal Abstractive Summarization [91.46015013816083]
We propose CLIPBERTScore, which combines an image-summary metric with a document-summary metric to leverage their robustness and strong factuality detection performance.
We show that this simple combination of two metrics, applied zero-shot, achieves higher correlations than existing factuality metrics for document summarization.
Our analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks.
arXiv Detail & Related papers (2022-11-04T16:50:40Z)
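The "simple combination of two metrics" above can be sketched as a weighted sum of an image-summary score (CLIPScore-style) and a document-summary score (BERTScore-style). The weight `alpha` and the two scorer callables are illustrative assumptions, not the paper's API.

```python
from typing import Callable

def clip_bert_score(image, document: str, summary: str,
                    clip_score: Callable, bert_score: Callable,
                    alpha: float = 0.5) -> float:
    # image-summary consistency plus document-summary consistency
    return alpha * clip_score(image, summary) + (1 - alpha) * bert_score(document, summary)
```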
- Document-Level Relation Extraction with Sentences Importance Estimation and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Importance Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
arXiv Detail & Related papers (2022-04-27T03:20:07Z)
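A rough sketch, assuming one plausible reading of the entry above: score a sentence's importance by how much removing it changes the relation prediction, and add a focusing loss that keeps predictions stable when unimportant sentences are dropped. The `model` callable, which maps a list of sentences to a probability distribution over relations, is hypothetical.

```python
import torch
import torch.nn.functional as F

def sentence_importance(model, sentences, i):
    with torch.no_grad():
        full = model(sentences)
        reduced = model(sentences[:i] + sentences[i + 1:])
    return (full - reduced).abs().sum().item()   # bigger change = more important

def focusing_loss(model, sentences, unimportant_idx):
    # KL divergence between full-document and sentence-removed predictions
    full = model(sentences)
    reduced = model(sentences[:unimportant_idx] + sentences[unimportant_idx + 1:])
    return F.kl_div(reduced.log(), full, reduction="batchmean")
```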
- PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization [16.830963601598242]
We propose PRIMER, a pre-trained model for multi-document representation with a focus on summarization.
Specifically, we adopt the Longformer architecture with proper input transformation and global attention to fit multi-document inputs.
Our model, PRIMER, outperforms current state-of-the-art models in most evaluated settings by large margins.
arXiv Detail & Related papers (2021-10-16T07:22:24Z)
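A minimal sketch of the input transformation the entry above describes: concatenate documents with a separator token and mark those separators for global attention, Longformer-style. The token IDs and separator ID are placeholders; real models would use their own tokenizer vocabulary.

```python
import torch

SEP_ID = 50_000  # hypothetical <doc-sep> token id

def pack_documents(doc_token_ids: list[list[int]]):
    ids = []
    for doc in doc_token_ids:
        ids.extend(doc + [SEP_ID])
    ids_t = torch.tensor(ids)
    # global attention on separator tokens lets them aggregate across documents
    global_mask = (ids_t == SEP_ID).long()
    return ids_t, global_mask

ids, gmask = pack_documents([[11, 12, 13], [21, 22], [31]])
print(ids.tolist())    # [11, 12, 13, 50000, 21, 22, 50000, 31, 50000]
print(gmask.tolist())  # [0, 0, 0, 1, 0, 0, 1, 0, 1]
```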
- Efficient Attentions for Long Document Summarization [25.234852272297598]
Hepos is a novel efficient encoder-decoder attention with head-wise positional strides.
We are able to process ten times more tokens than existing models that use full attention.
arXiv Detail & Related papers (2021-04-05T18:45:13Z)
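A hedged sketch of head-wise positional strides as the entry above describes them: each attention head looks at a strided subset of encoder positions (head h sees positions h, h+s, h+2s, ...), cutting encoder-decoder attention cost by the stride factor. Details here are illustrative.

```python
import torch

def hepos_attention(q, k, v, n_heads: int, stride: int):
    """q: (tgt, d), k/v: (src, d); d divisible by n_heads."""
    d_head = q.size(-1) // n_heads
    outs = []
    for h in range(n_heads):
        sel = torch.arange(h % stride, k.size(0), stride)  # strided source positions
        qh = q[:, h * d_head:(h + 1) * d_head]
        kh = k[sel, h * d_head:(h + 1) * d_head]
        vh = v[sel, h * d_head:(h + 1) * d_head]
        attn = torch.softmax(qh @ kh.T / d_head ** 0.5, dim=-1)
        outs.append(attn @ vh)
    return torch.cat(outs, dim=-1)                         # (tgt, d)

out = hepos_attention(torch.randn(5, 64), torch.randn(100, 64),
                      torch.randn(100, 64), n_heads=8, stride=4)
print(out.shape)  # torch.Size([5, 64])
```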
- Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z)
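A minimal sketch, under the assumption that "leveraging a graph" means biasing attention scores with a document graph's adjacency so related units attend to each other more strongly. This is one common recipe, not necessarily the paper's exact mechanism.

```python
import torch

def graph_biased_attention(x, adj, scale: float = 1.0):
    """x: (n, d) unit representations; adj: (n, n) graph adjacency weights."""
    d = x.size(-1)
    scores = x @ x.T / d ** 0.5 + scale * adj   # add graph structure as a bias
    weights = torch.softmax(scores, dim=-1)
    return weights @ x
```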
- Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward [42.925345819778656]
We present ASGARD, a novel framework for Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD.
We propose the use of dual encoders (a sequential document encoder and a graph-structured encoder) to maintain the global context and local characteristics of entities.
Results show that our models produce significantly higher ROUGE scores than a variant without a knowledge graph as input on both the New York Times and CNN/Daily Mail datasets.
arXiv Detail & Related papers (2020-05-03T18:23:06Z)
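A hedged sketch of the dual-encoder idea in the entry above: encode the document as a token sequence and its entity graph separately, then expose both memories to the decoder. The encoder choices here (a GRU and one round of neighborhood averaging) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, d_model: int = 256):
        super().__init__()
        self.seq_enc = nn.GRU(d_model, d_model, batch_first=True)
        self.node_proj = nn.Linear(d_model, d_model)

    def forward(self, tokens, node_feats, adj):
        seq_mem, _ = self.seq_enc(tokens)              # (b, n_tok, d) global context
        # one round of neighborhood averaging as a stand-in graph encoder
        node_mem = torch.relu(self.node_proj(adj @ node_feats))
        return torch.cat([seq_mem, node_mem], dim=1)   # joint memory for the decoder

enc = DualEncoder()
mem = enc(torch.randn(2, 50, 256), torch.randn(2, 10, 256), torch.rand(2, 10, 10))
print(mem.shape)  # torch.Size([2, 60, 256])
```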
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq-based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
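A small sketch of one self-supervised objective in the spirit of the entry above: shuffle a document's sentences and train a seq2seq model to reinstate the original order, so (input, target) pairs come from unlabeled text alone. Whether this matches any of the paper's three objectives exactly is an assumption; it illustrates the "reinstate source text" recipe.

```python
import random

def sentence_reordering_pair(document: str):
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    shuffled = sentences[:]
    random.shuffle(shuffled)
    return " . ".join(shuffled), " . ".join(sentences)  # (corrupted input, target)

src, tgt = sentence_reordering_pair("First point. Second point. Third point.")
print(src)
print(tgt)
```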
This list is automatically generated from the titles and abstracts of the papers on this site.