Efficient Attentions for Long Document Summarization
- URL: http://arxiv.org/abs/2104.02112v1
- Date: Mon, 5 Apr 2021 18:45:13 GMT
- Title: Efficient Attentions for Long Document Summarization
- Authors: Luyang Huang, Shuyang Cao, Nikolaus Parulian, Heng Ji and Lu Wang
- Abstract summary: Hepos is a novel efficient encoder-decoder attention with head-wise positional strides.
We are able to process ten times more tokens than existing models that use full attentions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The quadratic computational and memory complexities of large Transformers
have limited their scalability for long document summarization. In this paper,
we propose Hepos, a novel efficient encoder-decoder attention with head-wise
positional strides to effectively pinpoint salient information from the source.
We further conduct a systematic study of existing efficient self-attentions.
Combined with Hepos, we are able to process ten times more tokens than existing
models that use full attentions. For evaluation, we present a new dataset,
GovReport, with significantly longer documents and summaries. Results show that
our models produce significantly higher ROUGE scores than competitive
comparisons, including new state-of-the-art results on PubMed. Human evaluation
also shows that our models generate more informative summaries with fewer
unfaithful errors.
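To make the head-wise positional stride idea concrete, here is a minimal NumPy sketch of Hepos-style encoder-decoder attention. It omits the learned projection matrices of a real Transformer, and the shapes, function names, and the stride-offset rule written as `p % stride == h % stride` are our illustrative reading of the mechanism, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hepos_attention(q, k, v, n_heads, stride):
    """Head-wise positional stride encoder-decoder attention (sketch).

    q: (tgt_len, d_model) decoder queries; k, v: (src_len, d_model) encoder states.
    Head h only attends to source positions p with p % stride == h % stride,
    so each head's score matrix shrinks from tgt_len x src_len to roughly
    tgt_len x (src_len / stride), while the heads jointly cover every position.
    """
    d_model = q.shape[1]
    src_len = k.shape[0]
    d_head = d_model // n_heads
    out = np.zeros_like(q)
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        pos = np.arange(h % stride, src_len, stride)        # this head's strided positions
        scores = q[:, sl] @ k[pos, sl].T / np.sqrt(d_head)  # (tgt_len, ~src_len / stride)
        out[:, sl] = softmax(scores) @ v[pos, sl]
    return out

# Toy usage: 8 decoder steps attending over 160 encoder tokens.
q = np.random.randn(8, 64)
k = v = np.random.randn(160, 64)
out = hepos_attention(q, k, v, n_heads=8, stride=4)  # out.shape == (8, 64)
```

With stride s, the encoder-decoder attention cost drops by a factor of s, which is the lever behind the abstract's claim of processing roughly ten times more tokens than full-attention models.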
Related papers
- DocMamba: Efficient Document Pre-training with State Space Model [56.84200017560988]
We present DocMamba, a novel framework based on the state space model.
It is designed to reduce computational complexity to linear while preserving global modeling capabilities.
Experiments on the HRDoc dataset confirm DocMamba's potential for length extrapolation.
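For context on why a state space model is linear in sequence length, here is a generic (non-selective) discrete state-space scan; Mamba-style models like DocMamba make the A, B, C parameters input-dependent, which this sketch deliberately omits.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Discrete state space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    A single pass over the sequence, so time and state memory are O(L),
    versus the O(L^2) pairwise score matrix of full self-attention.
    x: (L, d_in); A: (n, n); B: (n, d_in); C: (d_out, n).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t   # fold the next token into a fixed-size state
        ys.append(C @ h)      # read the output for this position
    return np.stack(ys)       # (L, d_out)
```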
arXiv Detail & Related papers (2024-09-18T11:34:28Z)
- Efficient Document Ranking with Learnable Late Interactions [73.41976017860006]
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval.
To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings.
Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer.
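A quick sketch of the latency-quality spectrum this entry describes: a dual encoder scores with one dot product, while late interaction keeps per-token embeddings and aggregates a similarity matrix with a cheap scorer. We use the classic fixed sum-of-MaxSim aggregation here; the paper above instead learns the lightweight scorer.

```python
import numpy as np

def dual_encoder_score(q_vec, d_vec):
    """DE: one pooled vector per side; scoring is a single dot product."""
    return float(q_vec @ d_vec)

def late_interaction_score(q_toks, d_toks):
    """Late interaction: factorized token embeddings, MaxSim aggregation.
    q_toks: (q_len, d); d_toks: (d_len, d), precomputable offline like DE."""
    sim = q_toks @ d_toks.T               # (q_len, d_len) token-level similarities
    return float(sim.max(axis=1).sum())   # best document token per query token
```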
arXiv Detail & Related papers (2024-06-25T22:50:48Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector retrieval, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
- Evaluating and Improving Factuality in Multimodal Abstractive Summarization [91.46015013816083]
We propose CLIPBERTScore, a simple combination of CLIPScore and BERTScore, to leverage their robustness and strong factuality detection performance on image-summary and document-summary pairs, respectively.
We show that this simple combination of the two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization.
Our analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks.
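Read literally, the metric is a mix of two off-the-shelf scores: CLIPScore for image-summary faithfulness and BERTScore for document-summary faithfulness. A minimal sketch follows, where `alpha` is a hypothetical mixing weight rather than the paper's tuned value, and the component scores are assumed to come from their existing implementations.

```python
def clipbertscore(clip_score: float, bert_score: float, alpha: float = 0.5) -> float:
    """Weighted combination of an image-summary score (CLIPScore) and a
    document-summary score (BERTScore). alpha = 0.5 is a placeholder."""
    return alpha * clip_score + (1.0 - alpha) * bert_score
```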
arXiv Detail & Related papers (2022-11-04T16:50:40Z)
- How Far are We from Robust Long Abstractive Summarization? [39.34743996451813]
We evaluate long document abstractive summarization systems (i.e., models and metrics) with the aim of deploying them to generate reliable summaries.
For long document evaluation metrics, human evaluation results show that ROUGE remains the best at evaluating the relevance of a summary.
We release our annotated long document dataset with the hope that it can contribute to the development of metrics across a broader range of summarization settings.
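Since the entry's headline finding concerns ROUGE, here is a self-contained recall-oriented ROUGE-N for reference; real evaluations typically report the F1 variant with stemming, which this sketch skips.

```python
from collections import Counter

def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count."""
    def ngrams(text):
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(sum(ref.values()), 1)

# rouge_n_recall("the cat sat on the mat", "the cat lay on the mat")  # 5/6
```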
arXiv Detail & Related papers (2022-10-30T03:19:50Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models in two respects.
Our framework assumes a hierarchical latent structure of a document, where the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
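To illustrate the hierarchy (not the paper's exact inference procedure, which uses attention-based pooling), here is a toy two-pass sketch: pool tokens into segment states bottom-up, then push each segment's state back down so every token sees coarse long-range context at linear cost.

```python
import numpy as np

def top_down_bottom_up(token_states, window=4):
    """Toy hierarchical pass: mean-pool tokens into segments (bottom-up),
    then add each segment state back to its tokens (top-down)."""
    L, d = token_states.shape
    n_seg = (L + window - 1) // window
    pooled = np.stack([token_states[i * window:(i + 1) * window].mean(axis=0)
                       for i in range(n_seg)])        # (n_seg, d) segment states
    top_down = np.repeat(pooled, window, axis=0)[:L]  # broadcast back to tokens
    return token_states + top_down                    # tokens enriched with coarse context
```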
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks [21.379555672973975]
This paper proposes a graph neural network (GNN)-based extractive summarization model.
Our model integrates a joint neural topic model (NTM) to discover latent topics, which can provide document-level features for sentence selection.
The experimental results demonstrate that our model achieves state-of-the-art results on the CNN/DM and NYT datasets.
arXiv Detail & Related papers (2020-10-13T09:30:04Z)
- Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z)
- Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward [42.925345819778656]
We present ASGARD, a novel framework for Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD.
We propose the use of dual encoders, a sequential document encoder and a graph-structured encoder, to maintain the global context and local characteristics of entities.
Results show that our models produce significantly higher ROUGE scores than a variant without the knowledge graph as input on both the New York Times and CNN/Daily Mail datasets.
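A toy rendering of the dual-encoder split, using made-up shapes and a one-round mean-aggregation graph layer in place of ASGARD's actual graph encoder; it only shows how global sequence context and local entity-graph features are produced side by side for the decoder.

```python
import numpy as np

def graph_encode(node_feats, adj):
    """One message-passing round: mix each node with the mean of its neighbors
    (a stand-in for a real graph-structured encoder)."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    return np.tanh(node_feats + (adj @ node_feats) / deg)

def dual_encode(token_states, node_feats, adj):
    """Dual encoders: a global document vector from the sequence side plus
    per-entity node states from the graph side, both exposed to the decoder."""
    doc_vec = token_states.mean(axis=0)          # global context (sequential encoder)
    node_states = graph_encode(node_feats, adj)  # local entity characteristics
    return doc_vec, node_states
```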
arXiv Detail & Related papers (2020-05-03T18:23:06Z)