Large-Scale Multi-Document Summarization with Information Extraction and
Compression
- URL: http://arxiv.org/abs/2205.00548v1
- Date: Sun, 1 May 2022 19:49:15 GMT
- Title: Large-Scale Multi-Document Summarization with Information Extraction and
Compression
- Authors: Ning Wang, Han Liu, Diego Klabjan
- Abstract summary: We develop an abstractive summarization framework independent of labeled data for multiple heterogeneous documents.
Our framework processes documents telling different stories instead of documents on the same topic.
Our experiments demonstrate that our framework outperforms current state-of-the-art methods in this more generic setting.
- Score: 31.601707033466766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop an abstractive summarization framework independent of labeled data
for multiple heterogeneous documents. Unlike existing multi-document
summarization methods, our framework processes documents telling different
stories instead of documents on the same topic. We also enhance an existing
sentence fusion method with a uni-directional language model to prioritize
fused sentences with higher sentence probability with the goal of increasing
readability. Lastly, we construct a total of twelve dataset variations based on
CNN/Daily Mail and the NewsRoom datasets, where each document group contains a
large and diverse collection of documents to evaluate the performance of our
model in comparison with other baseline systems. Our experiments demonstrate
that our framework outperforms current state-of-the-art methods in this more
generic setting.
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z) - Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of document within a collection.
We abstract over arbitrary header paraphrases, and ground each topic to respective document locations.
We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
arXiv Detail & Related papers (2024-02-21T16:22:21Z) - Peek Across: Improving Multi-Document Modeling via Cross-Document
Question-Answering [49.85790367128085]
We pre-training a generic multi-document model from a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z) - Mining both Commonality and Specificity from Multiple Documents for
Multi-Document Summarization [1.4629756274247374]
The multi-document summarization task requires the designed summarizer to generate a short text that covers the important information of original documents.
This paper proposes a multi-document summarization approach based on hierarchical clustering of documents.
arXiv Detail & Related papers (2023-03-05T14:25:05Z) - PDSum: Prototype-driven Continuous Summarization of Evolving
Multi-document Sets Stream [33.68263291948121]
We propose a new summarization problem, Evolving Multi-Document sets stream Summarization (EMDS)
We introduce a novel unsupervised algorithm PDSum with the idea of prototype-driven continuous summarization.
PDSum builds a lightweight prototype of each multi-document set and exploits it to adapt to new documents.
arXiv Detail & Related papers (2023-02-10T23:43:46Z) - Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z) - Modeling Endorsement for Multi-Document Abstractive Summarization [10.166639983949887]
A crucial difference between single- and multi-document summarization is how salient content manifests itself in the document(s)
In this paper, we model the cross-document endorsement effect and its utilization in multiple document summarization.
Our method generates a synopsis from each document, which serves as an endorser to identify salient content from other documents.
arXiv Detail & Related papers (2021-10-15T03:55:42Z) - Multilayer Networks for Text Analysis with Multiple Data Types [0.21108097398435335]
We propose a novel framework based on Multilayer Networks and Block Models.
We show that taking into account multiple types of information provides a more nuanced view on topic- and document-clusters.
arXiv Detail & Related papers (2021-06-30T05:47:39Z) - Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimize a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z) - Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.