Mining both Commonality and Specificity from Multiple Documents for
Multi-Document Summarization
- URL: http://arxiv.org/abs/2303.02677v1
- Date: Sun, 5 Mar 2023 14:25:05 GMT
- Title: Mining both Commonality and Specificity from Multiple Documents for
Multi-Document Summarization
- Authors: Bing Ma
- Abstract summary: The multi-document summarization task requires the designed summarizer to generate a short text that covers the important information of original documents.
This paper proposes a multi-document summarization approach based on hierarchical clustering of documents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The multi-document summarization task requires the designed summarizer to
generate a short text that covers the important information of original
documents and satisfies content diversity. This paper proposes a multi-document
summarization approach based on hierarchical clustering of documents. It
utilizes the constructed class tree of documents to extract both the sentences
reflecting the commonality of all documents and the sentences reflecting the
specificity of some subclasses of these documents for generating a summary, so
as to satisfy the coverage and diversity requirements of multi-document
summarization. Comparative experiments with different variant approaches on
DUC'2002-2004 datasets prove the effectiveness of mining both the commonality
and specificity of documents for multi-document summarization. Experiments on
DUC'2004 and Multi-News datasets show that our approach achieves competitive
performance compared to the state-of-the-art unsupervised and supervised
approaches.
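The abstract describes the method only at a high level, so the Python below is a hypothetical sketch of the general idea, not the author's implementation. It assumes several things the abstract does not specify: bag-of-words sentence vectors, cosine similarity, and greedy agglomerative merging of documents into a binary class tree. It also assumes that commonality is scored as similarity to the centroid of all documents, and that specificity is scored as similarity to a top-level subclass centroid minus similarity to the overall centroid. The names `vectorize`, `build_class_tree`, and `summarize` are illustrative, and documents are assumed to be pre-split into sentences.

```python
import re
from collections import Counter
from math import sqrt


def vectorize(text):
    """Bag-of-words term counts (an assumed representation, not the paper's)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_class_tree(doc_vecs):
    """Greedy agglomerative clustering of a non-empty list of documents.

    Repeatedly merges the two clusters whose summed term vectors are most
    cosine-similar. Leaves are document indices; internal nodes are
    (left, right) tuples. Returns (tree, summed vector of all documents).
    """
    clusters = [(i, vec) for i, vec in enumerate(doc_vecs)]
    while len(clusters) > 1:
        n = len(clusters)
        i, j = max(
            ((a, b) for a in range(n) for b in range(a + 1, n)),
            key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (ti, vi), (tj, vj) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((ti, tj), vi + vj)]
    return clusters[0]


def leaves(tree):
    """Document indices under a (sub)tree."""
    return [tree] if isinstance(tree, int) else leaves(tree[0]) + leaves(tree[1])


def summarize(docs):
    """docs: list of documents, each a list of sentence strings."""
    doc_vecs = [sum((vectorize(s) for s in d), Counter()) for d in docs]
    tree, root_vec = build_class_tree(doc_vecs)
    candidates = [(i, s) for i, d in enumerate(docs) for s in d]
    # Commonality: the sentence closest to the centroid of all documents.
    common = max(candidates, key=lambda c: cosine(vectorize(c[1]), root_vec))[1]
    summary = [common]
    # Specificity: for each top-level subclass, pick a sentence from its own
    # documents that is close to the subclass but less close to the whole set.
    if not isinstance(tree, int):
        for child in tree:
            members = set(leaves(child))
            child_vec = sum((doc_vecs[k] for k in members), Counter())
            own = [s for k, s in candidates if k in members and s not in summary]
            if own:
                summary.append(max(
                    own,
                    key=lambda s: cosine(vectorize(s), child_vec)
                    - cosine(vectorize(s), root_vec),
                ))
    return summary


if __name__ == "__main__":
    docs = [
        ["storm hits coast", "boats stay in port"],
        ["storm hits coast hard", "boats damaged in port"],
        ["storm hits coast", "power lines down"],
        ["storm hits town", "power outage lines down"],
    ]
    for sentence in summarize(docs):
        print(sentence)
```

Using only the two top-level subclasses keeps the sketch short; a fuller version might descend further into the tree and add a length budget or redundancy filter, choices the abstract leaves open.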
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
(2024-10-03T17:49:09Z)
- Generative Retrieval Meets Multi-Graded Relevance
We introduce a framework called GRaded Generative Retrieval (GR²).
GR² focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR².
(2024-09-27T02:55:53Z)
- Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction
We propose to identify the typical structure of documents within a collection.
We abstract over arbitrary header paraphrases, and ground each topic to respective document locations.
We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
(2024-02-21T16:22:21Z)
- Unsupervised Multi-document Summarization with Holistic Inference
This paper proposes a new holistic framework for unsupervised multi-document extractive summarization.
Subset Representative Index (SRI) balances the importance and diversity of a subset of sentences from the source documents.
Our findings suggest that diversity is essential for improving multi-document summary performance.
(2023-09-08T02:56:30Z)
- PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream
We propose a new summarization problem, Evolving Multi-Document sets stream Summarization (EMDS).
We introduce a novel unsupervised algorithm PDSum with the idea of prototype-driven continuous summarization.
PDSum builds a lightweight prototype of each multi-document set and exploits it to adapt to new documents.
(2023-02-10T23:43:46Z)
- Large-Scale Multi-Document Summarization with Information Extraction and Compression
We develop an abstractive summarization framework independent of labeled data for multiple heterogeneous documents.
Our framework processes documents telling different stories instead of documents on the same topic.
Our experiments demonstrate that our framework outperforms current state-of-the-art methods in this more generic setting.
(2022-05-01T19:49:15Z)
- Unsupervised Summarization with Customized Granularities
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
(2022-01-29T05:56:35Z)
- Modeling Endorsement for Multi-Document Abstractive Summarization
A crucial difference between single- and multi-document summarization is how salient content manifests itself in the document(s).
In this paper, we model the cross-document endorsement effect and its utilization in multi-document summarization.
Our method generates a synopsis from each document, which serves as an endorser to identify salient content from other documents.
(2021-10-15T03:55:42Z)
- Multilevel Text Alignment with Cross-Document Attention
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
(2020-10-03T02:52:28Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
(2020-07-17T13:01:15Z)