Unsupervised Multi-document Summarization with Holistic Inference
- URL: http://arxiv.org/abs/2309.04087v1
- Date: Fri, 8 Sep 2023 02:56:30 GMT
- Title: Unsupervised Multi-document Summarization with Holistic Inference
- Authors: Haopeng Zhang, Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Hongwei
Wang, Jiawei Zhang and Dong Yu
- Abstract summary: This paper proposes a new holistic framework for unsupervised multi-document extractive summarization.
Subset Representative Index (SRI) balances the importance and diversity of a subset of sentences from the source documents.
Our findings suggest that diversity is essential for improving multi-document summary performance.
- Score: 41.58777650517525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-document summarization aims to obtain core information from a
collection of documents written on the same topic. This paper proposes a new
holistic framework for unsupervised multi-document extractive summarization.
Our method incorporates the holistic beam search inference method associated
with the holistic measurements, named Subset Representative Index (SRI). SRI
balances the importance and diversity of a subset of sentences from the source
documents and can be calculated in unsupervised and adaptive manners. To
demonstrate the effectiveness of our method, we conduct extensive experiments
on both small and large-scale multi-document summarization datasets under both
unsupervised and adaptive settings. The proposed method outperforms strong
baselines by a significant margin, as indicated by the resulting ROUGE scores
and diversity measures. Our findings also suggest that diversity is essential
for improving multi-document summary performance.
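The abstract describes ranking whole sentence subsets with beam search under a score that balances importance and diversity. The sketch below illustrates that idea only; `sri_score` is a hypothetical stand-in for the paper's SRI (a centrality term averaged with a pairwise-diversity term over a precomputed sentence-similarity matrix), not the authors' actual formulation.

```python
from itertools import combinations

def importance(i, sim):
    """Centrality proxy: mean similarity of sentence i to every sentence."""
    return sum(sim[i]) / len(sim[i])

def diversity(subset, sim):
    """One minus the mean pairwise similarity inside the subset."""
    if len(subset) < 2:
        return 1.0
    pairs = list(combinations(subset, 2))
    return 1.0 - sum(sim[i][j] for i, j in pairs) / len(pairs)

def sri_score(subset, sim, alpha=0.5):
    """Hypothetical subset-level score trading off importance and diversity.
    The paper's SRI differs; this only mirrors its holistic intent."""
    imp = sum(importance(i, sim) for i in subset) / len(subset)
    return alpha * imp + (1.0 - alpha) * diversity(subset, sim)

def holistic_beam_search(sim, k=3, beam_width=4):
    """Beam search over sentence *subsets*, scoring each subset as a whole
    rather than greedily scoring one sentence at a time."""
    n = len(sim)
    beams = sorted(
        (((i,), sri_score((i,), sim)) for i in range(n)),
        key=lambda b: b[1], reverse=True)[:beam_width]
    for _ in range(k - 1):
        seen = {}
        for subset, _ in beams:
            for i in range(n):
                if i not in subset:
                    cand = tuple(sorted(subset + (i,)))
                    if cand not in seen:
                        seen[cand] = sri_score(cand, sim)
        beams = sorted(seen.items(), key=lambda b: b[1], reverse=True)[:beam_width]
    return list(beams[0][0])
```

The `alpha` knob is what makes the score "holistic": with pure importance, near-duplicate sentences from different documents would crowd the summary; the diversity term penalizes redundant subsets as a unit.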
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- Supervising the Centroid Baseline for Extractive Multi-Document Summarization [2.0306707203430348]
The centroid method is a simple approach for extractive multi-document summarization.
We refine it by adding a beam search process to the sentence selection and also a centroid estimation attention model that leads to improved results.
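The centroid method referenced above is simple enough to sketch. The following is a minimal bag-of-words version (not the authors' implementation, which adds beam search and an attention model): every sentence is scored by cosine similarity to the centroid of all sentence vectors, and the top-k are returned in document order.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for one sentence."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid_summarize(sentences, k=2):
    """Score each sentence against the centroid of all sentences,
    then return the top-k in original document order."""
    vecs = [bow(s) for s in sentences]
    centroid = Counter()
    for v in vecs:
        centroid.update(v)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vecs[i], centroid),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```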
arXiv Detail & Related papers (2023-11-29T16:11:45Z)
- PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization [4.6493060043204535]
We present PELMS, a pre-trained model that generates concise, fluent, and faithful summaries.
We compile MultiPT, a multi-document pre-training corpus containing over 93 million documents to form more than 3 million unlabeled topic-centric document clusters.
Our approach consistently outperforms competitive comparisons with respect to overall informativeness, abstractiveness, coherence, and faithfulness.
arXiv Detail & Related papers (2023-11-16T12:05:23Z)
- Mining both Commonality and Specificity from Multiple Documents for Multi-Document Summarization [1.4629756274247374]
The multi-document summarization task requires the summarizer to generate a short text that covers the important information in the original documents.
This paper proposes a multi-document summarization approach based on hierarchical clustering of documents.
arXiv Detail & Related papers (2023-03-05T14:25:05Z)
- ACM -- Attribute Conditioning for Abstractive Multi Document Summarization [0.0]
We propose a model that incorporates attribute conditioning modules in order to decouple conflicting information by conditioning for a certain attribute in the output summary.
This approach shows strong gains in ROUGE score over baseline multi document summarization approaches.
arXiv Detail & Related papers (2022-05-09T00:00:14Z)
- Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z)
- Modeling Endorsement for Multi-Document Abstractive Summarization [10.166639983949887]
A crucial difference between single- and multi-document summarization is how salient content manifests itself in the document(s).
In this paper, we model the cross-document endorsement effect and its utilization in multi-document summarization.
Our method generates a synopsis from each document, which serves as an endorser to identify salient content from other documents.
arXiv Detail & Related papers (2021-10-15T03:55:42Z)
- PoBRL: Optimizing Multi-Document Summarization by Blending Reinforcement Learning Policies [68.8204255655161]
We propose a reinforcement learning based framework PoBRL for solving multi-document summarization.
Our strategy decouples this multi-objective optimization into different subproblems that can be solved individually by reinforcement learning.
Our empirical analysis shows state-of-the-art performance on several multi-document datasets.
arXiv Detail & Related papers (2021-05-18T02:55:42Z)
- SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy [92.5683788430012]
SupMMD is a novel technique for generic and update summarization based on the maximum mean discrepancy from kernel two-sample testing.
We show the efficacy of SupMMD in both generic and update summarization tasks by meeting or exceeding the current state-of-the-art on the DUC-2004 and TAC-2009 datasets.
arXiv Detail & Related papers (2020-10-06T09:26:55Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representations into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
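The SummPip pipeline above (sentence graph, clustering, per-cluster compression) can be illustrated with a deliberately simplified sketch: Jaccard word overlap stands in for the paper's linguistic and deep representations, connected components stand in for spectral clustering, and "compression" is reduced to picking each cluster's most central sentence.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (a crude stand-in
    for the linguistic/deep edge weights used by SummPip)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def cluster_sentences(sentences, threshold=0.2):
    """Build a thresholded similarity graph and take its connected
    components as clusters (a simplified stand-in for spectral clustering)."""
    n = len(sentences)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(sentences[i], sentences[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def summarize(sentences, threshold=0.2):
    """Emit the most central sentence of each cluster as a crude
    substitute for the paper's cluster compression step."""
    summary = []
    for cluster in cluster_sentences(sentences, threshold):
        best = max(cluster, key=lambda i: sum(
            jaccard(sentences[i], sentences[j]) for j in cluster if j != i))
        summary.append(sentences[best])
    return summary
```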
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.