MRGSEM-Sum: An Unsupervised Multi-document Summarization Framework based on Multi-Relational Graphs and Structural Entropy Minimization
- URL: http://arxiv.org/abs/2507.23400v1
- Date: Thu, 31 Jul 2025 10:14:03 GMT
- Title: MRGSEM-Sum: An Unsupervised Multi-document Summarization Framework based on Multi-Relational Graphs and Structural Entropy Minimization
- Authors: Yongbing Zhang, Fang Nan, Shengxiang Gao, Yuxin Huang, Kaiwen Tan, Zhengtao Yu,
- Abstract summary: MRGSEM-Sum is an unsupervised multi-document summarization framework based on multi-relational graphs and structural entropy minimization.<n>We introduce a position-aware compression mechanism to distill each cluster, generating concise and informative summaries.
- Score: 15.596156608713347
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The core challenge faced by multi-document summarization is the complexity of relationships among documents and the presence of information redundancy. Graph clustering is an effective paradigm for addressing this issue, as it models the complex relationships among documents using graph structures and reduces information redundancy through clustering, achieving significant research progress. However, existing methods often only consider single-relational graphs and require a predefined number of clusters, which hinders their ability to fully represent rich relational information and adaptively partition sentence groups to reduce redundancy. To overcome these limitations, we propose MRGSEM-Sum, an unsupervised multi-document summarization framework based on multi-relational graphs and structural entropy minimization. Specifically, we construct a multi-relational graph that integrates semantic and discourse relations between sentences, comprehensively modeling the intricate and dynamic connections among sentences across documents. We then apply a two-dimensional structural entropy minimization algorithm for clustering, automatically determining the optimal number of clusters and effectively organizing sentences into coherent groups. Finally, we introduce a position-aware compression mechanism to distill each cluster, generating concise and informative summaries. Extensive experiments on four benchmark datasets (Multi-News, DUC-2004, PubMed, and WikiSum) demonstrate that our approach consistently outperforms previous unsupervised methods and, in several cases, achieves performance comparable to supervised models and large language models. Human evaluation demonstrates that the summaries generated by MRGSEM-Sum exhibit high consistency and coverage, approaching human-level quality.
Related papers
- CORG: Generating Answers from Complex, Interrelated Contexts [57.213304718157985]
In a real-world corpus, knowledge frequently recurs across documents but often contains inconsistencies due to ambiguous naming, outdated information, or errors.<n>Previous research has shown that language models struggle with these complexities, typically focusing on single factors in isolation.<n>We introduce Context Organizer (CORG), a framework that organizes multiple contexts into independently processed groups.
arXiv Detail & Related papers (2025-04-25T02:40:48Z) - GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization [13.61818620609812]
We propose a lightweight yet effective unsupervised approach called GLIMMER: a Graph and LexIcal features based unsupervised Multi-docuMEnt summaRization approach.
It first constructs a sentence graph from the source documents, then automatically identifies semantic clusters by mining low-level features from raw texts.
Experiments conducted on Multi-News, Multi-XScience and DUC-2004 demonstrate that our approach outperforms existing unsupervised approaches.
arXiv Detail & Related papers (2024-08-19T16:01:48Z) - One for all: A novel Dual-space Co-training baseline for Large-scale
Multi-View Clustering [42.92751228313385]
We propose a novel multi-view clustering model, named Dual-space Co-training Large-scale Multi-view Clustering (DSCMC)
The main objective of our approach is to enhance the clustering performance by leveraging co-training in two distinct spaces.
Our algorithm has an approximate linear computational complexity, which guarantees its successful application on large-scale datasets.
arXiv Detail & Related papers (2024-01-28T16:30:13Z) - Efficient Multi-View Graph Clustering with Local and Global Structure
Preservation [59.49018175496533]
We propose a novel anchor-based multi-view graph clustering framework termed Efficient Multi-View Graph Clustering with Local and Global Structure Preservation (EMVGC-LG)
Specifically, EMVGC-LG jointly optimize anchor construction and graph learning to enhance the clustering quality.
In addition, EMVGC-LG inherits the linear complexity of existing AMVGC methods respecting the sample number.
arXiv Detail & Related papers (2023-08-31T12:12:30Z) - Dual Information Enhanced Multi-view Attributed Graph Clustering [11.624319530337038]
A novel Dual Information enhanced multi-view Attributed Graph Clustering (DIAGC) method is proposed in this paper.
The proposed method introduces the Specific Information Reconstruction (SIR) module to disentangle the explorations of the consensus and specific information from multiple views.
The Mutual Information Maximization (MIM) module maximizes the agreement between the latent high-level representation and low-level ones, and enables the high-level representation to satisfy the desired clustering structure.
arXiv Detail & Related papers (2022-11-28T01:18:04Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Fine-grained Graph Learning for Multi-view Subspace Clustering [2.4094285826152593]
We propose a fine-grained graph learning framework for multi-view subspace clustering (FGL-MSC)
The main challenge is how to optimize the fine-grained fusion weights while generating the learned graph that fits the clustering task.
Experiments on eight real-world datasets show that the proposed framework has comparable performance to the state-of-the-art methods.
arXiv Detail & Related papers (2022-01-12T18:00:29Z) - Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from
Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z) - SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
arXiv Detail & Related papers (2020-07-17T13:01:15Z) - Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z) - Reasoning with Latent Structure Refinement for Document-Level Relation
Extraction [20.308845516900426]
We propose a novel model that empowers the relational reasoning across sentences by automatically inducing the latent document-level graph.
Specifically, our model achieves an F1 score of 59.05 on a large-scale document-level dataset (DocRED)
arXiv Detail & Related papers (2020-05-13T13:36:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.