LBMT team at VLSP2022-Abmusu: Hybrid method with text correlation and
generative models for Vietnamese multi-document summarization
- URL: http://arxiv.org/abs/2304.05205v1
- Date: Tue, 11 Apr 2023 13:15:24 GMT
- Authors: Tan-Minh Nguyen, Thai-Binh Nguyen, Hoang-Trung Nguyen, Hai-Long
Nguyen, Tam Doan Thanh, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong
- Abstract summary: This paper proposes a method for multi-document summarization based on cluster similarity.
After generating summaries by selecting the most important sentences from each cluster, we apply BARTpho and ViT5 to construct the abstractive models.
- Score: 1.4716144941085147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-document summarization is challenging because the summaries should not
only describe the most important information from all documents but also
provide a coherent interpretation of the documents. This paper proposes a
method for multi-document summarization based on cluster similarity. For the
extractive step, we use a hybrid model based on a modified version of the
PageRank algorithm and a text-correlation mechanism. After
generating summaries by selecting the most important sentences from each
cluster, we apply BARTpho and ViT5 to construct the abstractive models. Both
extractive and abstractive approaches were considered in this study. The
proposed method achieves competitive results in the VLSP 2022 competition.
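The extractive step described in the abstract (a modified PageRank over a sentence-similarity graph) can be sketched in pure Python. Note that the similarity function, damping factor, and iteration count below are illustrative assumptions, not the authors' exact formulation:

```python
import math
import re


def sentence_similarity(a, b):
    """Word-overlap similarity normalized by sentence lengths (TextRank-style)."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def rank_sentences(sentences, damping=0.85, iters=50):
    """Score sentences by power iteration over the similarity graph (PageRank)."""
    n = len(sentences)
    sim = [[sentence_similarity(sentences[i], sentences[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if sim[j][i] > 0:
                    # Each neighbor j distributes its score proportionally
                    # to its outgoing similarity weights.
                    rank += sim[j][i] / sum(sim[j]) * scores[j]
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores


def extract_summary(sentences, k=2):
    """Pick the k highest-ranked sentences, preserving document order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In the paper's setting this ranking would run per cluster, with the selected sentences then rewritten by the BARTpho/ViT5 abstractive models.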
Related papers
- JADS: A Framework for Self-supervised Joint Aspect Discovery and Summarization [3.992091862806936]
Our solution integrates topic discovery and summarization into a single step.
Given text data, our Joint Aspect Discovery and Summarization algorithm (JADS) discovers aspects from the input.
Our proposed method achieves higher semantic alignment with ground truth and is factual.
arXiv Detail & Related papers (2024-05-28T23:01:57Z) - Peek Across: Improving Multi-Document Modeling via Cross-Document
Question-Answering [49.85790367128085]
We pre-train a generic multi-document model with a novel cross-document question-answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z) - ACM -- Attribute Conditioning for Abstractive Multi Document
Summarization [0.0]
We propose a model that incorporates attribute conditioning modules in order to decouple conflicting information by conditioning for a certain attribute in the output summary.
This approach shows strong gains in ROUGE score over baseline multi document summarization approaches.
arXiv Detail & Related papers (2022-05-09T00:00:14Z) - Reinforcing Semantic-Symmetry for Document Summarization [15.113768658584979]
Document summarization condenses a long document into a short version with salient information and accurate semantic descriptions.
This paper proposes a new reinforcing semantic-symmetry learning model for document summarization.
A series of experiments has been conducted on two widely used benchmark datasets, CNN/Daily Mail and BigPatent.
arXiv Detail & Related papers (2021-12-14T17:41:37Z) - SgSum: Transforming Multi-document Summarization into Sub-graph
Selection [27.40759123902261]
Most existing extractive multi-document summarization (MDS) methods score each sentence individually and extract salient sentences one by one to compose a summary.
We propose a novel MDS framework (SgSum) to formulate the MDS task as a sub-graph selection problem.
Our model can produce significantly more coherent and informative summaries compared with traditional MDS methods.
arXiv Detail & Related papers (2021-10-25T05:12:10Z) - HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text
Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z) - Integrating Semantics and Neighborhood Information with Graph-Driven
Generative Models for Document Retrieval [51.823187647843945]
In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution, and propose to integrate the two types of information with a graph-driven generative model.
Under the approximation, we prove that the training objective can be decomposed into terms involving only singleton or pairwise documents, enabling the model to be trained as efficiently as uncorrelated ones.
arXiv Detail & Related papers (2021-05-27T11:29:03Z) - Nutribullets Hybrid: Multi-document Health Summarization [36.95954983680022]
We present a method for generating comparative summaries that highlights similarities and contradictions in input documents.
Our framework leads to more faithful, relevant and aggregation-sensitive summarization -- while being equally fluent.
arXiv Detail & Related papers (2021-04-08T01:44:29Z) - SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
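The SummPip pipeline above (sentence graph, clustering, per-cluster compression) can be sketched as follows. As simplifying stand-ins, spectral clustering is replaced here by connected components of a thresholded Jaccard graph, and multi-sentence compression by picking the longest sentence per cluster; neither is the paper's actual component:

```python
import re


def jaccard(a, b):
    """Jaccard word-overlap similarity between two sentences."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / max(1, len(wa | wb))


def cluster_sentences(sentences, threshold=0.2):
    """Build a sentence graph and return its connected components
    (a stand-in for SummPip's spectral clustering)."""
    n = len(sentences)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(sentences[i], sentences[j]) >= threshold:
                adj[i].append(j)
                adj[j].append(i)
    seen, clusters = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:  # depth-first search over the sentence graph
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(adj[v])
        clusters.append(sorted(comp))
    return clusters


def summarize(sentences, threshold=0.2):
    """Compress each cluster to its longest sentence (a crude stand-in
    for SummPip's multi-sentence compression)."""
    return [max((sentences[i] for i in comp), key=len)
            for comp in cluster_sentences(sentences, threshold)]
```

SummPip itself combines linguistic features with deep sentence representations when building the graph; the threshold and Jaccard measure here are assumptions chosen only to make the pipeline shape concrete.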
arXiv Detail & Related papers (2020-07-17T13:01:15Z) - Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z) - Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1)
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.