BERT-VBD: Vietnamese Multi-Document Summarization Framework
- URL: http://arxiv.org/abs/2409.12134v1
- Date: Wed, 18 Sep 2024 16:56:06 GMT
- Title: BERT-VBD: Vietnamese Multi-Document Summarization Framework
- Authors: Tuan-Cuong Vuong, Trang Mai Xuan, Thien Van Luong,
- Abstract summary: An emerging and promising strategy involves a synergistic fusion of extractive and abstractive summarization methods.
This paper presents a novel Vietnamese MDS framework leveraging a two-component pipeline architecture.
The proposed framework attains ROUGE-2 scores of 39.6% on the VN-MDS dataset and outperforming the state-of-the-art baselines.
- Score: 2.2526595080231857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In tackling the challenge of Multi-Document Summarization (MDS), numerous methods have been proposed, spanning both extractive and abstractive summarization techniques. However, each approach has its own limitations, making it less effective to rely solely on either one. An emerging and promising strategy involves a synergistic fusion of extractive and abstractive summarization methods. Despite the plethora of studies in this domain, research on the combined methodology remains scarce, particularly in the context of Vietnamese language processing. This paper presents a novel Vietnamese MDS framework leveraging a two-component pipeline architecture that integrates extractive and abstractive techniques. The first component employs an extractive approach to identify key sentences within each document. This is achieved by a modification of the pre-trained BERT network, which derives semantically meaningful phrase embeddings using siamese and triplet network structures. The second component utilizes the VBD-LLaMA2-7B-50b model for abstractive summarization, ultimately generating the final summary document. Our proposed framework demonstrates a positive performance, attaining ROUGE-2 scores of 39.6% on the VN-MDS dataset and outperforming the state-of-the-art baselines.
Related papers
- Absformer: Transformer-based Model for Unsupervised Multi-Document
Abstractive Summarization [1.066048003460524]
Multi-document summarization (MDS) refers to the task of summarizing the text in multiple documents into a concise summary.
Abstractive MDS aims to generate a coherent and fluent summary for multiple documents using natural language generation techniques.
We propose Absformer, a new Transformer-based method for unsupervised abstractive summary generation.
arXiv Detail & Related papers (2023-06-07T21:18:23Z) - LBMT team at VLSP2022-Abmusu: Hybrid method with text correlation and
generative models for Vietnamese multi-document summarization [1.4716144941085147]
This paper proposes a method for multi-document summarization based on cluster similarity.
After generating summaries by selecting the most important sentences from each cluster, we apply BARTpho and ViT5 to construct the abstractive models.
arXiv Detail & Related papers (2023-04-11T13:15:24Z) - ReSel: N-ary Relation Extraction from Scientific Text and Tables by
Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z) - Semantics-Consistent Cross-domain Summarization via Optimal Transport
Alignment [80.18786847090522]
We propose a Semantics-Consistent Cross-domain Summarization model based on optimal transport alignment with visual and textual segmentation.
We evaluated our method on three recent multimodal datasets and demonstrated the effectiveness of our method in producing high-quality multimodal summaries.
arXiv Detail & Related papers (2022-10-10T14:27:10Z) - An analysis of document graph construction methods for AMR summarization [2.055054374525828]
We present a novel dataset consisting of human-annotated alignments between the nodes of paired documents and summaries.
We apply these two forms of evaluation to prior work as well as a new method for node merging and show that our new method has significantly better performance than prior work.
arXiv Detail & Related papers (2021-11-27T22:12:50Z) - SgSum: Transforming Multi-document Summarization into Sub-graph
Selection [27.40759123902261]
Most existing extractive multi-document summarization (MDS) methods score each sentence individually and extract salient sentences one by one to compose a summary.
We propose a novel MDS framework (SgSum) to formulate the MDS task as a sub-graph selection problem.
Our model can produce significantly more coherent and informative summaries compared with traditional MDS methods.
arXiv Detail & Related papers (2021-10-25T05:12:10Z) - Eider: Evidence-enhanced Document-level Relation Extraction [56.71004595444816]
Document-level relation extraction (DocRE) aims at extracting semantic relations among entity pairs in a document.
We propose a three-stage evidence-enhanced DocRE framework consisting of joint relation and evidence extraction, evidence-centered relation extraction (RE), and fusion of extraction results.
arXiv Detail & Related papers (2021-06-16T09:43:16Z) - BASS: Boosting Abstractive Summarization with Unified Semantic Graph [49.48925904426591]
BASS is a framework for Boosting Abstractive Summarization based on a unified Semantic graph.
A graph-based encoder-decoder model is proposed to improve both the document representation and summary generation process.
Empirical results show that the proposed architecture brings substantial improvements for both long-document and multi-document summarization tasks.
arXiv Detail & Related papers (2021-05-25T16:20:48Z) - Topic-Guided Abstractive Text Summarization: a Joint Learning Approach [19.623946402970933]
We introduce a new approach for abstractive text summarization, Topic-Guided Abstractive Summarization.
The idea is to incorporate neural topic modeling with a Transformer-based sequence-to-sequence (seq2seq) model in a joint learning framework.
arXiv Detail & Related papers (2020-10-20T14:45:25Z) - SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
arXiv Detail & Related papers (2020-07-17T13:01:15Z) - Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1)
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.