Vietnamese multi-document summary using subgraph selection approach --
VLSP 2022 AbMuSu Shared Task
- URL: http://arxiv.org/abs/2306.14827v1
- Date: Mon, 26 Jun 2023 16:34:02 GMT
- Title: Vietnamese multi-document summary using subgraph selection approach --
VLSP 2022 AbMuSu Shared Task
- Authors: Huu-Thin Nguyen, Tam Doan Thanh, Cam-Van Thi Nguyen
- Abstract summary: Document summarization is a task to generate a fluent, condensed summary for a document.
In this paper, we focus on transforming the extractive MDS problem into subgraph selection.
Experiments have been implemented on the Vietnamese dataset published in VLSP Evaluation Campaign 2022.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Document summarization is a task to generate a fluent, condensed summary for a
document, and keep important information. A cluster of documents serves as the
input for multi-document summarizing (MDS), while the cluster summary serves as
the output. In this paper, we focus on transforming the extractive MDS problem
into subgraph selection. Approaching the problem in the form of graphs helps to
capture simultaneously the relationship between sentences in the same document
and between sentences in the same cluster based on exploiting the overall graph
structure and selected subgraphs. Experiments have been implemented on the
Vietnamese dataset published in VLSP Evaluation Campaign 2022. This model
currently places among the top 10 participating teams, as reported on the ROUGE-2
$F_1$ measure on the public test set.
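The subgraph-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the bag-of-words cosine similarity, the brute-force search over a small candidate pool, the scoring formula, and all function names (`build_sentence_graph`, `subgraph_score`, `select_subgraph`) are assumptions chosen for illustration. Sentences from every document in a cluster become nodes of one graph, and a whole group of sentences is scored together rather than sentence by sentence.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(sentence):
    """Lowercased whitespace tokens with counts (a stand-in for real sentence embeddings)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_sentence_graph(cluster):
    """cluster: list of documents, each a list of sentence strings.

    Returns (nodes, weights): nodes are (doc_index, sentence_index, text) tuples and
    weights is a symmetric matrix of pairwise similarities, so edges link sentences
    both within one document and across documents of the cluster.
    """
    nodes = [(d, s, text) for d, doc in enumerate(cluster) for s, text in enumerate(doc)]
    vectors = [bag_of_words(text) for _, _, text in nodes]
    n = len(nodes)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = cosine(vectors[i], vectors[j])
    return nodes, weights


def subgraph_score(sub, weights, redundancy=0.5):
    """Assumed scoring rule: reward similarity to the whole graph, penalise internal overlap."""
    n = len(weights)
    salience = sum(sum(weights[i]) for i in sub) / (len(sub) * max(n - 1, 1))
    pairs = list(combinations(sub, 2))
    internal = sum(weights[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return salience - redundancy * internal


def select_subgraph(cluster, k=2, pool=5, redundancy=0.5):
    """Pick the best-scoring k-sentence subgraph among the `pool` highest-degree nodes."""
    nodes, weights = build_sentence_graph(cluster)
    if not nodes:
        return []
    ranked = sorted(range(len(nodes)), key=lambda i: sum(weights[i]), reverse=True)[:pool]
    k = min(k, len(ranked))
    best = max(combinations(sorted(ranked), k),
               key=lambda sub: subgraph_score(sub, weights, redundancy))
    return [nodes[i] for i in best]


if __name__ == "__main__":
    toy_cluster = [
        ["The storm hit the coast on Monday", "Schools were closed on Tuesday"],
        ["The storm hit the coast early Monday", "Officials closed schools on Tuesday"],
    ]
    for doc_id, sent_id, text in select_subgraph(toy_cluster, k=2):
        print(doc_id, sent_id, text)
```

Scoring the selected group as a unit is what lets the redundancy penalty act: two near-identical sentences from different documents each look salient alone, but their shared edge lowers the score of any subgraph containing both.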
Related papers
- Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision.
We construct a taxonomy and review the most prominent papers in recent years.
We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z)
- The Power of Summary-Source Alignments [62.76959473193149]
Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection.
Alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data.
This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level.
arXiv Detail & Related papers (2024-06-02T19:35:19Z)
- Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for
Vietnamese Abstractive Multi-document Summarization [0.6827423171182151]
The goal of the Abmusu shared task is to develop summarization systems that could create abstractive summaries automatically for a set of documents on a topic.
We build a human-annotated dataset of 1,839 documents in 600 clusters, collected from Vietnamese news in 8 categories.
Models are evaluated and ranked in terms of ROUGE2-F1 score, the most typical evaluation metric for the document summarization problem.
arXiv Detail & Related papers (2023-11-27T04:01:13Z)
- Abstractive Summarization of Large Document Collections Using GPT [1.8130068086063336]
This paper proposes a method of abstractive summarization designed to scale to document collections instead of individual documents.
Our approach applies a combination of semantic clustering, document size reduction within topic clusters, semantic chunking of a cluster's documents, GPT-based summarization and concatenation, and a combined sentiment and text visualization of each topic to support exploratory data analysis.
arXiv Detail & Related papers (2023-10-09T13:06:21Z)
- LBMT team at VLSP2022-Abmusu: Hybrid method with text correlation and
generative models for Vietnamese multi-document summarization [1.4716144941085147]
This paper proposes a method for multi-document summarization based on cluster similarity.
After generating summaries by selecting the most important sentences from each cluster, we apply BARTpho and ViT5 to construct the abstractive models.
arXiv Detail & Related papers (2023-04-11T13:15:24Z)
- Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings.
Preliminary results demonstrate that citation graph is helpful even in a simple unsupervised framework.
Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
arXiv Detail & Related papers (2022-12-08T11:53:12Z)
- Question-Based Salient Span Selection for More Controllable Text
Summarization [67.68208237480646]
We propose a method for incorporating question-answering (QA) signals into a summarization model.
Our method identifies salient noun phrases (NPs) in the input document by automatically generating wh-questions that are answered by the NPs.
This QA-based signal is incorporated into a two-stage summarization model which first marks salient NPs in the input document using a classification model, then conditionally generates a summary.
arXiv Detail & Related papers (2021-11-15T17:36:41Z)
- SgSum: Transforming Multi-document Summarization into Sub-graph
Selection [27.40759123902261]
Most existing extractive multi-document summarization (MDS) methods score each sentence individually and extract salient sentences one by one to compose a summary.
We propose a novel MDS framework (SgSum) to formulate the MDS task as a sub-graph selection problem.
Our model can produce significantly more coherent and informative summaries compared with traditional MDS methods.
arXiv Detail & Related papers (2021-10-25T05:12:10Z)
- WSL-DS: Weakly Supervised Learning with Distant Supervision for Query
Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given where the goal is to generate a summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach via utilizing distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
- Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z)