Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for
Vietnamese Abstractive Multi-document Summarization
- URL: http://arxiv.org/abs/2311.15525v1
- Date: Mon, 27 Nov 2023 04:01:13 GMT
- Title: Overview of the VLSP 2022 -- Abmusu Shared Task: A Data Challenge for
Vietnamese Abstractive Multi-document Summarization
- Authors: Mai-Vu Tran, Hoang-Quynh Le, Duy-Cat Can, Quoc-An Nguyen
- Abstract summary: The goal of the Abmusu shared task is to develop summarization systems that can automatically create abstractive summaries for a set of documents on a topic.
We build a human-annotated dataset of 1,839 documents in 600 clusters, collected from Vietnamese news in 8 categories.
Models are evaluated and ranked by ROUGE2-F1 score, the most typical evaluation metric for document summarization.
- Score: 0.6827423171182151
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an overview of the VLSP 2022 - Vietnamese abstractive
multi-document summarization (Abmusu) shared task for Vietnamese news. The
task was hosted at the 9th annual workshop on Vietnamese Language and
Speech Processing (VLSP 2022). The goal of the Abmusu shared task is to develop
summarization systems that can automatically create abstractive summaries for
a set of documents on a topic. The model input is multiple news documents on
the same topic, and the corresponding output is a related abstractive summary.
Within the scope of the Abmusu shared task, we focus only on Vietnamese news
summarization and build a human-annotated dataset of 1,839 documents in 600
clusters, collected from Vietnamese news in 8 categories. Participating models
are evaluated and ranked by ROUGE2-F1 score, the most typical
evaluation metric for document summarization.
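As a rough illustration of the ranking metric named in the abstract, the following is a minimal sketch of a ROUGE-2 F1 computation based on bigram overlap between a candidate summary and a single reference. It assumes lowercased whitespace tokenization and no stemming, and the helper names `bigram_counts` and `rouge2_f1` are hypothetical. It is not the official ROUGE toolkit and is not necessarily the scoring script used by the task organizers.

```python
from collections import Counter


def bigram_counts(text):
    """Count bigrams in a text (assumption: lowercased whitespace tokens)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate, reference):
    """Simplified ROUGE-2 F1 against one reference summary."""
    cand = bigram_counts(candidate)
    ref = bigram_counts(reference)
    # Clipped overlap: each bigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Calling `rouge2_f1(system_summary, gold_summary)` returns a value between 0 and 1. A real evaluation would typically add language-specific tokenization (word segmentation matters for Vietnamese) and average the score over all clusters.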
Related papers
- GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human [71.42669028683741]
We present a shared task on binary machine generated text detection conducted as a part of the GenAI workshop at COLING 2025.
The task consists of two subtasks: Monolingual (English) and Multilingual.
We provide a comprehensive overview of the data, a summary of the results, detailed descriptions of the participating systems, and an in-depth analysis of submissions.
arXiv Detail & Related papers (2025-01-19T11:11:55Z)
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document is proposed.
The appended summary should not only summarize the newly added content but also be coherent with the previous summary.
We show that SSG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
- Overview of the VLSP 2023 -- ComOM Shared Task: A Data Challenge for Comparative Opinion Mining from Vietnamese Product Reviews [0.6827423171182151]
This paper presents a comprehensive overview of the Comparative Opinion Mining from Vietnamese Product Reviews shared task (ComOM).
The primary objective of this shared task is to advance the field of natural language processing by developing techniques that proficiently extract comparative opinions from Vietnamese product reviews.
We construct a human-annotated dataset comprising $120$ documents, encompassing $7427$ non-comparative sentences and $2468$ comparisons within $1798$ sentences.
arXiv Detail & Related papers (2024-02-21T08:29:26Z)
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- DeVAn: Dense Video Annotation for Video-Language Models [68.70692422636313]
We present a novel human-annotated dataset for evaluating the ability of visual-language models to generate descriptions for real-world video clips.
The dataset contains 8.5K YouTube video clips of 20-60 seconds in duration and covers a wide range of topics and interests.
arXiv Detail & Related papers (2023-10-08T08:02:43Z)
- Vietnamese multi-document summary using subgraph selection approach -- VLSP 2022 AbMuSu Shared Task [0.0]
Document summarization is the task of generating a fluent, condensed summary of a document.
In this paper, we focus on transforming the extractive MDS problem into subgraph selection.
Experiments have been implemented on the Vietnamese dataset published in VLSP Evaluation Campaign 2022.
arXiv Detail & Related papers (2023-06-26T16:34:02Z)
- LBMT team at VLSP2022-Abmusu: Hybrid method with text correlation and generative models for Vietnamese multi-document summarization [1.4716144941085147]
This paper proposes a method for multi-document summarization based on cluster similarity.
After generating summaries by selecting the most important sentences from each cluster, we apply BARTpho and ViT5 to construct the abstractive models.
arXiv Detail & Related papers (2023-04-11T13:15:24Z)
- VideoXum: Cross-modal Visual and Textural Summarization of Videos [54.0985975755278]
We propose a new joint video and text summarization task.
The goal is to generate both a shortened video clip along with the corresponding textual summary from a long video.
The generated shortened video clip and text narratives should be semantically well aligned.
arXiv Detail & Related papers (2023-03-21T17:51:23Z)
- CREATIVESUMM: Shared Task on Automatic Summarization for Creative Writing [90.58269243992318]
This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts.
We introduce four sub-tasks and their corresponding datasets, focusing on summarizing books, movie scripts, primetime television scripts, and daytime soap opera scripts.
As part of the CREATIVESUMM workshop at COLING 2022, the shared task attracted 18 submissions in total.
arXiv Detail & Related papers (2022-11-10T21:31:03Z)
- VLSP 2021 Shared Task: Vietnamese Machine Reading Comprehension [2.348805691644086]
This article presents details of the organization of the shared task, an overview of the methods employed by shared-task participants, and the results.
We provide a benchmark dataset named UIT-ViQuAD 2.0 for evaluating the MRC task and question answering systems for the Vietnamese language.
The UIT-ViQuAD 2.0 dataset motivates more researchers to explore Vietnamese machine reading comprehension, question answering, and question generation.
arXiv Detail & Related papers (2022-03-22T00:44:41Z)
- Liputan6: A Large-scale Indonesian Dataset for Text Summarization [43.375797352517765]
We harvest articles from Liputan6.com, an online news portal, and obtain 215,827 document-summary pairs.
We leverage pre-trained language models to develop benchmark extractive and abstractive summarization methods over the dataset.
arXiv Detail & Related papers (2020-11-02T02:01:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.