VCSUM: A Versatile Chinese Meeting Summarization Dataset
- URL: http://arxiv.org/abs/2305.05280v2
- Date: Mon, 15 May 2023 09:30:39 GMT
- Title: VCSUM: A Versatile Chinese Meeting Summarization Dataset
- Authors: Han Wu, Mingjie Zhan, Haochen Tan, Zhaohui Hou, Ding Liang, and Linqi
Song
- Abstract summary: We introduce a versatile Chinese meeting summarization dataset, dubbed VCSum, consisting of 239 real-life meetings.
We provide the annotations of topic segmentation, headlines, segmentation summaries, overall meeting summaries, and salient sentences for each meeting transcript.
Our analysis confirms the effectiveness and robustness of VCSum.
- Score: 25.695308276427166
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Compared to news and chat summarization, the development of meeting
summarization is hugely decelerated by the limited data. To this end, we
introduce a versatile Chinese meeting summarization dataset, dubbed VCSum,
consisting of 239 real-life meetings, with a total duration of over 230 hours.
We claim our dataset is versatile because we provide the annotations of topic
segmentation, headlines, segmentation summaries, overall meeting summaries, and
salient sentences for each meeting transcript. As such, the dataset can adapt
to various summarization tasks or methods, including segmentation-based
summarization, multi-granularity summarization and retrieval-then-generate
summarization. Our analysis confirms the effectiveness and robustness of VCSum.
We also provide a set of benchmark models regarding different downstream
summarization tasks on VCSum to facilitate further research. The dataset and
code will be released at https://github.com/hahahawu/VCSum.
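
As an illustration of how the multi-granularity annotations could be consumed, the minimal Python sketch below builds training pairs for the retrieval-then-generate and segmentation-based settings described above. The field names (sentences, salient_sentence_ids, overall_summary, topic_segments, headline, segment_summary) and the sample file name are hypothetical assumptions for illustration only; the released schema is defined in the GitHub repository.

    # Minimal sketch of consuming VCSum-style annotations (hypothetical schema).
    import json
    from pathlib import Path

    def load_meeting(path: Path) -> dict:
        # One annotated meeting transcript stored as JSON (assumed layout).
        with path.open(encoding="utf-8") as f:
            return json.load(f)

    def retrieval_then_generate_pair(meeting: dict) -> dict:
        # Pair the annotated salient sentences (the "retrieved" context)
        # with the overall meeting summary (the generation target).
        sentences = meeting["sentences"]
        salient = [sentences[i] for i in meeting["salient_sentence_ids"]]
        return {"source": "".join(salient), "target": meeting["overall_summary"]}

    def segment_level_pairs(meeting: dict):
        # Yield (segment text, headline, segment summary) triples for
        # segmentation-based and multi-granularity summarization.
        for seg in meeting["topic_segments"]:
            text = "".join(meeting["sentences"][seg["start"]:seg["end"]])
            yield text, seg["headline"], seg["segment_summary"]

    if __name__ == "__main__":
        meeting = load_meeting(Path("vcsum_sample.json"))  # placeholder file name
        print(retrieval_then_generate_pair(meeting)["target"][:50])
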
Related papers
- Investigating Consistency in Query-Based Meeting Summarization: A
Comparative Study of Different Embedding Methods [0.0]
Text summarization is one of the well-known applications in the Natural Language Processing (NLP) field.
It aims to automatically generate a summary that captures the important information of a given context.
In this paper, we are inspired by "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization" proposed by Microsoft.
We also propose our Locater model, designed to extract relevant spans from a given transcript and query, which are then summarized by a Summarizer model.
arXiv Detail & Related papers (2024-02-10T08:25:30Z)
- Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with
Weak Supervision on Sentence Classification [91.13086984529706]
Aspect-based meeting transcript summarization aims to produce multiple summaries, each focusing on one aspect of the meeting content.
Traditional summarization methods instead produce a single summary that mixes information from all aspects.
We propose a two-stage method for aspect-based meeting transcript summarization.
arXiv Detail & Related papers (2023-11-07T19:06:31Z)
- MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation
of Videos [106.06278332186106]
Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction.
Numerous limitations exist within existing public MSMO datasets.
We have meticulously curated the MMSum dataset.
arXiv Detail & Related papers (2023-06-07T07:43:11Z)
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend the multimodal input.
arXiv Detail & Related papers (2023-03-13T17:01:42Z)
- UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot
Summarization [54.59104881168188]
UniSumm is a unified few-shot summarization model pre-trained with multiple summarization tasks.
SummZoo is a new benchmark to better evaluate few-shot summarizers.
arXiv Detail & Related papers (2022-11-17T18:54:47Z)
- Hierarchical3D Adapters for Long Video-to-text Summarization [79.01926022762093]
Our experiments demonstrate that multimodal information offers superior performance over more memory-heavy and fully fine-tuned textual summarization methods.
arXiv Detail & Related papers (2022-10-10T16:44:36Z)
- AgreeSum: Agreement-Oriented Multi-Document Summarization [3.4743618614284113]
Given a cluster of articles, the goal is to provide abstractive summaries that represent information common and faithful to all input articles.
We create a dataset for AgreeSum, and provide annotations on article-summary entailment relations for a subset of the clusters in the dataset.
arXiv Detail & Related papers (2021-06-04T06:17:49Z)
- CNTLS: A Benchmark Dataset for Abstractive or Extractive Chinese
Timeline Summarization [22.813746290856916]
We introduce the CNTLS dataset, a versatile resource for Chinese timeline summarization.
CNTLS encompasses 77 real-life topics, each with 2524 documents, and its summaries compress the covered duration (in days) by nearly 60%.
We evaluate the performance of various extractive and generative summarization systems on the CNTLS corpus.
arXiv Detail & Related papers (2021-05-29T03:47:10Z)
- QMSum: A New Benchmark for Query-based Multi-domain Meeting
Summarization [45.83402681068943]
QMSum consists of 1,808 query-summary pairs over 232 meetings in multiple domains.
We investigate a locate-then-summarize method and evaluate a set of strong summarization baselines on the task.
arXiv Detail & Related papers (2021-04-13T05:00:35Z)
- A Hierarchical Network for Abstractive Meeting Summarization with
Cross-Domain Pretraining [52.11221075687124]
We propose a novel abstractive summary network that adapts to the meeting scenario.
We design a hierarchical structure to accommodate long meeting transcripts and a role vector to depict the difference among speakers.
Our model outperforms previous approaches in both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-04-04T21:00:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.