CNewSum: A Large-scale Chinese News Summarization Dataset with Human-annotated Adequacy and Deducibility Level
- URL: http://arxiv.org/abs/2110.10874v1
- Date: Thu, 21 Oct 2021 03:37:46 GMT
- Title: CNewSum: A Large-scale Chinese News Summarization Dataset with Human-annotated Adequacy and Deducibility Level
- Authors: Danqing Wang, Jiaze Chen, Xianze Wu, Hao Zhou and Lei Li
- Abstract summary: We present a large-scale Chinese news summarization dataset CNewSum.
It consists of 304,307 documents and human-written summaries for the news feed.
Its test set contains adequacy and deducibility annotations for the summaries.
- Score: 15.969302324314516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic text summarization aims to produce a brief but informative summary of the input documents. Both extractive and abstractive methods have achieved great success on English datasets in recent years, but text summarization in Chinese remains underexplored, limited by the lack of large-scale datasets. In this paper, we present CNewSum, a large-scale Chinese news summarization dataset consisting of 304,307 documents with human-written summaries for the news feed. Its long documents and highly abstractive summaries encourage document-level understanding and generation in current summarization models. A further distinguishing feature of CNewSum is that its test set carries adequacy and deducibility annotations for the summaries: the adequacy level measures how much of the summary's information is covered by the document, and the deducibility level indicates how much reasoning a model needs to generate the summary. These annotations help researchers analyze and target their models' performance bottlenecks. We evaluate recent methods on CNewSum and release the dataset to provide a solid testbed for automatic Chinese summarization research.
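The adequacy and deducibility labels naturally support bucketed evaluation. Below is a minimal sketch of such an analysis, assuming Google's rouge_score package and hypothetical field names ('adequacy', 'deducibility', 'reference', 'prediction') for the test-set annotations; the actual CNewSum schema may differ, and Chinese text would first need word segmentation for n-gram ROUGE.

```python
from collections import defaultdict
from statistics import mean

def bucketed_rouge(examples, scorer):
    """Group ROUGE-L F1 by (adequacy, deducibility) annotation level.

    `examples` is an iterable of dicts with hypothetical keys
    'reference', 'prediction', 'adequacy', and 'deducibility'.
    """
    buckets = defaultdict(list)
    for ex in examples:
        result = scorer.score(ex["reference"], ex["prediction"])
        buckets[(ex["adequacy"], ex["deducibility"])].append(
            result["rougeL"].fmeasure)
    return {level: mean(scores) for level, scores in buckets.items()}

# Usage sketch with Google's rouge_score package:
# from rouge_score import rouge_scorer
# scorer = rouge_scorer.RougeScorer(["rougeL"])
# print(bucketed_rouge(test_examples, scorer))
```

Per-bucket scores make it easy to see, for instance, whether a model degrades most on summaries that require heavy reasoning (high deducibility) rather than on those with low document coverage.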
Related papers
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document arrives.
The appended summary should not only cover the newly added content but also remain coherent with the previous summary.
The proposed model, SSG, achieves state-of-the-art performance in both automatic metrics and human evaluation.
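As a control flow, the task reduces to conditioning each new summary on the running one. A minimal sketch, with a hypothetical generate(prev_summary, new_doc) callable standing in for the SSG model:

```python
def stepwise_summarize(documents, generate):
    """Run stepwise summarization over a document stream.

    `generate(prev_summary, new_doc)` is a hypothetical stand-in for the
    SSG model: it should summarize the newly added content while staying
    coherent with everything summarized so far.
    """
    summary = ""
    appended_summaries = []
    for doc in documents:
        appended = generate(summary, doc)
        appended_summaries.append(appended)
        summary = (summary + " " + appended).strip()
    return summary, appended_summaries
```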
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- LMGQS: A Large-scale Dataset for Query-focused Summarization [77.6179359525065]
We convert four generic summarization benchmarks into a new QFS benchmark dataset, LMGQS.
We establish baselines with state-of-the-art summarization models.
We achieve state-of-the-art zero-shot and supervised performance on multiple existing QFS benchmarks.
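One way to read the conversion is as attaching an inferred query to each document-summary pair. A sketch with a hypothetical generate_query model call; the paper's actual construction may differ in prompting and filtering details:

```python
def to_query_focused(benchmark, generate_query):
    """Convert generic (document, summary) pairs into QFS triples.

    `generate_query(document, summary)` is a hypothetical model call
    that infers the latent query the summary answers; the actual LMGQS
    pipeline may differ.
    """
    converted = []
    for document, summary in benchmark:
        query = generate_query(document, summary)
        converted.append(
            {"query": query, "document": document, "summary": summary})
    return converted
```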
arXiv Detail & Related papers (2023-05-22T14:53:45Z)
- Salience Allocation as Guidance for Abstractive Summarization [61.31826412150143]
We propose a novel summarization approach with flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON).
SEASON uses the allocation of salience expectation to guide abstractive summarization and adapts well to articles with different levels of abstractiveness.
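The snippet does not spell out how salience expectation is computed; a common proxy in this line of work is to label each source sentence by its n-gram overlap with the reference summary. A rough sketch of such labeling (not SEASON's exact formulation):

```python
def salience_labels(sentences, reference, n=2):
    """Score each source sentence by n-gram overlap with the reference.

    A crude proxy for sentence-level salience; SEASON's actual salience
    allocation mechanism is more elaborate than this.
    """
    def ngrams(text):
        tokens = text.split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    ref_ngrams = ngrams(reference)
    scores = []
    for sentence in sentences:
        cand = ngrams(sentence)
        scores.append(len(cand & ref_ngrams) / max(len(cand), 1))
    return scores
```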
arXiv Detail & Related papers (2022-10-22T02:13:44Z)
- Topic-Aware Encoding for Extractive Summarization [15.113768658584979]
We propose a topic-aware encoding scheme for document summarization.
A neural topic model is added to the neural sentence-level representation learning so that central topic information is adequately considered.
Experimental results on three public datasets show that our model outperforms state-of-the-art models.
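The core idea is to inject a document-level topic vector into each sentence representation before scoring. A minimal sketch with generic embeddings, where the paper's neural topic model is replaced by any fixed topic vector and the weights are placeholders for learned parameters:

```python
import numpy as np

def topic_aware_scores(sentence_embs, topic_vec, w_sent, w_topic):
    """Score sentences from [sentence embedding; document topic vector].

    sentence_embs: (num_sentences, d_s) sentence representations.
    topic_vec:     (d_t,) document-level topic vector, e.g. from an NTM.
    w_sent, w_topic: weight vectors that would normally be learned.
    """
    tiled = np.tile(topic_vec, (sentence_embs.shape[0], 1))
    fused = np.concatenate([sentence_embs, tiled], axis=1)
    weights = np.concatenate([w_sent, w_topic])
    return fused @ weights  # one extractive score per sentence
```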
arXiv Detail & Related papers (2021-12-17T15:26:37Z)
- Topic Modeling Based Extractive Text Summarization [0.0]
We propose a novel method to summarize a text document by clustering its contents based on latent topics.
We use the less commonly studied and challenging WikiHow dataset in our approach to text summarization.
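Clustering sentences by latent topics and picking one representative sentence per topic can be sketched directly with scikit-learn; the parameters below are illustrative, not the paper's:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def topic_extractive_summary(sentences, n_topics=3):
    """Pick the most topic-representative sentence per latent topic."""
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(sentences)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    sent_topics = lda.fit_transform(counts)  # (n_sentences, n_topics)
    # For each topic, keep the sentence with the highest topic weight.
    picks = sorted({int(np.argmax(sent_topics[:, t]))
                    for t in range(n_topics)})
    return [sentences[i] for i in picks]
```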
arXiv Detail & Related papers (2021-06-29T12:28:19Z)
- Bengali Abstractive News Summarization (BANS): A Neural Attention Approach [0.8793721044482612]
We present a sequence-to-sequence Long Short-Term Memory (LSTM) model with attention at the encoder-decoder interface.
The proposed system uses a local attention mechanism to produce long word sequences as lucid, human-like sentences.
We also prepared a dataset of more than 19k articles and corresponding human-written summaries collected from bangla.bdnews24.com.
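The attention at the encoder-decoder interface can be illustrated with a single Luong-style dot-product attention step in PyTorch; this is a generic sketch, not the exact BANS configuration:

```python
import torch
import torch.nn.functional as F

def luong_attention(decoder_state, encoder_outputs):
    """One dot-product attention step over encoder outputs.

    decoder_state:   (batch, hidden) current decoder hidden state.
    encoder_outputs: (batch, src_len, hidden) all encoder states.
    Returns the context vector and the attention weights.
    """
    # (batch, src_len): similarity of decoder state to each source step
    scores = torch.bmm(encoder_outputs,
                       decoder_state.unsqueeze(2)).squeeze(2)
    weights = F.softmax(scores, dim=1)
    # (batch, hidden): attention-weighted sum of encoder states
    context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
    return context, weights
```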
arXiv Detail & Related papers (2020-12-03T08:17:31Z)
- Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks [21.379555672973975]
This paper proposes a graph neural network (GNN)-based extractive summarization model.
Our model integrates a joint neural topic model (NTM) to discover latent topics, which can provide document-level features for sentence selection.
Experimental results demonstrate that our model achieves state-of-the-art results on the CNN/DM and NYT datasets.
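One round of message passing between sentence nodes and topic nodes can be written in a few lines of NumPy; this is a toy sketch of the idea, not the paper's architecture:

```python
import numpy as np

def sentence_topic_pass(sent_feats, topic_feats, adj):
    """One bipartite message-passing round between sentences and topics.

    sent_feats:  (n_sent, d) sentence features.
    topic_feats: (n_topic, d) topic features.
    adj: (n_sent, n_topic) sentence-topic affinity, e.g. from an NTM.
    """
    # Normalize affinities so each node averages over its neighbors.
    s2t = adj / np.maximum(adj.sum(axis=0, keepdims=True), 1e-9)
    t2s = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-9)
    new_topics = s2t.T @ sent_feats   # topics gather from sentences
    new_sents = t2s @ topic_feats     # sentences gather from topics
    return new_sents, new_topics
```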
arXiv Detail & Related papers (2020-10-13T09:30:04Z)
- Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
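A similarity graph over sentences pooled from all input documents is the usual starting point for such models. A sketch with TF-IDF cosine similarity, as a simple stand-in for the richer graph representations the paper leverages:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cross_document_graph(documents, threshold=0.2):
    """Build a sentence-level similarity graph across documents.

    `documents` is a list of documents, each a list of sentence strings.
    Returns the pooled sentences and a thresholded adjacency matrix
    capturing (possibly cross-document) sentence relations.
    """
    sentences = [s for doc in documents for s in doc]
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)  # drop self-loops
    adj = np.where(sim >= threshold, sim, 0.0)
    return sentences, adj
```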
arXiv Detail & Related papers (2020-05-20T13:39:47Z)
- Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents.
In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text.
Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
arXiv Detail & Related papers (2020-04-30T15:37:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.