SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary
- URL: http://arxiv.org/abs/2110.05750v1
- Date: Tue, 12 Oct 2021 05:39:48 GMT
- Title: SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary
- Authors: Jiaan Wang, Zhixu Li, Qiang Yang, Jianfeng Qu, Zhigang Chen, Qingsheng Liu, Guoping Hu
- Abstract summary: Sports game summarization aims to generate news articles from live text commentaries.
A recent work, SportsSum, not only constructs a large benchmark dataset but also proposes a two-step framework.
In this paper, we publish a new benchmark dataset, SportsSum2.0, together with a modified summarization framework.
- Score: 18.52461327269355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sports game summarization aims to generate news articles from live text
commentaries. A recent state-of-the-art work, SportsSum, not only constructs a
large benchmark dataset but also proposes a two-step framework. Despite its
great contributions, the work has three main drawbacks: 1) the noise in the
SportsSum dataset degrades summarization performance; 2) neglecting the
lexical overlap between news and commentaries results in a low-quality
pseudo-labeling algorithm; 3) directly concatenating rewritten sentences to
form news limits its practicability. In this paper, we publish a new benchmark
dataset, SportsSum2.0, together with a modified summarization framework. In
particular, to obtain a clean dataset, we employ crowd workers to manually
clean the original dataset. Moreover, the degree of lexical overlap is
incorporated into the generation of pseudo labels. Further, we introduce a
reranker-enhanced summarizer that takes into account the fluency and
expressiveness of the summarized news. Extensive experiments show that our
model outperforms the state-of-the-art baseline.
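To illustrate the idea of incorporating lexical overlap into pseudo-label generation, the following is a minimal sketch of overlap-based alignment between news sentences and commentary sentences. All names, the Jaccard scoring function, and the threshold are illustrative assumptions, not the authors' actual algorithm.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two sentences."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def pseudo_label(news_sentences, commentary_sentences, threshold=0.1):
    """Map each news sentence to its best-overlapping commentary sentence.

    Returns (news_idx, commentary_idx) pairs; news sentences whose best
    overlap falls below `threshold` are left unmapped, which filters out
    alignments with little lexical support.
    """
    if not commentary_sentences:
        return []
    pairs = []
    for i, news in enumerate(news_sentences):
        scores = [token_overlap(news, c) for c in commentary_sentences]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j))
    return pairs
```

In practice, a pseudo-labeling pipeline like the one the paper describes would likely combine such a lexical score with timeline information (matching game minutes between commentary and news), but the overlap term alone already discards pairs that share almost no vocabulary.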
Related papers
- Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets [51.74296438621836]
We introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels.
The main limitation of scribbles as source for weak supervision is the lack of challenging datasets for scribble segmentation.
Scribbles for All provides scribble labels for several popular segmentation datasets and provides an algorithm to automatically generate scribble labels for any dataset with dense annotations.
arXiv Detail & Related papers (2024-08-22T15:29:08Z)
- Template-based Abstractive Microblog Opinion Summarisation [26.777997436856076]
We introduce the task of microblog opinion summarisation (MOS) and share a dataset of 3100 gold-standard opinion summaries.
The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarisation dataset.
arXiv Detail & Related papers (2022-08-08T12:16:01Z)
- NEWTS: A Corpus for News Topic-Focused Summarization [9.872518517174498]
This paper introduces the first topical summarization corpus, based on the well-known CNN/Dailymail dataset.
We evaluate a range of existing techniques and analyze the effectiveness of different prompting methods.
arXiv Detail & Related papers (2022-05-31T10:01:38Z)
- Knowledge Enhanced Sports Game Summarization [14.389241106925438]
We introduce K-SportsSum, a new dataset with two characteristics.
K-SportsSum collects a large amount of data from massive games.
K-SportsSum further provides a large-scale knowledge corpus that contains the information of 523 sports teams and 14,724 sports players.
arXiv Detail & Related papers (2021-11-24T15:06:20Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, few-shot to evaluate its effectiveness.
Under zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
- Generating Representative Headlines for News Stories [31.67864779497127]
Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption.
It remains a challenging research problem to efficiently and effectively generate a representative headline for each story.
We develop a distant supervision approach to train large-scale generation models without any human annotation.
arXiv Detail & Related papers (2020-01-26T02:08:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.