NEWTS: A Corpus for News Topic-Focused Summarization
- URL: http://arxiv.org/abs/2205.15661v1
- Date: Tue, 31 May 2022 10:01:38 GMT
- Title: NEWTS: A Corpus for News Topic-Focused Summarization
- Authors: Seyed Ali Bahrainian, Sheridan Feucht, Carsten Eickhoff
- Abstract summary: This paper introduces the first topical summarization corpus, based on the well-known CNN/Dailymail dataset.
We evaluate a range of existing techniques and analyze the effectiveness of different prompting methods.
- Score: 9.872518517174498
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text summarization models are approaching human levels of fidelity. Existing
benchmarking corpora provide concordant pairs of full and abridged versions of
Web, news, or professional content. To date, all summarization datasets operate
under a one-size-fits-all paradigm that may not reflect the full range of
organic summarization needs. Several recently proposed models (e.g., plug and
play language models) have the capacity to condition the generated summaries on
a desired range of themes. These capacities remain largely unused and
unevaluated as there is no dedicated dataset that would support the task of
topic-focused summarization.
This paper introduces the first topical summarization corpus, NEWTS, based on
the well-known CNN/Dailymail dataset, and annotated via online crowd-sourcing.
Each source article is paired with two reference summaries, each focusing on a
different theme of the source document. We evaluate a representative range of
existing techniques and analyze the effectiveness of different prompting
methods.
Related papers
- Multi-Review Fusion-in-Context [20.681734117825822]
Grounded text generation requires both content selection and content consolidation.
Recent works have proposed a modular approach, with separate components for each step.
This study lays the groundwork for further exploration of modular text generation in the multi-document setting.
arXiv Detail & Related papers (2024-03-22T17:06:05Z)
- Controllable Topic-Focused Abstractive Summarization [57.8015120583044]
Controlled abstractive summarization focuses on producing condensed versions of a source article to cover specific aspects.
This paper presents a new Transformer-based architecture capable of producing topic-focused summaries.
arXiv Detail & Related papers (2023-11-12T03:51:38Z) - UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot
Summarization [54.59104881168188]
UniSumm is a unified few-shot summarization model pre-trained with multiple summarization tasks.
SummZoo is a new benchmark to better evaluate few-shot summarizers.
arXiv Detail & Related papers (2022-11-17T18:54:47Z) - Controlled Text Reduction [15.102190738450092]
We formalize textitControlled Text Reduction as a standalone task.
A model then needs to generate a coherent text that includes all and only the target information.
arXiv Detail & Related papers (2022-10-24T17:59:03Z) - Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z) - Topic-Aware Encoding for Extractive Summarization [15.113768658584979]
We propose a topic-aware encoding for document summarization: a neural topic model is added to the neural sentence-level representation learning to adequately capture the document's central topic information.
The experimental results on three public datasets show that our model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2021-12-17T15:26:37Z) - Topic Modeling Based Extractive Text Summarization [0.0]
We propose a novel method to summarize a text document by clustering its contents based on latent topics.
We utilize the lesser used and challenging WikiHow dataset in our approach to text summarization.
arXiv Detail & Related papers (2021-06-29T12:28:19Z) - ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive
Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z) - Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z) - Topic-Guided Abstractive Text Summarization: a Joint Learning Approach [19.623946402970933]
We introduce a new approach for abstractive text summarization, Topic-Guided Abstractive Summarization.
The idea is to incorporate neural topic modeling with a Transformer-based sequence-to-sequence (seq2seq) model in a joint learning framework.
arXiv Detail & Related papers (2020-10-20T14:45:25Z) - SupMMD: A Sentence Importance Model for Extractive Summarization using
Maximum Mean Discrepancy [92.5683788430012]
SupMMD is a novel technique for generic and update summarization based on the maximum mean discrepancy from kernel two-sample testing.
We show the efficacy of SupMMD in both generic and update summarization tasks by meeting or exceeding the current state-of-the-art on the DUC-2004 and TAC-2009 datasets.
arXiv Detail & Related papers (2020-10-06T09:26:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The accuracy of the information is not guaranteed.