Exploring Content Selection in Summarization of Novel Chapters
- URL: http://arxiv.org/abs/2005.01840v3
- Date: Tue, 30 Mar 2021 01:05:09 GMT
- Title: Exploring Content Selection in Summarization of Novel Chapters
- Authors: Faisal Ladhak and Bryan Li and Yaser Al-Onaizan and Kathleen McKeown
- Abstract summary: We present a new summarization task, generating summaries of novel chapters using summary/chapter pairs from online study guides.
This is a harder task than the news summarization task, given the chapter length as well as the extreme paraphrasing and generalization found in the summaries.
We focus on extractive summarization, which requires the creation of a gold-standard set of extractive summaries.
- Score: 19.11830806780343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new summarization task, generating summaries of novel chapters
using summary/chapter pairs from online study guides. This is a harder task
than the news summarization task, given the chapter length as well as the
extreme paraphrasing and generalization found in the summaries. We focus on
extractive summarization, which requires the creation of a gold-standard set of
extractive summaries. We present a new metric for aligning reference summary
sentences with chapter sentences to create gold extracts and also experiment
with different alignment methods. Our experiments demonstrate significant
improvement over prior alignment approaches for our task as shown through
automatic metrics and a crowd-sourced pyramid analysis. We make our data
collection scripts available at
https://github.com/manestay/novel-chapter-dataset .
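The gold-extract construction described above can be sketched as a greedy alignment: each reference summary sentence is matched to its highest-scoring chapter sentence, and the union of matched chapter sentences forms the extract. The sketch below uses a simple unigram-F1 similarity for illustration; the paper's actual metric (a weighted ROUGE variant) and its preprocessing differ, and all function names here are illustrative, not from the released code.

```python
from collections import Counter

def unigram_f1(ref_tokens, cand_tokens):
    """ROUGE-1-style F1 overlap between two token lists."""
    if not ref_tokens or not cand_tokens:
        return 0.0
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def align_summary_to_chapter(summary_sents, chapter_sents):
    """Greedily align each reference summary sentence to its best-matching
    chapter sentence; the union of matched chapter sentences (as sorted
    indices) forms the gold extract."""
    extract_ids = set()
    for sent in summary_sents:
        s_toks = sent.lower().split()
        scores = [unigram_f1(s_toks, c.lower().split()) for c in chapter_sents]
        best = max(range(len(chapter_sents)), key=lambda i: scores[i])
        if scores[best] > 0:
            extract_ids.add(best)
    return sorted(extract_ids)
```

For example, given a two-sentence reference summary of a three-sentence chapter, each summary sentence picks out the chapter sentence it paraphrases, yielding a two-sentence gold extract.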
Related papers
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate an appended summary each time a new document is added.
The appended summary should not only summarize the newly added content but also be coherent with the previous summary.
We show that our model, SSG, achieves state-of-the-art performance on both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
- A Modular Approach for Multimodal Summarization of TV Shows [55.20132267309382]
We present a modular approach where separate components perform specialized sub-tasks.
Our modules involve detecting scene boundaries, reordering scenes so as to minimize the number of cuts between different events, converting visual information to text, summarizing the dialogue in each scene, and fusing the scene summaries into a final summary for the entire episode.
We also present a new metric, PRISMA, to measure both precision and recall of generated summaries, which we decompose into atomic facts.
arXiv Detail & Related papers (2024-03-06T16:10:01Z)
- Novel Chapter Abstractive Summarization using Spinal Tree Aware Sub-Sentential Content Selection [29.30939223344407]
We present a pipelined extractive-abstractive approach to summarizing novel chapters.
We show an improvement of 3.71 Rouge-1 points over the best results reported in prior work on an existing novel chapter dataset.
arXiv Detail & Related papers (2022-11-09T14:12:09Z)
- Summarization with Graphical Elements [55.5913491389047]
We propose a new task: summarization with graphical elements.
We collect a high quality human labeled dataset to support research into the task.
arXiv Detail & Related papers (2022-04-15T17:16:41Z)
- MemSum: Extractive Summarization of Long Documents using Multi-step Episodic Markov Decision Processes [6.585259903186036]
We introduce MemSum, a reinforcement-learning-based extractive summarizer enriched at any given time step with information on the current extraction history.
Our innovation is in considering a broader information set when summarizing, one that humans would intuitively also use in this task.
arXiv Detail & Related papers (2021-07-19T14:41:31Z)
- On Generating Extended Summaries of Long Documents [16.149617108647707]
We present a new method for generating extended summaries of long papers.
Our method exploits the hierarchical structure of the documents and incorporates it into an extractive summarization model.
Our analysis shows that our multi-tasking approach can adjust the extraction probability distribution in favor of summary-worthy sentences.
arXiv Detail & Related papers (2020-12-28T08:10:28Z)
- Screenplay Summarization Using Latent Narrative Structure [78.45316339164133]
We propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models.
We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays.
Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode.
arXiv Detail & Related papers (2020-04-27T11:54:19Z)
- Learning to Summarize Passages: Mining Passage-Summary Pairs from Wikipedia Revision Histories [110.54963847339775]
We propose a method for automatically constructing a passage-to-summary dataset by mining the Wikipedia page revision histories.
In particular, the method mines main body passages and introduction sentences that are added to a page simultaneously.
The constructed dataset contains more than one hundred thousand passage-summary pairs.
arXiv Detail & Related papers (2020-04-06T12:11:50Z)
- The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries [72.48439126769627]
We introduce the Shmoop Corpus: a dataset of 231 stories paired with detailed multi-paragraph summaries for each individual chapter.
From the corpus, we construct a set of common NLP tasks, including Cloze-form question answering and a simplified form of abstractive summarization.
We believe that the unique structure of this corpus provides an important foothold towards making machine story comprehension more approachable.
arXiv Detail & Related papers (2019-12-30T21:03:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.