Bringing Structure into Summaries: a Faceted Summarization Dataset for
Long Scientific Documents
- URL: http://arxiv.org/abs/2106.00130v1
- Date: Mon, 31 May 2021 22:58:38 GMT
- Title: Bringing Structure into Summaries: a Faceted Summarization Dataset for
Long Scientific Documents
- Authors: Rui Meng, Khushboo Thaker, Lei Zhang, Yue Dong, Xingdi Yuan, Tong
Wang, Daqing He
- Abstract summary: FacetSum is a faceted summarization benchmark built on Emerald journal articles.
Analyses and empirical results on our dataset reveal the importance of bringing structure into summaries.
We believe FacetSum will spur further advances in summarization research and foster the development of NLP systems.
- Score: 30.09742243490895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Faceted summarization provides briefings of a document from different
perspectives. Readers can quickly comprehend the main points of a long document
with the help of a structured outline. However, little research has been
conducted on this subject, partially due to the lack of large-scale faceted
summarization datasets. In this study, we present FacetSum, a faceted
summarization benchmark built on Emerald journal articles, covering a diverse
range of domains. Different from traditional document-summary pairs, FacetSum
provides multiple summaries, each targeted at specific sections of a long
document, including the purpose, method, findings, and value. Analyses and
empirical results on our dataset reveal the importance of bringing structure
into summaries. We believe FacetSum will spur further advances in summarization
research and foster the development of NLP systems that can leverage the
structured information in both long texts and summaries.
Related papers
- SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section [7.366861473623427]
This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey.
Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) the evaluation of these pipelines using multiple metrics to compare their performance.
arXiv Detail & Related papers (2024-08-29T11:13:23Z) - Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of document within a collection.
We abstract over arbitrary header paraphrases, and ground each topic to respective document locations.
We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
arXiv Detail & Related papers (2024-02-21T16:22:21Z) - Generating a Structured Summary of Numerous Academic Papers: Dataset and
Method [20.90939310713561]
We propose BigSurvey, the first large-scale dataset for generating comprehensive summaries of numerous academic papers on each topic.
We collect target summaries from more than seven thousand survey papers and utilize their 430 thousand reference papers' abstracts as input documents.
To organize the diverse content from dozens of input documents, we propose a summarization method named category-based alignment and sparse transformer (CAST)
arXiv Detail & Related papers (2023-02-09T11:42:07Z) - An Empirical Survey on Long Document Summarization: Datasets, Models and
Metrics [33.655334920298856]
We provide a comprehensive overview of the research on long document summarization.
We conduct an empirical analysis to broaden the perspective on current research progress.
arXiv Detail & Related papers (2022-07-03T02:57:22Z) - TSTR: Too Short to Represent, Summarize with Details! Intro-Guided
Extended Summary Generation [22.738731393540633]
In domains where the source text is relatively long-form, such as in scientific documents, such summary is not able to go beyond the general and coarse overview.
In this paper, we propose TSTR, an extractive summarizer that utilizes the introductory information of documents as pointers to their salient information.
arXiv Detail & Related papers (2022-06-02T02:45:31Z) - Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z) - On Generating Extended Summaries of Long Documents [16.149617108647707]
We present a new method for generating extended summaries of long papers.
Our method exploits hierarchical structure of the documents and incorporates it into an extractive summarization model.
Our analysis shows that our multi-tasking approach can adjust extraction probability distribution to the favor of summary-worthy sentences.
arXiv Detail & Related papers (2020-12-28T08:10:28Z) - Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised
Approach [89.56158561087209]
We study summarizing on arbitrary aspects relevant to the document.
Due to the lack of supervision data, we develop a new weak supervision construction method and an aspect modeling scheme.
Experiments show our approach achieves performance boosts on summarizing both real and synthetic documents.
arXiv Detail & Related papers (2020-10-14T03:20:46Z) - From Standard Summarization to New Tasks and Beyond: Summarization with
Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper focuses on the survey of these new summarization tasks and approaches in the real-world application.
arXiv Detail & Related papers (2020-05-10T14:59:36Z) - Screenplay Summarization Using Latent Narrative Structure [78.45316339164133]
We propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models.
We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays.
Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode.
arXiv Detail & Related papers (2020-04-27T11:54:19Z) - The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries [72.48439126769627]
We introduce the Shmoop Corpus: a dataset of 231 stories paired with detailed multi-paragraph summaries for each individual chapter.
From the corpus, we construct a set of common NLP tasks, including Cloze-form question answering and a simplified form of abstractive summarization.
We believe that the unique structure of this corpus provides an important foothold towards making machine story comprehension more approachable.
arXiv Detail & Related papers (2019-12-30T21:03:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.