TSTR: Too Short to Represent, Summarize with Details! Intro-Guided
Extended Summary Generation
- URL: http://arxiv.org/abs/2206.00847v1
- Date: Thu, 2 Jun 2022 02:45:31 GMT
- Authors: Sajad Sotudeh, Nazli Goharian
- Abstract summary: In domains where the source text is relatively long-form, such as scientific documents, a short summary cannot go beyond a general, coarse overview.
In this paper, we propose TSTR, an extractive summarizer that uses the introductory information of documents as pointers to their salient information.
- Score: 22.738731393540633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many scientific papers such as those in arXiv and PubMed data collections
have abstracts varying in length from 50 to 1000 words, with an average of
approximately 200 words, where longer abstracts typically convey more
information about the source paper. Until recently, scientific summarization
research typically focused on generating short, abstract-like summaries
following the existing datasets used for scientific summarization. In domains
where the source text is relatively long-form, such as scientific documents,
such a summary cannot go beyond a general, coarse overview to
provide salient information from the source document. Recent interest in
tackling this problem has motivated the curation of two scientific datasets,
arXiv-Long and PubMed-Long, containing human-written summaries of 400-600
words, thereby providing a venue for research on generating long/extended summaries. Extended
summaries facilitate a faster read while providing details beyond coarse
information. In this paper, we propose TSTR, an extractive summarizer that
utilizes the introductory information of documents as pointers to their salient
information. Evaluations on two existing large-scale extended summarization
datasets show statistically significant improvements in ROUGE and
average ROUGE (F1) scores (except in one case) over strong baselines
and the state of the art. Comprehensive human evaluations favor our
generated extended summaries in terms of cohesion and completeness.
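The core idea described above can be illustrated with a minimal sketch: rank body sentences by their similarity to the introduction and extract the top-scoring ones. This is only a toy illustration using bag-of-words cosine similarity, not the authors' actual model; the function names and the use of lexical overlap as the scoring signal are assumptions for demonstration.

```python
# Minimal sketch of intro-guided extractive summarization: score each body
# sentence by its lexical similarity to the introduction, then extract the
# top-k sentences in original document order. Not the TSTR implementation.
from collections import Counter
import math

def bow(sentence):
    """Lowercased bag-of-words term-frequency vector for a sentence."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def intro_guided_extract(intro_sents, body_sents, k=3):
    """Score each body sentence by its maximum similarity to any intro
    sentence (the intro acts as a pointer to salient content), then
    return the k highest-scoring sentences in document order."""
    intro_vecs = [bow(s) for s in intro_sents]
    scored = [(max(cosine(bow(s), iv) for iv in intro_vecs), i, s)
              for i, s in enumerate(body_sents)]
    top = sorted(scored, reverse=True)[:k]
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]
```

A real system in this vein would replace the bag-of-words vectors with learned sentence representations and train the scorer against reference summaries, but the control flow (intro as query, body sentences as candidates) stays the same.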
Related papers
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z)
- GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state [6.4805900740861]
We propose GoSum, a reinforcement-learning-based extractive model for long-paper summarization.
GoSum encodes states by building a heterogeneous graph from different discourse levels for each input document.
We evaluate the model on two datasets of scientific articles summarization: PubMed and arXiv.
arXiv Detail & Related papers (2022-11-18T14:07:29Z)
- Automatic Text Summarization Methods: A Comprehensive Review [1.6114012813668934]
This study provides a detailed analysis of text summarization concepts such as summarization approaches, techniques used, standard datasets, evaluation metrics and future scopes for research.
arXiv Detail & Related papers (2022-03-03T10:45:00Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents [30.09742243490895]
FacetSum is a faceted summarization benchmark built on Emerald journal articles.
Analyses and empirical results on our dataset reveal the importance of bringing structure into summaries.
We believe FacetSum will spur further advances in summarization research and foster the development of NLP systems.
arXiv Detail & Related papers (2021-05-31T22:58:38Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- On Generating Extended Summaries of Long Documents [16.149617108647707]
We present a new method for generating extended summaries of long papers.
Our method exploits the hierarchical structure of documents and incorporates it into an extractive summarization model.
Our analysis shows that our multi-tasking approach can adjust the extraction probability distribution in favor of summary-worthy sentences.
arXiv Detail & Related papers (2020-12-28T08:10:28Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area that aims to create a short, condensed version of an original document.
In real-world applications, most data is not in plain text format.
This paper surveys these new summarization tasks and approaches in real-world applications.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)
- Screenplay Summarization Using Latent Narrative Structure [78.45316339164133]
We propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models.
We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays.
Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode.
arXiv Detail & Related papers (2020-04-27T11:54:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.