SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section
- URL: http://arxiv.org/abs/2408.16444v1
- Date: Thu, 29 Aug 2024 11:13:23 GMT
- Title: SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section
- Authors: Leandro Carísio Fernandes, Gustavo Bartz Guedes, Thiago Soares Laitz, Thales Sales Almeida, Rodrigo Nogueira, Roberto Lotufo, Jayr Pereira
- Abstract summary: This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey.
Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) the evaluation of these pipelines using multiple metrics to compare their performance.
- Score: 7.366861473623427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Document summarization is a task to shorten texts into concise and informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) the evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and the impact of different configurations on the quality of generated summaries.
Related papers
- Generating a Structured Summary of Numerous Academic Papers: Dataset and Method [20.90939310713561]
We propose BigSurvey, the first large-scale dataset for generating comprehensive summaries of numerous academic papers on each topic.
We collect target summaries from more than seven thousand survey papers and utilize their 430 thousand reference papers' abstracts as input documents.
To organize the diverse content from dozens of input documents, we propose a summarization method named category-based alignment and sparse transformer (CAST).
arXiv Detail & Related papers (2023-02-09T11:42:07Z)
- CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z)
- ReSel: N-ary Relation Extraction from Scientific Text and Tables by Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z)
- Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature [21.440724685950443]
We present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale).
We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets.
arXiv Detail & Related papers (2022-10-18T15:28:30Z)
- Survey of Query-based Text Summarization [31.907523097592513]
Query-based text summarization is an important real-world problem that requires condensing prolix text data into a summary under the guidance of the query information.
This survey aims at summarizing some interesting work in query-based text summarization methods as well as related generic text summarization methods.
arXiv Detail & Related papers (2022-09-17T05:34:32Z)
- An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics [33.655334920298856]
We provide a comprehensive overview of the research on long document summarization.
We conduct an empirical analysis to broaden the perspective on current research progress.
arXiv Detail & Related papers (2022-07-03T02:57:22Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents [30.09742243490895]
FacetSum is a faceted summarization benchmark built on Emerald journal articles.
Analyses and empirical results on our dataset reveal the importance of bringing structure into summaries.
We believe FacetSum will spur further advances in summarization research and foster the development of NLP systems.
arXiv Detail & Related papers (2021-05-31T22:58:38Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper focuses on the survey of these new summarization tasks and approaches in the real-world application.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.