Making Science Simple: Corpora for the Lay Summarisation of Scientific
Literature
- URL: http://arxiv.org/abs/2210.09932v2
- Date: Tue, 12 Dec 2023 07:39:55 GMT
- Authors: Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton
- Abstract summary: We present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale).
We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lay summarisation aims to jointly summarise and simplify a given text, thus
making its content more comprehensible to non-experts. Automatic approaches for
lay summarisation can provide significant value in broadening access to
scientific literature, enabling a greater degree of both interdisciplinary
knowledge sharing and public understanding when it comes to research findings.
However, current corpora for this task are limited in their size and scope,
hindering the development of broadly applicable data-driven approaches. Aiming
to rectify these issues, we present two novel lay summarisation datasets, PLOS
(large-scale) and eLife (medium-scale), each of which contains biomedical
journal articles alongside expert-written lay summaries. We provide a thorough
characterisation of our lay summaries, highlighting differing levels of
readability and abstractiveness between datasets that can be leveraged to
support the needs of different applications. Finally, we benchmark our datasets
using mainstream summarisation approaches and perform a manual evaluation with
domain experts, demonstrating their utility and casting light on the key
challenges of this task.
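The abstract mentions characterising lay summaries by readability. Such characterisation is typically done with standard surface metrics such as the Flesch Reading Ease score; below is a minimal sketch of that metric (not the paper's own implementation, and the syllable counter is a rough vowel-group heuristic rather than a dictionary-based one):

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, with a silent-'e' adjustment."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    # Treat a trailing 'e' as silent when the word has other vowel groups.
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Comparing such scores between the articles and their lay summaries is one simple way to quantify the readability gap the datasets are designed to capture.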
Related papers
- Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long-form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments demonstrate on two datasets from different domains, that LLMs fine-tuned with the auxiliary task generate higher quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
- Label-Free Topic-Focused Summarization Using Query Augmentation [2.127049691404299]
This study introduces a novel method, Augmented-Query Summarization (AQS), for topic-focused summarization without the need for extensive labelled datasets.
Our method demonstrates the ability to generate relevant and accurate summaries, showing its potential as a cost-effective solution in data-scarce environments.
This innovation paves the way for broader application and accessibility in the field of topic-focused summarization technology.
arXiv Detail & Related papers (2024-04-25T08:39:10Z)
- Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey [17.19337964440007]
There is currently a lack of comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain.
This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized.
It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field.
arXiv Detail & Related papers (2024-02-27T23:59:01Z)
- MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos [106.06278332186106]
Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction.
Numerous limitations exist within existing public MSMO datasets.
We have meticulously curated the MMSum dataset.
arXiv Detail & Related papers (2023-06-07T07:43:11Z)
- Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z)
- An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics [33.655334920298856]
We provide a comprehensive overview of the research on long document summarization.
We conduct an empirical analysis to broaden the perspective on current research progress.
arXiv Detail & Related papers (2022-07-03T02:57:22Z)
- Automatic Text Summarization Methods: A Comprehensive Review [1.6114012813668934]
This study provides a detailed analysis of text summarization concepts such as summarization approaches, techniques used, standard datasets, evaluation metrics and future scopes for research.
arXiv Detail & Related papers (2022-03-03T10:45:00Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- Towards Personalized and Human-in-the-Loop Document Summarization [0.0]
This thesis focuses on four main challenges to alleviate information overload using novel summarisation techniques.
It covers (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries.
arXiv Detail & Related papers (2021-08-21T05:34:46Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper surveys these new summarization tasks and approaches as they arise in real-world applications.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.