Readability Controllable Biomedical Document Summarization
- URL: http://arxiv.org/abs/2210.04705v3
- Date: Mon, 1 May 2023 16:01:37 GMT
- Title: Readability Controllable Biomedical Document Summarization
- Authors: Zheheng Luo, Qianqian Xie, Sophia Ananiadou
- Abstract summary: We introduce a new task of readability controllable summarization for biomedical documents.
It aims to recognise users' readability demands and generate summaries that better suit their needs.
- Score: 17.166794984161964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Different from general documents, it is recognised that the ease with which
people can understand a biomedical text is eminently varied, owing to the
highly technical nature of biomedical documents and the variance of readers'
domain knowledge. However, existing biomedical document summarization systems
have paid little attention to readability control, leaving users with summaries
that are incompatible with their levels of expertise. In recognition of this
urgent demand, we introduce a new task of readability controllable
summarization for biomedical documents, which aims to recognise users'
readability demands and generate summaries that better suit their needs:
technical summaries for experts and plain language summaries (PLS) for laymen.
To establish this task, we construct a corpus consisting of biomedical papers
with technical summaries and PLSs written by the authors, and benchmark
multiple advanced controllable abstractive and extractive summarization models
based on pre-trained language models (PLMs) with prevalent controlling and
generation techniques. Moreover, we propose a novel masked language model (MLM)
based metric and its variant to effectively evaluate the readability
discrepancy between lay and technical summaries. Experimental results from
automated and human evaluations show that though current control techniques
allow for a certain degree of readability adjustment during generation, the
performance of existing controllable summarization methods is far from
desirable in this task.
Related papers
- NeuroSym-BioCAT: Leveraging Neuro-Symbolic Methods for Biomedical Scholarly Document Categorization and Question Answering [0.14999444543328289]
We introduce a novel approach that integrates an optimized topic modelling framework, OVB-LDA, with the BI-POP CMA-ES optimization technique for enhanced scholarly document abstract categorization.
We employ the distilled MiniLM model, fine-tuned on domain-specific data, for high-precision answer extraction.
arXiv Detail & Related papers (2024-10-29T14:45:12Z)
- Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare.
This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z)
- Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries [62.32403630651586]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation.
Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process.
AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models [1.9665865095034865]
We formulate the relation extraction task as binary classifications for large language models.
We designate the main title as the tail entity and explicitly incorporate it into the context.
Longer contents are sliced into text chunks, embedded, and retrieved with additional embedding models.
arXiv Detail & Related papers (2023-12-13T16:43:41Z)
- Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers [24.481854035628434]
Existing language models struggle to generate technical summaries that are on par with those produced by biomedical experts.
We propose a novel attention-based citation aggregation model that integrates domain-specific knowledge from citation papers.
Our model outperforms state-of-the-art approaches and achieves substantial improvements in abstractive biomedical text summarisation.
arXiv Detail & Related papers (2023-10-24T09:56:46Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Paragraph-level Simplification of Medical Texts [35.650619024498425]
Manual simplification does not scale to the rapidly growing body of biomedical literature.
We introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics.
We propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts.
arXiv Detail & Related papers (2021-04-12T18:56:05Z)
- Automated Lay Language Summarization of Biomedical Scientific Reviews [16.01452242066412]
Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring treatment outcomes.
Medical jargon and the complex structure of professional language in this domain make health information especially hard to interpret.
This paper introduces the novel task of automated generation of lay language summaries of biomedical scientific reviews.
arXiv Detail & Related papers (2020-12-23T10:01:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.