Paragraph-level Simplification of Medical Texts
- URL: http://arxiv.org/abs/2104.05767v1
- Date: Mon, 12 Apr 2021 18:56:05 GMT
- Title: Paragraph-level Simplification of Medical Texts
- Authors: Ashwin Devaraj, Iain J. Marshall, Byron C. Wallace, Junyi Jessy Li
- Abstract summary: Manual simplification does not scale to the rapidly growing body of biomedical literature.
We introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics.
We propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts.
- Score: 35.650619024498425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of learning to simplify medical texts. This is
important because most reliable, up-to-date information in biomedicine is dense
with jargon and thus practically inaccessible to the lay audience. Furthermore,
manual simplification does not scale to the rapidly growing body of biomedical
literature, motivating the need for automated approaches. Unfortunately, there
are no large-scale resources available for this task. In this work we introduce
a new corpus of parallel texts in English comprising technical and lay
summaries of all published evidence pertaining to different clinical topics. We
then propose a new metric based on likelihood scores from a masked language
model pretrained on scientific texts. We show that this automated measure
better differentiates between technical and lay summaries than existing
heuristics. We introduce and evaluate baseline encoder-decoder Transformer
models for simplification and propose a novel augmentation to these in which we
explicitly penalize the decoder for producing "jargon" terms; we find that this
yields improvements over baselines in terms of readability.
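The likelihood-based metric mentioned in the abstract can be illustrated with pseudo-log-likelihood scoring. Each token is masked in turn, and the masked LM's probability of the original token is recorded. The following is a minimal sketch under stated assumptions, not the authors' implementation. The `MaskedProb` interface and the `ToyMaskedLM` stand-in (a context-free unigram model) are hypothetical. In practice the scorer would be a BERT-style masked LM pretrained on scientific text, and the paper's exact scoring and normalization may differ.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

MASK = "[MASK]"

# Hypothetical scorer interface: given a token list with position i replaced
# by MASK, return the probability the model assigns to `target` at position i.
MaskedProb = Callable[[Sequence[str], int, str], float]


def pseudo_log_likelihood(tokens: List[str], masked_prob: MaskedProb) -> float:
    """Mask each token in turn and average the log-probability of the original."""
    if not tokens:
        raise ValueError("need at least one token")
    total = 0.0
    for i, original in enumerate(tokens):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        total += math.log(masked_prob(masked, i, original))
    return total / len(tokens)


class ToyMaskedLM:
    """Context-free stand-in for a real masked LM: add-one smoothed unigram
    probabilities estimated from a small 'technical' reference corpus."""

    def __init__(self, corpus: List[str]):
        self.counts = Counter(w for line in corpus for w in line.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(self.counts) + 1  # one extra slot for unseen words

    def __call__(self, masked: Sequence[str], i: int, target: str) -> float:
        # A real masked LM would condition on `masked`; this toy ignores context.
        return (self.counts[target.lower()] + 1) / (self.total + self.vocab_size)


technical_corpus = [
    "randomised controlled trials reported reduced mortality",
    "the intervention reduced all-cause mortality in randomised trials",
]
lm = ToyMaskedLM(technical_corpus)
technical = "randomised trials reported reduced mortality".split()
lay = "fewer people died in the studies".split()
print(pseudo_log_likelihood(technical, lm), pseudo_log_likelihood(lay, lm))
```

A model trained on technical text would be expected to score technical wording higher than lay wording. That is the intuition behind using such likelihoods to separate technical from lay summaries. The direction and aggregation the paper actually reports should be checked against the paper itself.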
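One common way to penalize a decoder for emitting particular tokens is an unlikelihood term. For each penalized token at each decoding step, it adds -log(1 - p) to the usual negative log-likelihood. The sketch below assumes that formulation as an illustration of "explicitly penalizing jargon". The jargon token set, the weight `alpha`, and the function name `nll_with_unlikelihood` are hypothetical. It works on plain probability lists rather than a real model, and the paper's actual penalty (how jargon is identified and weighted) may differ.

```python
import math
from typing import Sequence, Set


def nll_with_unlikelihood(
    step_probs: Sequence[Sequence[float]],  # step_probs[t][v] = p(token v at step t)
    targets: Sequence[int],                 # reference token id at each step
    penalized: Set[int],                    # hypothetical 'jargon' token ids
    alpha: float = 1.0,                     # weight of the unlikelihood term
    eps: float = 1e-12,                     # guards against log(0)
) -> float:
    """Token-averaged NLL of the targets plus alpha times an unlikelihood
    penalty that pushes probability mass away from penalized tokens."""
    if len(step_probs) != len(targets) or not targets:
        raise ValueError("step_probs and targets must be non-empty and aligned")
    nll = 0.0
    ul = 0.0
    for probs, gold in zip(step_probs, targets):
        nll -= math.log(max(probs[gold], eps))
        for v in penalized:
            if v != gold:  # never penalize the reference token itself
                ul -= math.log(max(1.0 - probs[v], eps))
    return (nll + alpha * ul) / len(targets)


# Toy vocabulary: 0 = "death", 1 = "mortality" (treated as jargon), 2 = "<eos>"
jargon = {1}
gold = [0]
leans_on_jargon = [[0.3, 0.6, 0.1]]
avoids_jargon = [[0.6, 0.3, 0.1]]
print(nll_with_unlikelihood(leans_on_jargon, gold, jargon))  # larger loss
print(nll_with_unlikelihood(avoids_jargon, gold, jargon))    # smaller loss
```

Skipping the reference token keeps the penalty from fighting the likelihood term when a "jargon" word actually appears in the target. Setting `alpha=0` recovers plain NLL.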
Related papers
- SciGisPy: a Novel Metric for Biomedical Text Simplification via Gist Inference Score [7.4751114996742]
We introduce SciGisPy, a novel evaluation metric inspired by Gist Inference Score (GIS) from Fuzzy-Trace Theory (FTT).
SciGisPy measures how well a simplified text facilitates the formation of abstract inferences (gist) necessary for comprehension.
Our experimental evaluation on the Cochrane biomedical text simplification dataset demonstrates that SciGisPy outperforms the original GIS formulation.
arXiv Detail & Related papers (2024-10-12T19:53:56Z)
- Medical Text Simplification: Optimizing for Readability with Unlikelihood Training and Reranked Beam Search Decoding [18.06012822620814]
Text simplification has emerged as an increasingly useful application of AI for bridging the communication gap in specialized fields such as medicine.
Despite notable progress, methods in medical simplification sometimes result in the generated text having lower quality and diversity.
We propose a new unlikelihood loss that encourages generation of simpler terms and a reranked beam search decoding method that optimizes for simplicity.
arXiv Detail & Related papers (2023-10-17T12:14:03Z)
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that the decoding distribution induced by this optimization is guaranteed to improve perplexity on human texts, which suggests a better approximation of the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- Multilingual Simplification of Medical Texts [49.469685530201716]
We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages.
We evaluate fine-tuned and zero-shot models across these languages, with extensive human assessments and analyses.
Although models can now generate viable simplified texts, we identify outstanding challenges that this dataset might be used to address.
arXiv Detail & Related papers (2023-05-21T18:25:07Z)
- NapSS: Paragraph-level Medical Text Simplification via Narrative Prompting and Sentence-matching Summarization [46.772517928718216]
We propose a summarize-then-simplify two-stage strategy, which we call NapSS.
NapSS identifies the relevant content to simplify while ensuring that the original narrative flow is preserved.
Our model performs significantly better than the seq2seq baseline on an English medical corpus.
arXiv Detail & Related papers (2023-02-11T02:20:25Z)
- Readability Controllable Biomedical Document Summarization [17.166794984161964]
We introduce a new task of readability controllable summarization for biomedical documents.
It aims to recognise users' readability demands and generate summaries that better suit their needs.
arXiv Detail & Related papers (2022-10-10T14:03:20Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Automated Lay Language Summarization of Biomedical Scientific Reviews [16.01452242066412]
Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring treatment outcomes.
Medical jargon and the complex structure of professional language in this domain make health information especially hard to interpret.
This paper introduces the novel task of automated generation of lay language summaries of biomedical scientific reviews.
arXiv Detail & Related papers (2020-12-23T10:01:18Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
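The entry "Medical Text Simplification: Optimizing for Readability with Unlikelihood Training and Reranked Beam Search Decoding" describes reranking beam candidates for simplicity. The minimal sketch below shows the general idea under several assumptions. Each finished candidate is rescored as its model log-probability plus `lam` times a readability score. Readability is approximated with the Flesch Reading Ease formula (206.835 - 1.015 * words/sentences - 84.6 * syllables/words), and syllables are counted with a crude vowel-group heuristic. The candidates, their log-probabilities, and the weight `lam` are all hypothetical, and the cited paper's actual reranking criterion may differ.

```python
import re
from typing import List, Tuple


def count_syllables(word: str) -> int:
    """Crude heuristic: number of vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


def rerank(candidates: List[Tuple[str, float]], lam: float = 0.05) -> List[Tuple[str, float]]:
    """Sort (text, model_logprob) pairs by logprob + lam * readability, best first."""
    scored = [(text, lp + lam * flesch_reading_ease(text)) for text, lp in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical finished beam: the model slightly prefers the technical wording.
beam = [
    ("Intervention reduced all-cause mortality substantially.", -4.0),
    ("The treatment helped fewer people die.", -4.5),
]
for text, score in rerank(beam):
    print(f"{score:.3f}  {text}")
```

With `lam=0` the ranking falls back to pure model likelihood. Increasing `lam` trades fluency under the model for readability, which is the knob such reranking schemes expose.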
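The NapSS entry describes a summarize-then-simplify strategy that first selects relevant content and then rewrites it. The sketch below illustrates only that two-stage structure, using deliberately simple, hypothetical components. Stage 1 is a frequency-based extractive step that keeps the top-`k` sentences in their original order, which loosely echoes the point about preserving narrative flow. Stage 2 is a glossary-based lexical substitution. The function names and the glossary are assumptions, and NapSS's learned models and narrative prompting are not reproduced here.

```python
import re
from collections import Counter
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text: str, k: int = 2) -> List[str]:
    """Stage 1 (stand-in): keep the k sentences whose words are most frequent
    in the whole paragraph, preserving their original order."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / max(1, len(words))

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def simplify(sentence: str, glossary: Dict[str, str]) -> str:
    """Stage 2 (stand-in): replace glossary terms with plainer wording."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = glossary.get(word.lower())
        if replacement is None:
            return word
        if word[0].isupper():  # keep sentence-initial capitalization
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return re.sub(r"[A-Za-z-]+", swap, sentence)


def summarize_then_simplify(text: str, glossary: Dict[str, str], k: int = 2) -> str:
    return " ".join(simplify(s, glossary) for s in summarize(text, k))


paragraph = (
    "The intervention lowered mortality in adults. "
    "Funding came from a charity. "
    "Mortality and hospitalisation fell with the intervention."
)
glossary = {
    "mortality": "the death rate",
    "intervention": "treatment",
    "hospitalisation": "hospital stays",
}
print(summarize_then_simplify(paragraph, glossary))
```

Separating the stages means the simplifier only ever sees content already judged relevant, so each component can be swapped for a learned model independently.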