FaMeSumm: Investigating and Improving Faithfulness of Medical
Summarization
- URL: http://arxiv.org/abs/2311.02271v2
- Date: Wed, 8 Nov 2023 22:54:33 GMT
- Title: FaMeSumm: Investigating and Improving Faithfulness of Medical
Summarization
- Authors: Nan Zhang, Yusen Zhang, Wu Guo, Prasenjit Mitra, Rui Zhang
- Abstract summary: Current summarization models often produce unfaithful outputs for medical input text.
FaMeSumm is a framework to improve faithfulness by fine-tuning pre-trained language models based on medical knowledge.
- Score: 20.7585913214759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Summaries of medical text should be faithful, i.e., consistent and factual
with respect to their source inputs, which is an important but understudied topic for safety and
efficiency in healthcare. In this paper, we investigate and improve
faithfulness in summarization on a broad range of medical summarization tasks.
Our investigation reveals that current summarization models often produce
unfaithful outputs for medical input text. We then introduce FaMeSumm, a
framework to improve faithfulness by fine-tuning pre-trained language models
based on medical knowledge. FaMeSumm performs contrastive learning on designed
sets of faithful and unfaithful summaries, and it incorporates medical terms
and their contexts to encourage faithful generation of medical terms. We
conduct comprehensive experiments on three datasets in two languages: health
question and radiology report summarization datasets in English, and a
patient-doctor dialogue dataset in Chinese. Results demonstrate that FaMeSumm
is flexible and effective by delivering consistent improvements over mainstream
language models such as BART, T5, mT5, and PEGASUS, yielding state-of-the-art
performances on metrics for faithfulness and general quality. Human evaluation
by doctors also shows that FaMeSumm generates more faithful outputs. Our code
is available at https://github.com/psunlpgroup/FaMeSumm .
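The abstract points to two concrete mechanisms: contrastive learning over constructed sets of faithful and unfaithful summaries, and explicit modeling of medical terms and their contexts. As a rough, hypothetical illustration of the contrastive part only, the sketch below combines a standard fine-tuning loss with a margin term that pushes a seq2seq model to assign lower negative log-likelihood to a faithful summary than to an unfaithful one. The function names, batch keys, and loss weights are assumptions for illustration, not FaMeSumm's actual implementation (see the linked code for that).

```python
# Hypothetical sketch of a FaMeSumm-style objective: maximum-likelihood
# fine-tuning plus a margin-based contrastive term over constructed
# faithful / unfaithful summary pairs. Works with a Hugging Face seq2seq
# model (e.g. BART, T5) that returns mean token NLL when `labels` is given.
import torch.nn.functional as F


def sequence_nll(model, input_ids, attention_mask, summary_ids):
    """Token-averaged negative log-likelihood of a candidate summary."""
    out = model(input_ids=input_ids,
                attention_mask=attention_mask,
                labels=summary_ids)
    return out.loss


def contrastive_finetuning_loss(model, batch, margin=1.0, alpha=0.5):
    # 1) Usual fine-tuning signal on the reference summary.
    mle = sequence_nll(model, batch["input_ids"], batch["attention_mask"],
                       batch["reference_ids"])

    # 2) Contrastive signal: a faithful summary should be more likely
    #    (lower NLL) than its unfaithful counterpart.
    nll_faithful = sequence_nll(model, batch["input_ids"],
                                batch["attention_mask"], batch["faithful_ids"])
    nll_unfaithful = sequence_nll(model, batch["input_ids"],
                                  batch["attention_mask"],
                                  batch["unfaithful_ids"])
    contrastive = F.relu(margin + nll_faithful - nll_unfaithful)

    return mle + alpha * contrastive
```

In such a setup, unfaithful candidates would come from perturbing reference summaries (for example, swapping negations or medical terms); the actual set construction and the term-aware objective used by FaMeSumm are specified in the paper and repository.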
Related papers
- uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization [23.173826980480936]
Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness.
This paper presents a benchmark of six advanced abstractive summarization methods across three diverse datasets using five standardized metrics.
We propose uMedSum, a modular hybrid summarization framework that introduces novel approaches for sequential confabulation removal followed by key missing information addition.
arXiv Detail & Related papers (2024-08-22T03:08:49Z)
- Contrastive Learning with Counterfactual Explanations for Radiology Report Generation [83.30609465252441]
We propose a CounterFactual Explanations-based framework (CoFE) for radiology report generation.
Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by asking "what if" scenarios.
Experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports.
arXiv Detail & Related papers (2024-07-19T17:24:25Z)
- MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models [3.0874677990361246]
Large Language Models (LLMs) have shown impressive capabilities in generating human-like responses.
We propose MedInsight: a novel retrieval framework that augments LLM inputs with relevant background information.
Experiments on the MTSamples dataset validate MedInsight's effectiveness in generating contextually appropriate medical responses.
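The MedInsight summary above is essentially retrieval-augmented prompting: fetch relevant background passages for a patient query and prepend them to the model input. The snippet below is a generic, minimal version of that pattern using TF-IDF retrieval; the prompt template and corpus are placeholders, and this is not the paper's actual pipeline.

```python
# Generic retrieval-augmentation sketch (placeholder, not MedInsight itself):
# rank background snippets by TF-IDF similarity to the patient query and
# prepend the top matches to the LLM prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_prompt(query: str, corpus: list[str], k: int = 3) -> str:
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(corpus)  # index the background corpus
    query_vec = vectorizer.transform([query])      # embed the patient query
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    top = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
    context = "\n".join(corpus[i] for i in top)
    return (f"Background information:\n{context}\n\n"
            f"Patient question: {query}\nAnswer:")
```

The returned string would then be passed to whichever LLM is being used; MedInsight's own retrieval sources and prompt construction are described in the paper.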
arXiv Detail & Related papers (2024-03-13T15:20:30Z)
- Med-HALT: Medical Domain Hallucination Test for Large Language Models [0.0]
This research paper focuses on the challenges posed by hallucinations in large language models (LLMs).
We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations.
arXiv Detail & Related papers (2023-07-28T06:43:04Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- Cross-lingual Argument Mining in the Medical Domain [6.0158981171030685]
We show how to perform Argument Mining (AM) in medical texts for which no annotated data is available.
Our work shows that automatically translating and projecting annotations (data-transfer) from English to a given target language is an effective way to generate annotated data.
We also show how the automatically generated data in Spanish can be used to improve results in the original English monolingual setting.
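The data-transfer recipe summarized above (machine-translate the English training data, then project the argument annotations onto the translation) relies on a span-projection step. The toy function below illustrates that step under the assumption that a word alignment between source and target tokens is already available from an MT or alignment toolkit; it is an illustration, not the paper's implementation.

```python
# Toy annotation-projection step for translate-and-project data transfer.
# Given a labeled span on the English side and a word alignment, recover the
# corresponding span on the target-language side.

def project_span(src_span, alignment):
    """src_span: (start, end) token indices on the source side, end exclusive.
    alignment: list of (src_idx, tgt_idx) word-alignment pairs."""
    tgt_positions = [t for s, t in alignment if src_span[0] <= s < src_span[1]]
    if not tgt_positions:
        return None  # the span could not be projected
    return (min(tgt_positions), max(tgt_positions) + 1)


# Example with a monotone four-token alignment: the annotated span (1, 4)
# projects onto target tokens (1, 4).
print(project_span((1, 4), [(0, 0), (1, 1), (2, 2), (3, 3)]))
```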
arXiv Detail & Related papers (2023-01-25T11:21:12Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re3Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated simplification of medical text based on word-level simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
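A minimal, hypothetical rendering of the idea above: substitute complex terms using a small simplification lexicon and keep the candidate that a language model scores as most fluent. The lexicon entries and the `score_fluency` callable are placeholders; the paper's model, trained on medical forum data, is not reproduced here.

```python
# Sketch of lexicon-based word simplification re-ranked by a language model.
# `score_fluency` is any callable returning a higher score for more fluent
# text (e.g. negative LM perplexity); it stands in for the paper's model.

LEXICON = {  # illustrative complex -> simpler substitutions
    "hypertension": "high blood pressure",
    "myocardial infarction": "heart attack",
}


def simplify(sentence: str, score_fluency) -> str:
    candidates = {sentence}
    for complex_term, simple_term in LEXICON.items():
        # add every candidate with this term replaced by its simpler variant
        candidates |= {c.replace(complex_term, simple_term) for c in candidates}
    return max(candidates, key=score_fluency)
```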
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models; the results show that state-of-the-art neural models still perform far below the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
- Word-level Text Highlighting of Medical Texts for Telehealth Services [0.0]
This paper aims to show how different text highlighting techniques can capture relevant medical context.
Three different word-level text highlighting methodologies are implemented and evaluated.
The results of our experiments show that the neural network approach is successful in highlighting medically-relevant terms.
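As a deliberately simplified stand-in for the neural word-level highlighter described above (not the paper's model), the snippet below marks tokens that appear in a small medical-term lexicon; the term list is illustrative only.

```python
# Simplified word-level highlighter: wrap tokens found in a medical-term
# lexicon with ** markers. A lexicon-based stand-in for a neural tagger.
import re

MEDICAL_TERMS = {"fever", "ibuprofen", "tachycardia", "rash"}  # illustrative


def highlight(text: str) -> str:
    def mark(match: re.Match) -> str:
        word = match.group(0)
        return f"**{word}**" if word.lower() in MEDICAL_TERMS else word
    return re.sub(r"[A-Za-z]+", mark, text)


print(highlight("Patient reports fever and was given ibuprofen."))
# -> Patient reports **fever** and was given **ibuprofen**.
```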
arXiv Detail & Related papers (2021-05-21T15:13:54Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
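For the text-mining entry above, the extraction step (surfacing treatment-condition statements such as the omeprazole/heartburn example) can be approximated with a crude pattern match. The snippet below is a toy stand-in for that step only; the transcription with Google Cloud, the preprocessing pipeline, and the paper's actual model are not reproduced here.

```python
# Toy extraction step: find "X can help treat Y" style treatment-condition
# pairs in preprocessed transcript text. A pattern-based stand-in only.
import re

PATTERN = re.compile(
    r"\b([A-Z][a-z]+)\s+(?:can\s+)?(?:help\s+)?treats?\b[^.]*?\b([a-z]+)\b",
    re.IGNORECASE,
)


def extract_pairs(transcript: str) -> list[tuple[str, str]]:
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(transcript)]


print(extract_pairs("Omeprazole can help treat heartburn."))
# -> [('Omeprazole', 'heartburn')]
```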
This list is automatically generated from the titles and abstracts of the papers on this site.