Leveraging Pretrained Models for Automatic Summarization of
Doctor-Patient Conversations
- URL: http://arxiv.org/abs/2109.12174v1
- Date: Fri, 24 Sep 2021 20:18:59 GMT
- Title: Leveraging Pretrained Models for Automatic Summarization of
Doctor-Patient Conversations
- Authors: Longxiang Zhang, Renato Negrinho, Arindam Ghosh, Vasudevan
Jagannathan, Hamid Reza Hassanzadeh, Thomas Schaaf, Matthew R. Gormley
- Abstract summary: We show that fluent and adequate summaries can be generated with limited training data by fine-tuning BART.
Using a carefully chosen fine-tuning dataset, this method is shown to be effective at handling longer conversations.
- Score: 9.184616102949228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning pretrained models for automatically summarizing doctor-patient
conversation transcripts presents many challenges: limited training data,
significant domain shift, long and noisy transcripts, and high target summary
variability. In this paper, we explore the feasibility of using pretrained
transformer models for automatically summarizing doctor-patient conversations
directly from transcripts. We show that fluent and adequate summaries can be
generated with limited training data by fine-tuning BART on a specially
constructed dataset. The resulting models greatly surpass the performance of an
average human annotator and the quality of previously published work on the
task. We evaluate multiple methods for handling long conversations, comparing
them to the obvious baseline of truncating the conversation to fit the
pretrained model length limit. We introduce a multistage approach that tackles
the task by learning two fine-tuned models: one for summarizing conversation
chunks into partial summaries, followed by one for rewriting the collection of
partial summaries into a complete summary. Using a carefully chosen fine-tuning
dataset, this method is shown to be effective at handling longer conversations,
improving the quality of generated summaries. We conduct both an automatic
evaluation (through ROUGE and two concept-based metrics focusing on medical
findings) and a human evaluation (through qualitative examples from literature,
assessing hallucination, generalization, fluency, and general quality of the
generated summaries).
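As a rough illustration of the multistage approach described in the abstract, the sketch below chains two summarization models. The checkpoint names, the fixed-size chunking by turns, and the use of the Hugging Face pipeline API are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the two-stage summarization idea, assuming two
# hypothetical fine-tuned BART checkpoints (names are placeholders):
#   "chunk-summarizer": conversation chunk -> partial summary
#   "rewriter":         concatenated partial summaries -> final summary
from transformers import pipeline

chunk_summarizer = pipeline("summarization", model="chunk-summarizer")
rewriter = pipeline("summarization", model="rewriter")

def summarize_conversation(turns, chunk_size=30):
    """Chunk the transcript, summarize each chunk, then rewrite the whole."""
    chunks = [" ".join(turns[i:i + chunk_size])
              for i in range(0, len(turns), chunk_size)]
    partials = [chunk_summarizer(c, truncation=True)[0]["summary_text"]
                for c in chunks]
    # Stage two sees only the partial summaries, so its input stays short.
    return rewriter(" ".join(partials), truncation=True)[0]["summary_text"]
```

The design point this sketch captures is that the second model never sees the raw transcript, so the final input length is bounded by the number of chunks rather than by the length of the conversation.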
Related papers
- Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization [0.05852077003870416]
This work leverages the transformer-based BART model for human-like summarization.
After training and fine-tuning, the encoder-decoder model is tested on diverse sample articles.
The fine-tuned model's performance is compared with that of the baseline pretrained model.
Empirical results on BBC News articles show that the gold-standard summaries written by humans are 17% more factually consistent.
arXiv Detail & Related papers (2024-10-22T09:25:04Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Generating Query Focused Summaries without Fine-tuning the
Transformer-based Pre-trained Models [0.6124773188525718]
Fine-tuning Natural Language Processing (NLP) models for each new dataset requires substantial computational time, with an associated increase in carbon footprint and cost.
In this paper, we forgo the fine-tuning step and investigate whether a Maximal Marginal Relevance (MMR)-based approach can help pre-trained models produce query-focused summaries directly from a new dataset that was not used to pre-train them.
As the experimental results indicate, our MMR-based approach successfully ranked and selected the most relevant sentences as summaries and outperformed the individual pre-trained models.
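For reference, a minimal sketch of MMR-based sentence selection; TF-IDF cosine similarity and the lambda weighting are illustrative choices, not necessarily the paper's exact setup:

```python
# MMR: greedily pick sentences that are relevant to the query while
# penalizing similarity to sentences already selected.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_summary(sentences, query, k=3, lam=0.7):
    vec = TfidfVectorizer().fit(sentences + [query])
    S, q = vec.transform(sentences), vec.transform([query])
    rel = cosine_similarity(S, q).ravel()   # relevance of each sentence to the query
    sim = cosine_similarity(S)              # sentence-to-sentence redundancy
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            red = max(sim[i][j] for j in selected) if selected else 0.0
            return lam * rel[i] - (1 - lam) * red
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```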
arXiv Detail & Related papers (2023-03-10T22:40:15Z) - Heuristic-based Inter-training to Improve Few-shot Multi-perspective
Dialog Summarization [13.117715760754077]
We study the multi-perspective summarization of customer-care conversations between support agents and customers.
We show that our approach enables models to generate multi-perspective summaries from a very small amount of annotated data.
arXiv Detail & Related papers (2022-03-29T14:02:40Z) - Improving Zero and Few-Shot Abstractive Summarization with Intermediate
Fine-tuning and Data Augmentation [101.26235068460551]
Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks.
Models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains.
We introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner.
arXiv Detail & Related papers (2020-10-24T08:36:49Z) - SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
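As a quick, concrete example of one metric under re-evaluation, ROUGE can be computed with the rouge-score package (the example texts below are invented):

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    target="the patient reports chest pain and shortness of breath",
    prediction="patient complains of chest pain and trouble breathing",
)
for name, s in scores.items():
    # Each score carries precision, recall, and F-measure.
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.fmeasure:.2f}")
```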
arXiv Detail & Related papers (2020-07-24T16:25:19Z) - SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representations into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
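A rough sketch of that pipeline, assuming TF-IDF cosine similarity for the sentence graph and substituting a pick-the-most-central-sentence step for SummPip's learned compression:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summpip_like(sentences, n_clusters=3):
    X = TfidfVectorizer().fit_transform(sentences)
    affinity = cosine_similarity(X)        # edge weights of the sentence graph
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(affinity)
    summary = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # Stand-in for cluster compression: keep the most central sentence.
        centrality = affinity[np.ix_(idx, idx)].mean(axis=1)
        summary.append(sentences[idx[centrality.argmax()]])
    return summary
```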
arXiv Detail & Related papers (2020-07-17T13:01:15Z) - On Faithfulness and Factuality in Abstractive Summarization [17.261247316769484]
We analyze the limitations of neural text generation models for abstractive document summarization.
We find that these models are highly prone to hallucinating content that is unfaithful to the input document.
We show that textual entailment measures better correlate with faithfulness than standard metrics.
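A minimal sketch of entailment-based faithfulness scoring in that spirit, using an off-the-shelf MNLI checkpoint (a common choice, not necessarily the one used in the paper):

```python
from transformers import pipeline

# Label names ("ENTAILMENT" etc.) follow the roberta-large-mnli config.
nli = pipeline("text-classification", model="roberta-large-mnli")

def entailment_score(source, summary_sentence):
    """Probability that the source document entails the summary sentence."""
    results = nli({"text": source, "text_pair": summary_sentence}, top_k=None)
    return next(r["score"] for r in results if r["label"] == "ENTAILMENT")
```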
arXiv Detail & Related papers (2020-05-02T00:09:16Z) - Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
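A toy version of that synthetic-pair construction, with simple token dropout and local swaps standing in for the paper's noising functions:

```python
import random

def add_noise(review, drop_prob=0.1, swap_prob=0.1):
    """Corrupt a review by randomly dropping and swapping adjacent tokens."""
    tokens = [t for t in review.split() if random.random() > drop_prob]
    for i in range(len(tokens) - 1):
        if random.random() < swap_prob:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return " ".join(tokens)

def make_training_pair(reviews, n_inputs=8):
    pseudo_summary = random.choice(reviews)   # sampled review plays the summary
    noisy_inputs = [add_noise(pseudo_summary) for _ in range(n_inputs)]
    return noisy_inputs, pseudo_summary       # (inputs, target) for training
```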
arXiv Detail & Related papers (2020-04-21T16:54:57Z) - A Hierarchical Network for Abstractive Meeting Summarization with
Cross-Domain Pretraining [52.11221075687124]
We propose a novel abstractive summary network that adapts to the meeting scenario.
We design a hierarchical structure to accommodate long meeting transcripts and a role vector to capture the differences among speakers.
Our model outperforms previous approaches in both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-04-04T21:00:41Z) - Learning by Semantic Similarity Makes Abstractive Summarization Better [13.324006587838522]
We compare summaries generated by a recent language model, BART, against the reference summaries from a benchmark dataset, CNN/DM.
Interestingly, model-generated summaries receive higher scores relative to reference summaries.
arXiv Detail & Related papers (2020-02-18T17:59:02Z)