Leveraging Pretrained Models for Automatic Summarization of
Doctor-Patient Conversations
- URL: http://arxiv.org/abs/2109.12174v1
- Date: Fri, 24 Sep 2021 20:18:59 GMT
- Title: Leveraging Pretrained Models for Automatic Summarization of
Doctor-Patient Conversations
- Authors: Longxiang Zhang, Renato Negrinho, Arindam Ghosh, Vasudevan
Jagannathan, Hamid Reza Hassanzadeh, Thomas Schaaf, Matthew R. Gormley
- Abstract summary: We show that fluent and adequate summaries can be generated with limited training data by fine-tuning BART.
Using a carefully chosen fine-tuning dataset, this method is shown to be effective at handling longer conversations.
- Score: 9.184616102949228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning pretrained models for automatically summarizing doctor-patient
conversation transcripts presents many challenges: limited training data,
significant domain shift, long and noisy transcripts, and high target summary
variability. In this paper, we explore the feasibility of using pretrained
transformer models for automatically summarizing doctor-patient conversations
directly from transcripts. We show that fluent and adequate summaries can be
generated with limited training data by fine-tuning BART on a specially
constructed dataset. The resulting models greatly surpass the performance of an
average human annotator and the quality of previously published work on the
task. We evaluate multiple methods for handling long conversations, comparing
them to the obvious baseline of truncating the conversation to fit the
pretrained model length limit. We introduce a multistage approach that tackles
the task by learning two fine-tuned models: one for summarizing conversation
chunks into partial summaries, followed by one for rewriting the collection of
partial summaries into a complete summary. Using a carefully chosen fine-tuning
dataset, this method is shown to be effective at handling longer conversations,
improving the quality of generated summaries. We conduct both an automatic
evaluation (through ROUGE and two concept-based metrics focusing on medical
findings) and a human evaluation (through qualitative examples from literature,
assessing hallucination, generalization, fluency, and general quality of the
generated summaries).
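As a rough illustration of the multistage approach described in the abstract, the sketch below chains two summarization models. The checkpoint names, the fixed-size chunking by turns, and the use of the Hugging Face pipeline API are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the two-stage summarization idea, assuming two
# hypothetical fine-tuned BART checkpoints (names are placeholders):
#   "chunk-summarizer": conversation chunk -> partial summary
#   "rewriter":         concatenated partial summaries -> final summary
from transformers import pipeline

chunk_summarizer = pipeline("summarization", model="chunk-summarizer")
rewriter = pipeline("summarization", model="rewriter")

def summarize_conversation(turns, chunk_size=30):
    """Chunk the transcript, summarize each chunk, then rewrite the whole."""
    chunks = [" ".join(turns[i:i + chunk_size])
              for i in range(0, len(turns), chunk_size)]
    partials = [chunk_summarizer(c, truncation=True)[0]["summary_text"]
                for c in chunks]
    # Stage two sees only the partial summaries, so its input stays short.
    return rewriter(" ".join(partials), truncation=True)[0]["summary_text"]
```

The design point this sketch captures is that the second model never sees the raw transcript, so the final input length is bounded by the number of chunks rather than by the length of the conversation.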
Related papers
- Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization [0.05852077003870416]
This work leverages the transformer-based BART model for human-like summarization.
After training and fine-tuning, the encoder-decoder model is tested on diverse sample articles.
The fine-tuned model's performance is compared with that of the baseline pretrained model.
Empirical results on BBC News articles show that the gold-standard summaries written by humans are 17% more factually consistent.
arXiv Detail & Related papers (2024-10-22T09:25:04Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Generating Query Focused Summaries without Fine-tuning the
Transformer-based Pre-trained Models [0.6124773188525718]
Fine-tuning Natural Language Processing (NLP) models for each new dataset requires substantial computational time, with an associated increase in carbon footprint and cost.
In this paper, we forgo the fine-tuning step and investigate whether a Maximal Marginal Relevance (MMR)-based approach can help pre-trained models produce query-focused summaries directly from a new dataset that was not used to pre-train them.
As the experimental results indicate, our MMR-based approach successfully ranked and selected the most relevant sentences as summaries and outperformed the individual pre-trained models.
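For reference, a minimal sketch of MMR-based sentence selection; TF-IDF cosine similarity and the lambda weighting are illustrative choices, not necessarily the paper's exact setup:

```python
# MMR: greedily pick sentences that are relevant to the query while
# penalizing similarity to sentences already selected.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_summary(sentences, query, k=3, lam=0.7):
    vec = TfidfVectorizer().fit(sentences + [query])
    S, q = vec.transform(sentences), vec.transform([query])
    rel = cosine_similarity(S, q).ravel()   # relevance of each sentence to the query
    sim = cosine_similarity(S)              # sentence-to-sentence redundancy
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            red = max(sim[i][j] for j in selected) if selected else 0.0
            return lam * rel[i] - (1 - lam) * red
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```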
arXiv Detail & Related papers (2023-03-10T22:40:15Z) - Heuristic-based Inter-training to Improve Few-shot Multi-perspective
Dialog Summarization [13.117715760754077]
We study the multi-perspective summarization of customer-care conversations between support agents and customers.
We show that our approach enables models to generate multi-perspective summaries from a very small amount of annotated data.
arXiv Detail & Related papers (2022-03-29T14:02:40Z) - Improving Zero and Few-Shot Abstractive Summarization with Intermediate
Fine-tuning and Data Augmentation [101.26235068460551]
Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks.
Models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains.
We introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner.
arXiv Detail & Related papers (2020-10-24T08:36:49Z) - SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
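As a quick, concrete example of one metric under re-evaluation, ROUGE can be computed with the rouge-score package (the example texts below are invented):

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    target="the patient reports chest pain and shortness of breath",
    prediction="patient complains of chest pain and trouble breathing",
)
for name, s in scores.items():
    # Each score carries precision, recall, and F-measure.
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.fmeasure:.2f}")
```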
arXiv Detail & Related papers (2020-07-24T16:25:19Z) - SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representations into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
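A rough sketch of that pipeline, assuming TF-IDF cosine similarity for the sentence graph and substituting a pick-the-most-central-sentence step for SummPip's learned compression:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summpip_like(sentences, n_clusters=3):
    X = TfidfVectorizer().fit_transform(sentences)
    affinity = cosine_similarity(X)        # edge weights of the sentence graph
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(affinity)
    summary = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # Stand-in for cluster compression: keep the most central sentence.
        centrality = affinity[np.ix_(idx, idx)].mean(axis=1)
        summary.append(sentences[idx[centrality.argmax()]])
    return summary
```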
arXiv Detail & Related papers (2020-07-17T13:01:15Z) - On Faithfulness and Factuality in Abstractive Summarization [17.261247316769484]
We analyze the limitations of neural text generation models for abstractive document summarization.
We find that these models are highly prone to hallucinating content that is unfaithful to the input document.
We show that textual entailment measures better correlate with faithfulness than standard metrics.
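A minimal sketch of entailment-based faithfulness scoring in that spirit, using an off-the-shelf MNLI checkpoint (a common choice, not necessarily the one used in the paper):

```python
from transformers import pipeline

# Label names ("ENTAILMENT" etc.) follow the roberta-large-mnli config.
nli = pipeline("text-classification", model="roberta-large-mnli")

def entailment_score(source, summary_sentence):
    """Probability that the source document entails the summary sentence."""
    results = nli({"text": source, "text_pair": summary_sentence}, top_k=None)
    return next(r["score"] for r in results if r["label"] == "ENTAILMENT")
```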
arXiv Detail & Related papers (2020-05-02T00:09:16Z) - Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
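A toy version of that synthetic-pair construction, with simple token dropout and local swaps standing in for the paper's noising functions:

```python
import random

def add_noise(review, drop_prob=0.1, swap_prob=0.1):
    """Corrupt a review by randomly dropping and swapping adjacent tokens."""
    tokens = [t for t in review.split() if random.random() > drop_prob]
    for i in range(len(tokens) - 1):
        if random.random() < swap_prob:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return " ".join(tokens)

def make_training_pair(reviews, n_inputs=8):
    pseudo_summary = random.choice(reviews)   # sampled review plays the summary
    noisy_inputs = [add_noise(pseudo_summary) for _ in range(n_inputs)]
    return noisy_inputs, pseudo_summary       # (inputs, target) for training
```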
arXiv Detail & Related papers (2020-04-21T16:54:57Z) - A Hierarchical Network for Abstractive Meeting Summarization with
Cross-Domain Pretraining [52.11221075687124]
We propose a novel abstractive summary network that adapts to the meeting scenario.
We design a hierarchical structure to accommodate long meeting transcripts and a role vector to capture the differences among speakers.
Our model outperforms previous approaches in both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-04-04T21:00:41Z) - Learning by Semantic Similarity Makes Abstractive Summarization Better [13.324006587838522]
We compare summaries generated by a recent language model, BART, against the reference summaries from a benchmark dataset, CNN/DM.
Interestingly, model-generated summaries receive higher scores relative to reference summaries.
arXiv Detail & Related papers (2020-02-18T17:59:02Z)