Exploring Robustness in Doctor-Patient Conversation Summarization: An Analysis of Out-of-Domain SOAP Notes
- URL: http://arxiv.org/abs/2406.02826v1
- Date: Wed, 5 Jun 2024 00:11:20 GMT
- Title: Exploring Robustness in Doctor-Patient Conversation Summarization: An Analysis of Out-of-Domain SOAP Notes
- Authors: Yu-Wen Chen, Julia Hirschberg,
- Abstract summary: We investigate the performance of state-of-the-art doctor-patient conversation generative summarization models on the out-of-domain data.
We analyzed the limitations and strengths of the fine-tuning language model-based methods and GPTs on both configurations.
- Score: 12.628782042255395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Summarizing medical conversations poses unique challenges due to the specialized domain and the difficulty of collecting in-domain training data. In this study, we investigate the performance of state-of-the-art doctor-patient conversation generative summarization models on the out-of-domain data. We divide the summarization model of doctor-patient conversation into two configurations: (1) a general model, without specifying subjective (S), objective (O), and assessment (A) and plan (P) notes; (2) a SOAP-oriented model that generates a summary with SOAP sections. We analyzed the limitations and strengths of the fine-tuning language model-based methods and GPTs on both configurations. We also conducted a Linguistic Inquiry and Word Count analysis to compare the SOAP notes from different datasets. The results exhibit a strong correlation for reference notes across different datasets, indicating that format mismatch (i.e., discrepancies in word distribution) is not the main cause of performance decline on out-of-domain data. Lastly, a detailed analysis of SOAP notes is included to provide insights into missing information and hallucinations introduced by the models.
Related papers
- Systematic Exploration of Dialogue Summarization Approaches for Reproducibility, Comparative Assessment, and Methodological Innovations for Advancing Natural Language Processing in Abstractive Summarization [0.0]
This paper delves into the reproduction and evaluation of dialogue summarization models.
Our research involved a thorough examination of several dialogue summarization models using the AMI dataset.
The primary objective was to evaluate the informativeness and quality of the summaries generated by these models through human assessment.
arXiv Detail & Related papers (2024-10-21T12:47:57Z) - Lessons Learned on Information Retrieval in Electronic Health Records: A Comparison of Embedding Models and Pooling Strategies [8.822087602255504]
Applying large language models to the clinical domain is challenging due to the context-heavy nature of processing medical records.
This paper explores how different embedding models and pooling methods affect information retrieval for the clinical domain.
arXiv Detail & Related papers (2024-09-23T16:16:08Z) - VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z) - A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z) - Document-Level Relation Extraction with Sentences Importance Estimation
and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
arXiv Detail & Related papers (2022-04-27T03:20:07Z) - ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive
Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z) - Adversarial Sample Enhanced Domain Adaptation: A Case Study on
Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z) - Towards an Automated SOAP Note: Classifying Utterances from Medical
Conversations [0.6875312133832078]
We bridge the gap for classifying utterances from medical conversations according to (i) the SOAP section and (ii) the speaker role.
We present a systematic analysis in which we adapt an existing deep learning architecture to the two aforementioned tasks.
The results suggest that modelling context in a hierarchical manner, which captures both word and utterance level context, yields substantial improvements on both classification tasks.
arXiv Detail & Related papers (2020-07-17T04:19:30Z) - Attention-based Neural Bag-of-Features Learning for Sequence Data [143.62294358378128]
2D-Attention (2DA) is a generic attention formulation for sequence data.
The proposed attention module is incorporated into the recently proposed Neural Bag of Feature (NBoF) model to enhance its learning capacity.
Our empirical analysis shows that the proposed attention formulations can not only improve performances of NBoF models but also make them resilient to noisy data.
arXiv Detail & Related papers (2020-05-25T17:51:54Z) - Generating SOAP Notes from Doctor-Patient Conversations Using Modular
Summarization Techniques [43.13248746968624]
We introduce the first complete pipelines to leverage deep summarization models to generate SOAP notes.
We propose Cluster2Sent, an algorithm that extracts important utterances relevant to each summary section.
Our results speak to the benefits of structuring summaries into sections and annotating supporting evidence when constructing summarization corpora.
arXiv Detail & Related papers (2020-05-04T19:10:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.