Human Evaluation and Correlation with Automatic Metrics in Consultation
Note Generation
- URL: http://arxiv.org/abs/2204.00447v1
- Date: Fri, 1 Apr 2022 14:04:16 GMT
- Title: Human Evaluation and Correlation with Automatic Metrics in Consultation
Note Generation
- Authors: Francesco Moramarco, Alex Papadopoulos Korfiatis, Mark Perera, Damir
Juric, Jack Flann, Ehud Reiter, Anya Belz, Aleksandar Savkov
- Abstract summary: In recent years, machine learning models have rapidly become better at generating clinical consultation notes.
We present an extensive human evaluation study where 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors.
We find that a simple, character-based Levenshtein distance metric performs on par if not better than common model-based metrics like BertScore.
- Score: 56.25869366777579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, machine learning models have rapidly become better at
generating clinical consultation notes; yet, there is little work on how to
properly evaluate the generated consultation notes to understand the impact
they may have on both the clinician using them and the patient's clinical
safety. To address this we present an extensive human evaluation study of
consultation notes where 5 clinicians (i) listen to 57 mock consultations, (ii)
write their own notes, (iii) post-edit a number of automatically generated
notes, and (iv) extract all the errors, both quantitative and qualitative. We
then carry out a correlation study with 18 automatic quality metrics and the
human judgements. We find that a simple, character-based Levenshtein distance
metric performs on par if not better than common model-based metrics like
BertScore. All our findings and annotations are open-sourced.
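The abstract's headline finding is that a simple character-based Levenshtein distance can rival model-based metrics like BertScore. A minimal sketch of such a metric is shown below; this is an illustration only, not the authors' released implementation, and the normalisation of the distance into a [0, 1] similarity score is an assumption:

```python
# Sketch of a character-level Levenshtein similarity metric for comparing a
# generated consultation note against a reference note. The normalisation
# scheme is assumed, not taken from the paper.

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming (two-row variant)."""
    if len(a) < len(b):
        a, b = b, a  # ensure b is the shorter string to keep the row small
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def levenshtein_similarity(generated: str, reference: str) -> float:
    """Map edit distance to a [0, 1] similarity; higher means closer to the reference."""
    if not generated and not reference:
        return 1.0
    return 1.0 - levenshtein(generated, reference) / max(len(generated), len(reference))
```

In a correlation study like the one described, each generated note would be scored this way against its reference, and the scores compared (e.g. via rank correlation) with the clinicians' judgements.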
Related papers
- Query-Guided Self-Supervised Summarization of Nursing Notes [5.835276312834499]
We introduce QGSumm, a query-guided self-supervised domain adaptation framework for nursing note summarization.
Our approach generates high-quality, patient-centered summaries without relying on reference summaries for training.
arXiv Detail & Related papers (2024-07-04T18:54:30Z)
- Impact of Large Language Model Assistance on Patients Reading Clinical Notes: A Mixed-Methods Study [47.61555826813361]
Complex medical concepts and jargon within clinical notes hinder patient comprehension and may lead to anxiety.
We developed a patient-facing tool to simplify, extract information from, and add context to notes.
Clinicians evaluated the augmentations for errors, and we found that misleading errors do occur.
arXiv Detail & Related papers (2024-01-17T23:14:52Z)
- An Investigation of Evaluation Metrics for Automated Medical Note Generation [5.094623170336122]
We study evaluation methods and metrics for the automatic generation of clinical notes from medical conversations.
To study the correlation between the automatic metrics and manual judgments, we evaluate automatically generated notes/summaries by comparing system facts against reference facts.
arXiv Detail & Related papers (2023-05-27T04:34:58Z)
- Revisiting Automatic Question Summarization Evaluation in the Biomedical Domain [45.78632945525459]
We conduct human evaluations of summarization quality from four different aspects of a biomedical question summarization task.
Based on human judgments, we identify different noteworthy features for current automatic metrics and summarization systems.
arXiv Detail & Related papers (2023-03-18T04:28:01Z)
- Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation [58.54483567073125]
We propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists.
We observed good levels of inter-annotator agreement in a first evaluation study using the protocol.
arXiv Detail & Related papers (2022-11-17T10:54:28Z)
- User-Driven Research of Medical Note Generation Software [49.85146209418244]
We present three rounds of user studies carried out in the context of developing a medical note generation system.
We discuss the participating clinicians' impressions and views of how the system ought to be adapted to be of value to them.
We describe a three-week test run of the system in a live telehealth clinical practice.
arXiv Detail & Related papers (2022-05-05T10:18:06Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- A preliminary study on evaluating Consultation Notes with Post-Editing [67.30200768442926]
We propose a semi-automatic approach whereby physicians post-edit generated notes before submitting them.
We conduct a preliminary study on the time saved by post-editing automatically generated consultation notes.
We time this process and find that post-editing is faster than writing the note from scratch.
arXiv Detail & Related papers (2021-04-09T14:42:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.