The Sound of Healthcare: Improving Medical Transcription ASR Accuracy
with Large Language Models
- URL: http://arxiv.org/abs/2402.07658v1
- Date: Mon, 12 Feb 2024 14:01:12 GMT
- Title: The Sound of Healthcare: Improving Medical Transcription ASR Accuracy
with Large Language Models
- Authors: Ayo Adedeji, Sarita Joshi, Brendan Doohan
- Abstract summary: Large Language Models (LLMs) can enhance the accuracy of Automatic Speech Recognition (ASR) systems in medical transcription.
Our research focuses on improvements in Word Error Rate (WER), Medical Concept WER (MC-WER) for the accurate transcription of essential medical terms, and speaker diarization accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving landscape of medical documentation, transcribing
clinical dialogues accurately is increasingly paramount. This study explores
the potential of Large Language Models (LLMs) to enhance the accuracy of
Automatic Speech Recognition (ASR) systems in medical transcription. Utilizing
the PriMock57 dataset, which encompasses a diverse range of primary care
consultations, we apply advanced LLMs to refine ASR-generated transcripts. Our
research is multifaceted, focusing on improvements in general Word Error Rate
(WER), Medical Concept WER (MC-WER) for the accurate transcription of essential
medical terms, and speaker diarization accuracy. Additionally, we assess the
role of LLM post-processing in improving semantic textual similarity, thereby
preserving the contextual integrity of clinical dialogues. Through a series of
experiments, we compare the efficacy of zero-shot and Chain-of-Thought (CoT)
prompting techniques in enhancing diarization and correction accuracy. Our
findings demonstrate that LLMs, particularly through CoT prompting, not only
improve the diarization accuracy of existing ASR systems but also achieve
state-of-the-art performance in this domain. This improvement extends to more
accurately capturing medical concepts and enhancing the overall semantic
coherence of the transcribed dialogues. These findings illustrate the dual role
of LLMs in augmenting ASR outputs and independently excelling in transcription
tasks, holding significant promise for transforming medical ASR systems and
leading to more accurate and reliable patient records in healthcare settings.
Related papers
- Searching for Best Practices in Medical Transcription with Large Language Model [1.0855602842179624]
This paper introduces a novel approach leveraging a Large Language Model (LLM) to generate highly accurate medical transcripts.
Our methodology integrates advanced language modeling techniques to lower the Word Error Rate (WER) and ensure the precise recognition of critical medical terms.
arXiv Detail & Related papers (2024-10-04T03:41:16Z) - LA-RAG:Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation [15.520180125182756]
Recent advancements in integrating speech information into large language models (LLMs) have significantly improved automatic speech recognition (ASR) accuracy.
Existing methods often constrained by the capabilities of the speech encoders under varied acoustic conditions, such as accents.
We propose LA-RAG, a novel Retrieval-Augmented Generation (RAG) paradigm for LLM-based ASR.
arXiv Detail & Related papers (2024-09-13T07:28:47Z) - MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues [41.23757609484281]
Speech recognition errors can significantly degrade the performance of downstream tasks like summarization.
We propose MEDSAGE, an approach for generating synthetic samples for data augmentation using Large Language Models.
LLMs can effectively model ASR noise, and incorporating this noisy data into the training process significantly improves the robustness and accuracy of medical dialogue summarization systems.
arXiv Detail & Related papers (2024-08-26T17:04:00Z) - Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs)
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
arXiv Detail & Related papers (2024-07-31T08:00:41Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - SPeC: A Soft Prompt-Based Calibration on Performance Variability of
Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z) - Clinical Dialogue Transcription Error Correction using Seq2Seq Models [1.663938381339885]
We present a seq2seq learning approach for ASR transcription error correction of clinical dialogues.
We fine-tune a seq2seq model on a mask-filling task using a domain-specific dataset which we have shared publicly for future research.
arXiv Detail & Related papers (2022-05-26T18:27:17Z) - NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA)
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA the system's performance accuracy ranged between 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z) - Robust Prediction of Punctuation and Truecasing for Medical ASR [18.08508027663331]
This paper proposes a conditional joint modeling framework for prediction of punctuation and truecasing.
We also present techniques for domain and task specific adaptation by fine-tuning masked language models with medical domain data.
arXiv Detail & Related papers (2020-07-04T07:15:13Z) - Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR)
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.