Performant ASR Models for Medical Entities in Accented Speech
- URL: http://arxiv.org/abs/2406.12387v1
- Date: Tue, 18 Jun 2024 08:19:48 GMT
- Title: Performant ASR Models for Medical Entities in Accented Speech
- Authors: Tejumade Afonja, Tobi Olatunji, Sewade Ogun, Naome A. Etori, Abraham Owodunni, Moshood Yekini,
- Abstract summary: We rigorously evaluate multiple ASR models on a clinical English dataset of 93 African accents.
Our analysis reveals that despite some models achieving low overall word error rates (WER), errors in clinical entities are higher, potentially posing substantial risks to patient safety.
- Score: 0.9346027495459037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent strides in automatic speech recognition (ASR) have accelerated its adoption in the medical domain, where performance on accented medical named entities (NE) such as drug names, diagnoses, and lab results is largely unknown. We rigorously evaluate multiple ASR models on a clinical English dataset of 93 African accents. Our analysis reveals that despite some models achieving low overall word error rates (WER), errors on clinical entities are higher, potentially posing substantial risks to patient safety. To demonstrate this empirically, we extract clinical entities from transcripts, develop a novel algorithm to align ASR predictions with these entities, and compute medical NE Recall, medical WER, and character error rate. Our results show that fine-tuning on accented clinical speech improves medical WER by a wide margin (25-34% relative), improving the models' practical applicability in healthcare environments.
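The entity-level metrics above can be made concrete with a small sketch. The paper's alignment algorithm is novel and not spelled out in the abstract, so the snippet below substitutes a simple best-window match as a stand-in; the `medical_wer` and `entity_recall` helpers and the example sentences are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of entity-aware ASR metrics. Assumption: references come
# with labelled entity strings, and a best-matching hypothesis window
# stands in for the paper's novel alignment algorithm.

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(reference, hypothesis):
    """Overall word error rate."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def medical_wer(entities, hypothesis):
    """WER restricted to entity tokens: each entity is scored against the
    closest same-length window of the hypothesis (alignment stand-in)."""
    hyp = hypothesis.lower().split()
    errors = total = 0
    for ent in entities:
        ent_t = ent.lower().split()
        total += len(ent_t)
        spans = range(max(len(hyp) - len(ent_t) + 1, 1))
        errors += min(edit_distance(ent_t, hyp[i:i + len(ent_t)]) for i in spans)
    return errors / max(total, 1)

def entity_recall(entities, hypothesis):
    """Fraction of entities whose exact tokens appear in the hypothesis."""
    hyp = f" {' '.join(hypothesis.lower().split())} "
    return sum(f" {e.lower()} " in hyp for e in entities) / max(len(entities), 1)

# Toy example: 'metformin' is misrecognised, so medical WER and NE recall
# suffer even though most non-entity words are correct.
ref = "patient was prescribed metformin for type 2 diabetes"
hyp = "patient was prescribed met forming for type 2 diabetes"
ents = ["metformin", "type 2 diabetes"]
print(wer(ref, hyp), medical_wer(ents, hyp), entity_recall(ents, hyp))
```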
Related papers
- VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records [2.8078482678056527]
VeriFact is an artificial intelligence system for fact-checking large language model (LLM) output in clinical medicine.
It decomposes Brief Hospital Course narratives into simple statements, with clinician annotations indicating whether each statement is supported by the patient's EHR clinical notes.
It achieves up to 92.7% agreement when compared to a denoised and adjudicated average human clinician ground truth.
arXiv Detail & Related papers (2025-01-28T03:13:16Z) - The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders? [0.0]
This study investigates the prevalence and impact of ASR errors in medical transcription in Nigeria, the United Kingdom, and the United States.
We assess the potential and limitations of Large Language Models to address challenges related to accents and medical terminology in ASR.
arXiv Detail & Related papers (2025-01-25T19:40:26Z) - High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR [1.3810901729134184]
We introduce United-MedASR, a novel architecture that addresses challenges by integrating synthetic data generation, precision ASR fine-tuning, and semantic enhancement techniques.
United-MedASR constructs a specialised medical vocabulary by synthesising data from authoritative sources such as ICD-10, MIMS, and FDA databases.
To enhance processing speed, we incorporate Faster Whisper, ensuring streamlined and high-speed ASR performance.
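For context on the Faster Whisper component: faster-whisper is the open-source CTranslate2 reimplementation of Whisper. A minimal usage sketch follows; the file name and model size are placeholder assumptions, and this shows ordinary library usage, not United-MedASR's actual pipeline.

```python
from faster_whisper import WhisperModel  # pip install faster-whisper

# Load a quantised model for fast CPU inference; "small.en" and the
# audio path are placeholders, not values from the paper.
model = WhisperModel("small.en", device="cpu", compute_type="int8")

segments, info = model.transcribe("dictation.wav", beam_size=5)
for seg in segments:
    print(f"[{seg.start:.1f}s -> {seg.end:.1f}s] {seg.text}")
```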
arXiv Detail & Related papers (2024-11-24T17:02:48Z) - SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research [45.2233252981348]
Large Language Models have shown promising results in their ability to encode general medical knowledge.
We test the ability of state-of-the-art LLMs to leverage their internal knowledge and reasoning for epilepsy diagnosis.
arXiv Detail & Related papers (2024-07-03T11:02:12Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor (the player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
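The soft-prompt mechanism this entry refers to can be sketched in a few lines: trainable "virtual token" embeddings are prepended to the input embeddings while the language model itself stays frozen. The snippet below is a generic illustration with gpt2 as a stand-in model, not SPeC's actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False                          # freeze the LM

# 20 trainable "virtual tokens"; only this matrix is updated in tuning.
n_virtual, dim = 20, model.config.n_embd
soft_prompt = torch.nn.Parameter(torch.randn(n_virtual, dim) * 0.02)

ids = tok("Summarize the clinical note:", return_tensors="pt").input_ids
tok_emb = model.get_input_embeddings()(ids)          # (1, T, dim)
embeds = torch.cat([soft_prompt.unsqueeze(0), tok_emb], dim=1)
out = model(inputs_embeds=embeds)                    # standard forward pass
```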
arXiv Detail & Related papers (2023-03-23T04:47:46Z) - Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z) - Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.