Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: A Pilot Study
- URL: http://arxiv.org/abs/2504.16601v1
- Date: Wed, 23 Apr 2025 10:31:33 GMT
- Title: Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: A Pilot Study
- Authors: Andy Li, Wei Zhou, Rashina Hoda, Chris Bain, Peter Poon
- Abstract summary: This study evaluates how well large language models (LLMs) and traditional machine translation (MT) tools translate medical consultation summaries from English into Arabic, Chinese, and Vietnamese. Results showed that traditional MT tools generally performed better, especially for complex texts. Arabic translations improved with complexity due to the language's morphology.
- Score: 9.136745540182423
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This study evaluates how well large language models (LLMs) and traditional machine translation (MT) tools translate medical consultation summaries from English into Arabic, Chinese, and Vietnamese. It assesses both patient-friendly and clinician-focused texts using standard automated metrics. Results showed that traditional MT tools generally performed better, especially for complex texts, while LLMs showed promise, particularly in Vietnamese and Chinese, when translating simpler summaries. Arabic translations improved with complexity due to the language's morphology. Overall, while LLMs offer contextual flexibility, they remain inconsistent, and current evaluation metrics fail to capture clinical relevance. The study highlights the need for domain-specific training, improved evaluation methods, and human oversight in medical translation.
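As an illustration of the kind of automated scoring the study relies on, the sketch below computes corpus-level BLEU and chrF with the sacrebleu package. The metric choice, the example sentences, and the LLM-vs-MT comparison setup are assumptions for illustration, not the paper's exact pipeline.

```python
# Minimal sketch: scoring candidate translations against references with
# standard automated MT metrics. Assumes the `sacrebleu` package; the
# metrics (BLEU, chrF) and the example sentences are illustrative only.
import sacrebleu

# Hypothetical reference and system outputs for one target language (Arabic).
references = [["يرجى تناول الدواء مرتين يوميا بعد الطعام."]]
llm_output = ["تناول الدواء مرتين في اليوم بعد الأكل."]
mt_output = ["يرجى تناول الدواء مرتين يوميا بعد الطعام."]

for name, hypotheses in [("LLM", llm_output), ("MT tool", mt_output)]:
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    chrf = sacrebleu.corpus_chrf(hypotheses, references)
    print(f"{name}: BLEU={bleu.score:.1f} chrF={chrf.score:.1f}")
```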
Related papers
- MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation [3.6818524036584686]
MultiMed-ST is a large-scale speech translation (ST) dataset for the medical domain spanning all translation directions in five languages. With 290,000 samples, our dataset is the largest medical machine translation (MT) dataset. We present the most extensive analysis study in ST research to date, including empirical baselines, a bilingual-multilingual comparative study, an end-to-end vs. cascaded comparative study, code-switch analysis, and quantitative-qualitative error analysis.
arXiv Detail & Related papers (2025-04-04T15:49:17Z) - On Creating an English-Thai Code-switched Machine Translation in Medical Domain [2.0737832185611524]
Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge.
Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field due to their inability to precisely translate medical terminologies.
Our research prioritizes not merely improving translation accuracy but also maintaining medical terminology in English.
arXiv Detail & Related papers (2024-10-21T17:25:32Z) - LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation [0.0]
This study introduces a novel "LLMs-in-the-loop" approach to develop supervised neural machine translation models optimized for medical texts.
Custom parallel corpora in six languages were compiled from scientific articles, synthetically generated clinical documents, and medical texts.
Our MarianMT-based models outperform Google Translate, DeepL, and GPT-4-Turbo.
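For context, the sketch below shows how a MarianMT checkpoint can be run through Hugging Face transformers; the public Helsinki-NLP/opus-mt-en-ar checkpoint is an assumed stand-in, not the authors' fine-tuned medical models.

```python
# Minimal sketch: translating a clinical sentence with a MarianMT checkpoint
# via Hugging Face transformers. The public en->ar checkpoint is a stand-in
# assumption; the paper's "LLMs-in-the-loop" fine-tuned models are not used.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-ar"  # assumed public baseline checkpoint
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = ["Take one tablet twice daily with food."]
batch = tokenizer(text, return_tensors="pt", padding=True)
generated = model.generate(**batch, max_new_tokens=128)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```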
arXiv Detail & Related papers (2024-07-16T19:32:23Z) - Google Translate Error Analysis for Mental Healthcare Information:
Evaluating Accuracy, Comprehensibility, and Implications for Multilingual
Healthcare Communication [8.178490288773013]
This study explores the use of Google Translate for translating mental healthcare (MHealth) information from English to Persian, Arabic, Turkish, Romanian, and Spanish.
Native speakers of the target languages manually assessed the GT translations, focusing on medical terminology accuracy, comprehensibility, and critical syntactic/semantic errors.
GT output analysis revealed challenges in accurately translating medical terminology, particularly in Arabic, Romanian, and Persian.
arXiv Detail & Related papers (2024-02-06T14:16:32Z) - Cross-lingual neural fuzzy matching for exploiting target-language
monolingual corpora in computer-aided translation [0.0]
In this paper, we introduce a novel neural approach aimed at exploiting in-domain target-language (TL) monolingual corpora.
Our approach relies on cross-lingual sentence embeddings to retrieve translation proposals from TL monolingual corpora, and on a neural model to estimate their post-editing effort.
The paper presents an automatic evaluation of these techniques on four language pairs that shows that our approach can successfully exploit monolingual texts in a TM-based CAT environment.
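A minimal sketch of the retrieval step, assuming multilingual sentence embeddings from the sentence-transformers LaBSE model; the paper's neural post-editing-effort estimator is not reproduced here, and the example corpus is hypothetical.

```python
# Minimal sketch of retrieving fuzzy-match proposals from a target-language
# monolingual corpus with cross-lingual sentence embeddings. LaBSE (via
# sentence-transformers) is an assumed model choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

# Source-language segment to translate and a tiny TL monolingual corpus.
source = "The patient should fast for eight hours before the procedure."
tl_corpus = [
    "El paciente debe ayunar ocho horas antes del procedimiento.",
    "Tome el medicamento dos veces al día.",
    "La cita de seguimiento es en dos semanas.",
]

src_emb = model.encode(source, convert_to_tensor=True)
tl_emb = model.encode(tl_corpus, convert_to_tensor=True)

# Rank TL sentences by cosine similarity and keep the best proposal.
scores = util.cos_sim(src_emb, tl_emb)[0]
best = int(scores.argmax())
print(f"proposal: {tl_corpus[best]}  (similarity={float(scores[best]):.3f})")
```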
arXiv Detail & Related papers (2024-01-16T14:00:28Z) - ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for the Chinese medical domain.
ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF.
We analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination against patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z) - CMB: A Comprehensive Medical Benchmark in Chinese [67.69800156990952]
We propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese.
While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety.
We have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain.
arXiv Detail & Related papers (2023-08-17T07:51:23Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
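A minimal sketch of such an extract-then-verify loop is shown below, with a hypothetical call_llm placeholder standing in for any chat-completion client; the prompts are illustrative, not the paper's.

```python
# Minimal sketch of self-verification for clinical information extraction:
# the model first extracts a value, then is asked to check its own output
# and cite the supporting evidence. `call_llm` is a hypothetical stand-in
# for an LLM client; the prompts are illustrative only.
def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError("wire this to your LLM client of choice")

def extract_with_self_verification(note: str, field: str) -> str:
    extraction = call_llm(
        f"Extract the {field} from this clinical note:\n{note}\n"
        "Answer with the value only."
    )
    verification = call_llm(
        f"Clinical note:\n{note}\n\n"
        f"Proposed {field}: {extraction}\n"
        "Quote the sentence from the note that supports this value, "
        "or reply EVIDENCE NOT FOUND if none does."
    )
    # Keep the extraction only when the model can ground it in the note.
    return extraction if "EVIDENCE NOT FOUND" not in verification else ""
```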
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - Benchmarking Machine Translation with Cultural Awareness [50.183458829028226]
Translating culture-related content is vital for effective cross-cultural communication.
Culture-specific items (CSIs) often lack viable translations across languages.
This difficulty hinders the analysis of cultural awareness of machine translation systems.
arXiv Detail & Related papers (2023-05-23T17:56:33Z) - Towards more patient friendly clinical notes through language models and
ontologies [57.51898902864543]
We present a novel approach to automated simplification of medical text based on word-level simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models, and the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Benchmarking Automated Clinical Language Simplification: Dataset,
Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)