MedCOD: Enhancing English-to-Spanish Medical Translation of Large Language Models Using Enriched Chain-of-Dictionary Framework
- URL: http://arxiv.org/abs/2509.00934v2
- Date: Fri, 19 Sep 2025 02:16:48 GMT
- Title: MedCOD: Enhancing English-to-Spanish Medical Translation of Large Language Models Using Enriched Chain-of-Dictionary Framework
- Authors: Md Shahidul Salim, Lian Fu, Arav Adikesh Ramakrishnan, Zonghai Yao, Hong Yu,
- Abstract summary: MedCOD is a hybrid framework designed to improve English-to-Spanish medical translation by integrating domain-specific structured knowledge into large language models (LLMs)<n>We constructed a parallel corpus of 2,999 English-Spanish MedlinePlus articles and a 100-sentence test set annotated with structured medical contexts.<n> Experimental results demonstrate that MedCOD significantly improves translation quality across all models.
- Score: 8.604097439756378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MedCOD (Medical Chain-of-Dictionary), a hybrid framework designed to improve English-to-Spanish medical translation by integrating domain-specific structured knowledge into large language models (LLMs). MedCOD integrates domain-specific knowledge from both the Unified Medical Language System (UMLS) and the LLM-as-Knowledge-Base (LLM-KB) paradigm to enhance structured prompting and fine-tuning. We constructed a parallel corpus of 2,999 English-Spanish MedlinePlus articles and a 100-sentence test set annotated with structured medical contexts. Four open-source LLMs (Phi-4, Qwen2.5-14B, Qwen2.5-7B, and LLaMA-3.1-8B) were evaluated using structured prompts that incorporated multilingual variants, medical synonyms, and UMLS-derived definitions, combined with LoRA-based fine-tuning. Experimental results demonstrate that MedCOD significantly improves translation quality across all models. For example, Phi-4 with MedCOD and fine-tuning achieved BLEU 44.23, chrF++ 28.91, and COMET 0.863, surpassing strong baseline models like GPT-4o and GPT-4o-mini. Ablation studies confirm that both MedCOD prompting and model adaptation independently contribute to performance gains, with their combination yielding the highest improvements. These findings highlight the potential of structured knowledge integration to enhance LLMs for medical translation tasks.
Related papers
- MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images [25.29568841502814]
We introduce MedMO, a medical foundation model built upon a generalized MLLM architecture.<n>On VQA benchmarks, MedMO achieves an average accuracy improvement of +13.7% over the baseline.<n>In medical report generation, MedMO delivers significant gains in both semantic and clinical accuracy.
arXiv Detail & Related papers (2026-02-06T18:59:59Z) - A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis.<n>Most medical LLMs are trained on data from a single institution, which faces limitations in generalizability and safety in heterogeneous systems.<n>We introduce the model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z) - Toward Global Large Language Models in Medicine [67.38063166560406]
GlobMed is a large multilingual medical dataset containing over 500,000 entries spanning 12 languages, including four low-resource languages.<n>GlobMed-Bench assesses 56 state-of-the-art proprietary and open-weight LLMs across multiple multilingual medical tasks, revealing significant performance disparities across languages.<n>GlobMed-LLMs achieved an average performance improvement of over 40% relative to baseline models, with a more than threefold increase in performance on low-resource languages.
arXiv Detail & Related papers (2026-01-05T15:05:49Z) - Multilingual LLM Prompting Strategies for Medical English-Vietnamese Machine Translation [7.238888652441979]
Medical English-Vietnamese machine translation (En-Vi MT) is essential for healthcare access and communication in Vietnam.<n>We evaluate prompting strategies for six multilingual LLMs (0.5B-9B parameters) on the MedEV dataset.
arXiv Detail & Related papers (2025-09-19T06:06:36Z) - BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text [10.071956824618418]
Large language models (LLMs) hold great promise for medical applications and are evolving rapidly.<n>Most existing benchmarks rely on medical exam-style questions or PubMed-derived text.<n>We present BRIDGE, a comprehensive benchmark comprising 87 tasks sourced from real-world clinical data sources across nine languages.
arXiv Detail & Related papers (2025-04-28T04:13:18Z) - Towards Evaluating and Building Versatile Large Language Models for Medicine [57.49547766838095]
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts.
MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation.
MedS-Ins comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z) - LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation [0.0]
This study introduces a novel "LLMs-in-the-loop" approach to develop supervised neural machine translation models optimized for medical texts.
Custom parallel corpora in six languages were compiled from scientific articles, synthetically generated clinical documents, and medical texts.
Our MarianMT-based models outperform Google Translate, DeepL, and GPT-4-Turbo.
arXiv Detail & Related papers (2024-07-16T19:32:23Z) - Towards Building Multilingual Language Model for Medicine [54.1382395897071]
We construct a multilingual medical corpus, containing approximately 25.5B tokens encompassing 6 main languages.
We propose a multilingual medical multi-choice question-answering benchmark with rationale, termed as MMedBench.
Our final model, MMed-Llama 3, with only 8B parameters, achieves superior performance compared to all other open-source models on both MMedBench and English benchmarks.
arXiv Detail & Related papers (2024-02-21T17:47:20Z) - Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering [48.17095875619711]
We present a system called LLMs Augmented with Medical Textbooks (LLM-AMT)<n>LLM-AMT integrates authoritative medical textbooks into the LLMs' framework using plug-and-play modules.<n>We found that medical textbooks as a retrieval corpus is proven to be a more effective knowledge database than Wikipedia in the medical domain.
arXiv Detail & Related papers (2023-09-05T13:39:38Z) - Large Language Models Leverage External Knowledge to Extend Clinical
Insight Beyond Language Boundaries [48.48630043740588]
Large Language Models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks.
We develop a novel in-context learning framework to enhance their performance.
arXiv Detail & Related papers (2023-05-17T12:31:26Z) - PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medicine applications, termed as PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.