MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
- URL: http://arxiv.org/abs/2409.14074v2
- Date: Thu, 09 Jan 2025 10:50:12 GMT
- Authors: Khai Le-Duc, Phuc Phan, Tan-Hanh Pham, Bach Phan Tat, Minh-Huong Ngo, Truong-Son Hy
- Abstract summary: We introduce MultiMed, the first multilingual medical ASR dataset, along with the first collection of small-to-large end-to-end medical ASR models.
To our best knowledge, MultiMed stands as the world's largest medical ASR dataset across all major benchmarks.
We present the first multilinguality study for medical ASR, which includes reproducible empirical baselines, a monolinguality-multilinguality analysis, an Attention Encoder Decoder (AED) vs Hybrid comparative study, a layer-wise ablation study for the AED, and a linguistic analysis for multilingual medical ASR.
- Abstract: Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants. This technology enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we introduce MultiMed, the first multilingual medical ASR dataset, along with the first collection of small-to-large end-to-end medical ASR models, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese. To our best knowledge, MultiMed stands as the world's largest medical ASR dataset across all major benchmarks: total duration, number of recording conditions, number of accents, and number of speaking roles. Furthermore, we present the first multilinguality study for medical ASR, which includes reproducible empirical baselines, a monolinguality-multilinguality analysis, Attention Encoder Decoder (AED) vs Hybrid comparative study, a layer-wise ablation study for the AED, and a linguistic analysis for multilingual medical ASR. All code, data, and models are available online: https://github.com/leduckhai/MultiMed/tree/master/MultiMed
Related papers
- BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities [36.30252230403839]
BiMediX2 is a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model (LMM).
It integrates text and visual capabilities to facilitate seamless interactions in both English and Arabic, supporting text-based inputs and multi-turn conversations involving medical images.
BiMediX2 is benchmarked on both text-based and image-based tasks, achieving state-of-the-art performance across several medical benchmarks.
arXiv Detail & Related papers (2024-12-10T18:59:35Z) - Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks [13.016940516468674]
We aim to enhance the capability of medical MLLMs in understanding anatomical regions within entire medical scans.
We propose a Region-Aware medical MLLM, MedRegA, which is the first bilingual generalist medical AI system.
Our model not only achieves strong performance across various medical vision-language tasks in bilingual settings, but also recognizes and detects structures in multimodal medical scans.
arXiv Detail & Related papers (2024-10-24T02:55:41Z) - WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation [4.149844666297669]
Multimodal/vision language models (VLMs) are increasingly being deployed in healthcare settings worldwide.
Existing datasets are largely text-only and available in a limited subset of languages and countries.
WorldMedQA-V includes 568 labeled multiple-choice QAs paired with 568 medical images from four countries.
arXiv Detail & Related papers (2024-10-16T16:31:24Z) - Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain [19.58987478434808]
We present Medical mT5, the first open-source text-to-text multilingual model for the medical domain.
A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models for the Spanish, French, and Italian benchmarks.
arXiv Detail & Related papers (2024-04-11T10:01:32Z) - Towards Building Multilingual Language Model for Medicine [54.1382395897071]
We construct a multilingual medical corpus, containing approximately 25.5B tokens encompassing 6 main languages.
We propose a multilingual medical multiple-choice question-answering benchmark with rationales, termed MMedBench.
Our final model, MMed-Llama 3, with only 8B parameters, achieves superior performance compared to all other open-source models on both MMedBench and English benchmarks.
arXiv Detail & Related papers (2024-02-21T17:47:20Z) - BiMediX: Bilingual Medical Mixture of Experts LLM [90.3257333861513]
We introduce BiMediX, the first bilingual medical mixture of experts LLM designed for seamless interaction in both English and Arabic.
Our model facilitates a wide range of medical interactions in English and Arabic, including multi-turn chats to inquire about additional details.
We propose a semi-automated English-to-Arabic translation pipeline with human refinement to ensure high-quality translations.
arXiv Detail & Related papers (2024-02-20T18:59:26Z) - LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models, and the experimental results show that state-of-the-art neural models perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Multilingual and code-switching ASR challenges for low resource Indian languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both tasks, with WERs of 30.73% and 32.45% on the test sets of the multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)