MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
- URL: http://arxiv.org/abs/2409.14074v1
- Date: Sat, 21 Sep 2024 09:05:48 GMT
- Title: MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
- Authors: Khai Le-Duc, Phuc Phan, Tan-Hanh Pham, Bach Phan Tat, Minh-Huong Ngo, Truong-Son Hy,
- Abstract summary: We introduce MultiMed, a collection of small-to-large end-to-end ASR models for the medical domain.
We present the first reproducible study of multilinguality in medical ASR, conduct a layer-wise ablation study for end-to-end ASR training, and provide the first linguistic analysis for multilingual medical ASR.
- Score: 1.220481237642298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants. This technology enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we introduce MultiMed, a collection of small-to-large end-to-end ASR models for the medical domain, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese, together with the corresponding real-world ASR dataset. To our best knowledge, MultiMed stands as the largest and the first multilingual medical ASR dataset, in terms of total duration, number of speakers, diversity of diseases, recording conditions, speaker roles, unique medical terms, accents, and ICD-10 codes. Secondly, we establish the empirical baselines, present the first reproducible study of multilinguality in medical ASR, conduct a layer-wise ablation study for end-to-end ASR training, and provide the first linguistic analysis for multilingual medical ASR. All code, data, and models are available online https://github.com/leduckhai/MultiMed/tree/master/MultiMed
Related papers
- Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks [13.016940516468674]
We aim to enhance the capability of medical MLLMs in understanding anatomical regions within entire medical scans.
We propose a Region-Aware medical MLLM, MedRegA, which is the first bilingual generalist medical AI system.
Our model can not only accomplish powerful performance across various medical vision-language tasks in bilingual settings, but also recognize and detect structures in multimodal medical scans.
arXiv Detail & Related papers (2024-10-24T02:55:41Z) - Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain [19.58987478434808]
We present Medical mT5, the first open-source text-to-text multilingual model for the medical domain.
A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models for the Spanish, French, and Italian benchmarks.
arXiv Detail & Related papers (2024-04-11T10:01:32Z) - Towards Building Multilingual Language Model for Medicine [54.1382395897071]
We construct a multilingual medical corpus, containing approximately 25.5B tokens encompassing 6 main languages.
We propose a multilingual medical multi-choice question-answering benchmark with rationale, termed as MMedBench.
Our final model, MMed-Llama 3, with only 8B parameters, achieves superior performance compared to all other open-source models on both MMedBench and English benchmarks.
arXiv Detail & Related papers (2024-02-21T17:47:20Z) - Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in
the HYKIST Project [0.0]
Language difficulties between natives and immigrants present a common issue on a daily basis, especially in medical domain.
The goal of the HYKIST Project is to develop a speech translation system to support patient-doctor communication with ASR and MT.
We describe our efforts to construct ASR systems for a conversational telephone speech recognition task in the medical domain for Vietnamese language.
arXiv Detail & Related papers (2023-09-26T21:12:09Z) - LLaVA-Med: Training a Large Language-and-Vision Assistant for
Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s)
arXiv Detail & Related papers (2023-06-01T16:50:07Z) - Simple Yet Effective Neural Ranking and Reranking Baselines for
Cross-Lingual Information Retrieval [50.882816288076725]
Cross-lingual information retrieval is the task of searching documents in one language with queries in another.
We provide a conceptual framework for organizing different approaches to cross-lingual retrieval using multi-stage architectures for mono-lingual retrieval as a scaffold.
We implement simple yet effective reproducible baselines in the Anserini and Pyserini IR toolkits for test collections from the TREC 2022 NeuCLIR Track, in Persian, Russian, and Chinese.
arXiv Detail & Related papers (2023-04-03T14:17:00Z) - Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of end to end multilingual speech recognition system to the performance of monolingual models conditioned on language identification (LID)
We also propose a similar technique to solve the Code Switched problem and achieve a WER of 21.77 and 28.27 over Hindi-English and Bengali-English respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with the current 11 pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform by far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Multilingual and code-switching ASR challenges for low resource Indian
languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.