A Multilingual Neural Machine Translation Model for Biomedical Data
- URL: http://arxiv.org/abs/2008.02878v1
- Date: Thu, 6 Aug 2020 21:26:43 GMT
- Title: A Multilingual Neural Machine Translation Model for Biomedical Data
- Authors: Alexandre Bérard, Zae Myung Kim, Vassilina Nikoulina, Eunjeong L. Park, Matthias Gallé
- Abstract summary: We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain.
The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English.
It is trained with large amounts of generic and biomedical data, using domain tags.
- Score: 84.17747489525794
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We release a multilingual neural machine translation model, which can be used
to translate text in the biomedical domain. The model can translate from 5
languages (French, German, Italian, Korean and Spanish) into English. It is
trained with large amounts of generic and biomedical data, using domain tags.
Our benchmarks show that it performs near state-of-the-art both on news
(generic domain) and biomedical test sets, and that it outperforms the existing
publicly released models. We believe that this release will help the
large-scale multilingual analysis of the digital content of the COVID-19 crisis
and of its effects on society, economy, and healthcare policies.
We also release a test set of biomedical text for Korean-English. It consists
of 758 sentences from official guidelines and recent papers, all about
COVID-19.
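The abstract describes conditioning a single multilingual model on the target domain via domain tags. A minimal sketch of this kind of preprocessing is shown below; the exact tag tokens (`<generic>`, `<biomedical>`) and the helper function are illustrative assumptions, not the tokens used by the released model.

```python
# Hypothetical sketch of domain-tag preprocessing for multilingual NMT.
# A tag prepended to the source sentence lets one model serve several
# domains; the tag names here are assumptions for illustration only.

def tag_source(sentence: str, domain: str) -> str:
    """Prepend a domain tag so the model can condition on the target domain."""
    if domain not in {"generic", "biomedical"}:
        raise ValueError(f"unknown domain: {domain}")
    return f"<{domain}> {sentence}"

tagged = tag_source("Le patient présente une fièvre persistante.", "biomedical")
print(tagged)  # <biomedical> Le patient présente une fièvre persistante.
```

At inference time, the same tag used during training is prepended to each source sentence before tokenization, steering the model toward the vocabulary and style of that domain.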
Related papers
- Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain [19.58987478434808]
We present Medical mT5, the first open-source text-to-text multilingual model for the medical domain.
A comprehensive evaluation shows that Medical mT5 outperforms both encoder-only models and similarly sized text-to-text models on the Spanish, French, and Italian benchmarks.
arXiv Detail & Related papers (2024-04-11T10:01:32Z)
- A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages [17.40961028505384]
This work presents a multilingual corpus of texts concerning Adverse Drug Reactions gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese.
It contributes to the development of real-world multilingual language models for healthcare.
arXiv Detail & Related papers (2024-03-27T08:21:01Z)
- KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model [37.69464822182714]
Most biomedical pretrained language models are monolingual and cannot handle the growing cross-lingual requirements.
We propose a model called KBioXLM, which transforms the multilingual pretrained model XLM-R into the biomedical domain using a knowledge-anchored approach.
arXiv Detail & Related papers (2023-11-20T07:02:35Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for Biomedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z)
- BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [140.61707108174247]
We propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature.
We achieve F1 scores of 44.98%, 38.42%, and 40.76% on the BC5CDR, KD-DTI, and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA.
arXiv Detail & Related papers (2022-10-19T07:17:39Z)
- Enriching Biomedical Knowledge for Low-resource Language Through Translation [1.6347851388527643]
We make use of a state-of-the-art translation model in English-Vietnamese to translate and produce both pretrained as well as supervised data in the biomedical domains.
Thanks to such large-scale translation, we introduce ViPubmedT5, a pretrained Encoder-Decoder Transformer model trained on 20 million abstracts from the high-quality public corpus.
arXiv Detail & Related papers (2022-10-11T16:35:10Z)
- RuBioRoBERTa: a pre-trained biomedical language model for Russian language biomedical text mining [117.56261821197741]
We present several BERT-based models for Russian language biomedical text mining.
The models are pre-trained on a corpus of freely available texts in the Russian biomedical domain.
arXiv Detail & Related papers (2022-04-08T09:18:59Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models; the results show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
- Conceptualized Representation Learning for Chinese Biomedical Text Mining [14.77516568767045]
We investigate how the recently introduced pre-trained language model BERT can be adapted for Chinese biomedical corpora.
Adaptation is more difficult for Chinese biomedical text because of its complex structure and the variety of phrase combinations.
arXiv Detail & Related papers (2020-08-25T04:41:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.