Impact of translation on biomedical information extraction from
real-life clinical notes
- URL: http://arxiv.org/abs/2306.02042v1
- Date: Sat, 3 Jun 2023 07:48:00 GMT
- Title: Impact of translation on biomedical information extraction from
real-life clinical notes
- Authors: Christel Gérardin, Yuhan Xiong, Perceval Wajsbürt, Fabrice Carrat,
Xavier Tannier
- Abstract summary: We compare two methods: a method involving French language models and a method involving English language models.
We used French, English and bilingual annotated datasets to evaluate all steps (NER, normalization and translation) of our algorithms.
Despite recent improvements in translation models, there is a significant performance difference between the two approaches in favor of the native French method.
- Score: 0.7227232362460347
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The objective of our study is to determine whether using English tools to
extract and normalize French medical concepts on translations provides
comparable performance to French models trained on a set of annotated French
clinical notes. We compare two methods: a method involving French language
models and a method involving English language models. For the native French
method, the Named Entity Recognition (NER) and normalization steps are
performed separately. For the translated English method, after the first
translation step, we compare a two-step method and a terminology-oriented
method that performs extraction and normalization at the same time. We used
French, English and bilingual annotated datasets to evaluate all steps (NER,
normalization and translation) of our algorithms. Regarding the results, the
native French method performs better than the translated English one, with a
global F1 score of 0.51 [0.47;0.55] against 0.39 [0.34;0.44] and 0.38
[0.36;0.40] for the two English methods tested. In conclusion, despite recent
improvements in translation models, there is a significant performance
difference between the two approaches in favor of the native French method,
which is more effective on French medical texts, even with few annotated
documents.
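To make the comparison concrete, below is a minimal Python sketch of the two strategies described in the abstract: a native French pipeline that runs NER and normalization as separate steps, and a translate-then-extract pipeline that first translates the note into English and then applies either a two-step chain or a terminology-oriented tool. All components here (french_ner, translate_fr_to_en, terminology_matcher, etc.) are hypothetical placeholders standing in for real models and terminologies, not the authors' actual tools, and the UMLS code is only an illustrative example.

    # Minimal sketch of the two strategies compared in the paper.
    # All components below are hypothetical placeholders, not the authors' pipeline.
    from typing import List, Tuple

    # An extracted mention: (surface form, entity label, normalized concept code)
    Mention = Tuple[str, str, str]

    def native_french_pipeline(note_fr: str) -> List[Mention]:
        """Native French method: NER and normalization run as two separate
        steps, both directly on the French text."""
        entities = french_ner(note_fr)                  # step 1: French NER
        return [(text, label, french_normalizer(text))  # step 2: concept codes
                for text, label in entities]

    def translated_english_pipeline(note_fr: str, two_step: bool = True) -> List[Mention]:
        """Translated English method: translate first, then either a two-step
        NER + normalization chain, or a terminology-oriented tool that extracts
        and normalizes in one pass."""
        note_en = translate_fr_to_en(note_fr)           # step 1: machine translation
        if two_step:
            entities = english_ner(note_en)             # step 2a: English NER
            return [(text, label, english_normalizer(text)) for text, label in entities]
        return terminology_matcher(note_en)             # step 2b: joint extraction + normalization

    # Placeholder stand-ins for real models and terminologies (illustrative only).
    def french_ner(text: str) -> List[Tuple[str, str]]:
        return [("insuffisance cardiaque", "DISORDER")]

    def french_normalizer(mention: str) -> str:
        return "UMLS:C0018801"  # e.g. the concept for heart failure

    def translate_fr_to_en(text: str) -> str:
        return "Patient followed for heart failure."

    def english_ner(text: str) -> List[Tuple[str, str]]:
        return [("heart failure", "DISORDER")]

    def english_normalizer(mention: str) -> str:
        return "UMLS:C0018801"

    def terminology_matcher(text: str) -> List[Mention]:
        return [("heart failure", "DISORDER", "UMLS:C0018801")]

    if __name__ == "__main__":
        note = "Patient suivi pour insuffisance cardiaque."
        print(native_french_pipeline(note))
        print(translated_english_pipeline(note, two_step=True))
        print(translated_english_pipeline(note, two_step=False))

In the study itself, the mentions and concept codes produced by each pipeline are scored against the bilingual gold annotations, which is what yields the global F1 figures quoted above.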
Related papers
- HYBRINFOX at CheckThat! 2024 -- Task 2: Enriching BERT Models with the Expert System VAGO for Subjectivity Detection [0.8083061106940517]
The HYBRINFOX method ranked 1st with a macro F1 score of 0.7442 on the evaluation data.
We explain the principles of our hybrid approach, and outline ways in which the method could be improved for other languages besides English.
arXiv Detail & Related papers (2024-07-04T09:29:19Z) - Multilingual Clinical NER: Translation or Cross-lingual Transfer? [4.4924444466378555]
We show that translation-based methods can achieve similar performance to cross-lingual transfer.
We release MedNERF, a medical NER test set extracted from French drug prescriptions and annotated with the same guidelines as an English dataset.
arXiv Detail & Related papers (2023-06-07T12:31:07Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - Meta-Learning a Cross-lingual Manifold for Semantic Parsing [75.26271012018861]
Localizing a semantic parser to support new languages requires effective cross-lingual generalization.
We introduce a first-order meta-learning algorithm to train a semantic parser with maximal sample efficiency during cross-lingual transfer.
Results across six languages on ATIS demonstrate that our combination of steps yields accurate semantic parsers sampling $\le$10% of source training data in each new language.
arXiv Detail & Related papers (2022-09-26T10:42:17Z) - Automated Drug-Related Information Extraction from French Clinical
Documents: ReLyfe Approach [0.4588028371034407]
This paper proposes a new approach for extracting drug-related information from scanned French clinical documents.
It combines a rule-based phase with a Deep Learning approach.
arXiv Detail & Related papers (2021-11-29T22:11:23Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilizes unlabeled data to improve performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z) - Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z) - Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z) - WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive
Summarization [41.578594261746055]
We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of cross-lingual abstractive summarization systems.
We extract article and summary pairs in 18 languages from WikiHow, a high-quality, collaborative resource of how-to guides on a diverse set of topics written by human authors.
We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article.
arXiv Detail & Related papers (2020-10-07T00:28:05Z) - A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with
Bilingual Semantic Similarity Rewards [40.17497211507507]
Cross-lingual text summarization is a practically important but under-explored task.
We propose an end-to-end cross-lingual text summarization model.
arXiv Detail & Related papers (2020-06-27T21:51:38Z)