Factuality Detection using Machine Translation -- a Use Case for German
Clinical Text
- URL: http://arxiv.org/abs/2308.08827v1
- Date: Thu, 17 Aug 2023 07:24:06 GMT
- Authors: Mohammed Bin Sumait, Aleksandra Gabryszak, Leonhard Hennig, Roland
Roller
- Abstract summary: This work presents a simple solution using machine translation to translate English data to German to train a transformer-based factuality detection model.
Factuality can play an important role when automatically processing clinical text, as it makes a difference if particular symptoms are explicitly not present, possibly present, not mentioned, or affirmed.
- Score: 45.875111164923545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Factuality can play an important role when automatically processing clinical
text, as it makes a difference if particular symptoms are explicitly not
present, possibly present, not mentioned, or affirmed. In most cases, a
sufficient number of examples is necessary to handle such phenomena in a
supervised machine learning setting. However, as clinical text might contain
sensitive information, data cannot be easily shared. In the context of
factuality detection, this work presents a simple solution using machine
translation to translate English data to German to train a transformer-based
factuality detection model.
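The data-transfer idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Example` class, the label names, the sentence-level labeling, and the `toy_translate` lookup table (a stand-in for a real machine translation system) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    text: str
    label: str  # hypothetical label names: "affirmed", "negated", "possible"


# Toy lookup table standing in for a real English-to-German MT system (assumption).
TOY_MT = {
    "The patient has no fever.": "Der Patient hat kein Fieber.",
    "The patient has a fever.": "Der Patient hat Fieber.",
    "The patient possibly has pneumonia.":
        "Der Patient hat möglicherweise eine Lungenentzündung.",
}


def toy_translate(sentence: str) -> str:
    """Placeholder translator; a real system would call an MT model here."""
    return TOY_MT[sentence]


def translate_dataset(
    examples: List[Example], translate: Callable[[str], str]
) -> List[Example]:
    """Translate each text and carry its factuality label over unchanged."""
    return [Example(translate(e.text), e.label) for e in examples]


english_train = [
    Example("The patient has no fever.", "negated"),
    Example("The patient has a fever.", "affirmed"),
    Example("The patient possibly has pneumonia.", "possible"),
]

# The German examples would then serve as training data for a classifier.
german_train = translate_dataset(english_train, toy_translate)

for example in german_train:
    print(example.label, "|", example.text)
```

Because only the text is translated and the labels are copied as-is, this sketch assumes that translation preserves the factuality of each sentence, which would need to be checked on real data.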
Related papers
- Cross-lingual Argument Mining in the Medical Domain [6.0158981171030685]
We show how to perform Argument Mining (AM) in medical texts for which no annotated data is available.
Our work shows that automatically translating and projecting annotations (data-transfer) from English to a given target language is an effective way to generate annotated data.
We also show how the automatically generated data in Spanish can be used to improve results in the original English monolingual setting.
Date: 2023-01-25T11:21:12Z
- Prompting Large Language Model for Machine Translation: A Case Study [87.88120385000666]
We offer a systematic study on prompting strategies for machine translation.
We examine factors for prompt template and demonstration example selection.
We explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning.
Date: 2023-01-17T18:32:06Z
- Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
Date: 2022-11-08T18:14:04Z
- A Medical Information Extraction Workbench to Process German Clinical Text [5.519657218427976]
We introduce a workbench: a collection of German clinical text processing models.
The models are trained on a de-identified corpus of German nephrology reports.
Our workbench is made publicly available so it can be used out of the box, as a benchmark or transferred to related problems.
Date: 2022-07-08T13:19:19Z
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
To our knowledge, this work presents the first systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning methods.
We conducted three types of experiments: monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
Date: 2022-04-19T16:23:07Z
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
Date: 2021-12-23T16:11:19Z
- Learning to Detect Unacceptable Machine Translations for Downstream Tasks [33.07594909221625]
We put machine translation in a cross-lingual pipeline and introduce downstream tasks to define task-specific acceptability of machine translations.
This allows us to leverage parallel data to automatically generate acceptability annotations on a large scale.
We conduct experiments to demonstrate the effectiveness of our framework for a range of downstream tasks and translation models.
Date: 2020-05-08T09:37:19Z
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
Date: 2020-05-05T17:38:48Z
- Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks [2.127049691404299]
This research applies advances in natural language processing to evidence synthesis based on medical texts.
The main focus is on information characterized via the Population, Intervention, Comparator, and Outcome (PICO) framework.
Recent neural network architectures based on transformers show capacities for transfer learning and increased performance on downstream natural language processing tasks.
Date: 2020-01-30T11:45:59Z