Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective
- URL: http://arxiv.org/abs/2208.02031v1
- Date: Wed, 3 Aug 2022 12:52:01 GMT
- Authors: Lisa Raithel, Philippe Thomas, Roland Roller, Oliver Sapina, Sebastian Möller, Pierre Zweigenbaum
- Abstract summary: We present the first corpus for German Adverse Drug Reaction detection in patient-generated content.
The data consists of 4,169 binary annotated documents from a German patient forum.
- Score: 3.8233498951276403
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work, we present the first corpus for German Adverse Drug Reaction
(ADR) detection in patient-generated content. The data consists of 4,169 binary
annotated documents from a German patient forum, where users talk about health
issues and get advice from medical doctors. As is common in social media data
in this domain, the class labels of the corpus are highly imbalanced. This,
together with a strong topic imbalance, makes the dataset very challenging: the
same symptom often has several causes and is not always related to medication
intake. We aim to encourage further multi-lingual efforts in the domain of ADR
detection and provide preliminary experiments for binary classification using
different methods of zero- and few-shot learning based on a multi-lingual
model. When fine-tuning XLM-RoBERTa first on English patient forum data and
then on the new German data, we achieve an F1-score of 37.52 for the positive
class. We make the dataset and models publicly available for the community.
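The abstract reports an F1-score for the positive (ADR) class rather than accuracy. With heavily imbalanced labels, a classifier that never predicts an ADR can still look accurate, so positive-class F1 is the more telling number. A minimal pure-Python sketch, using made-up toy labels (not data from the corpus) and hypothetical helper names `accuracy` and `positive_class_f1`:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def positive_class_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 computed for the positive class only.

    Hypothetical helper; returns 0.0 wherever a denominator would be zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy, made-up labels: 8 negative documents, 2 ADR-positive documents.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
all_negative = [0] * 10                    # never predicts an ADR
mixed = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]    # one false positive, one false negative

print(accuracy(y_true, all_negative), positive_class_f1(y_true, all_negative))
# 0.8 (0.0, 0.0, 0.0)
print(accuracy(y_true, mixed), positive_class_f1(y_true, mixed))
# 0.8 (0.5, 0.5, 0.5)
```

Both toy predictors reach the same 0.8 accuracy, but only the second gets a non-zero positive-class F1, which illustrates why accuracy alone says little on data this imbalanced.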
Related papers
- Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation [3.328297368052458]
We tackle bias detection in medical curricula using NLP models, including LLMs.
We evaluate them on a gold-standard dataset of 4,105 excerpts from a large corpus, annotated for bias by medical experts.
(2024-09-11T17:10:20Z)
- A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages [17.40961028505384]
This work presents a multilingual corpus of texts concerning Adverse Drug Reactions gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese.
It contributes to the development of real-world multilingual language models for healthcare.
(2024-03-27T08:21:01Z)
- Semantic Coherence Markers for the Early Diagnosis of the Alzheimer Disease [0.0]
Perplexity was originally conceived as an information-theoretic measure to assess how much a given language model is suited to predict a text sequence.
We employed language models as diverse as N-grams, from 2-grams to 5-grams, and GPT-2, a transformer-based language model.
The best-performing models achieved perfect accuracy and F-score (1.00 in both precision/specificity and recall/sensitivity) in classifying subjects from both the AD class and the control group.
(2023-02-02T11:40:16Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost transfer learning (TL) method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
(2023-01-22T18:22:55Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identifying code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
(2022-04-10T21:46:52Z)
- RuMedBench: A Russian Medical Language Understanding Benchmark [58.99199480170909]
The paper describes the open Russian medical language understanding benchmark covering several task types.
We prepare unified-format labels, data splits, and evaluation metrics for the new tasks.
A single-number metric expresses a model's ability to cope with the benchmark.
(2022-01-17T16:23:33Z)
- GERNERMED -- An Open German Medical NER Model [0.7310043452300736]
Data mining in medical data analysis often has to rely solely on processing unstructured data to retrieve relevant information.
In this work, we present GERNERMED, the first open, neural NLP model for NER tasks dedicated to detecting medical entity types in German text data.
(2021-09-24T17:53:47Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks, including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
(2021-06-15T12:25:30Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
(2020-07-15T09:22:55Z)
- DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment.
DeepEnroll is a cross-modal inference learning model that jointly encodes enrollment criteria (text) and patient records (tabular data) into a shared latent space for matching inference.
(2020-01-22T17:51:25Z)
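The Alzheimer's-disease entry in the list above describes perplexity as an information-theoretic measure of how well a language model predicts a text sequence. For a bigram model it is PP = exp(-(1/N) Σ log p(w_i | w_{i-1})) over the N predicted tokens. A toy sketch, assuming a bigram model with add-one (Laplace) smoothing, which is a simplification of the 2- to 5-gram and GPT-2 models listed in that entry; the function names `train_bigram` and `perplexity` are hypothetical:

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigrams and bigram histories from a token list (toy model)."""
    histories = Counter(tokens[:-1])            # how often each word starts a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts of adjacent word pairs
    vocab = set(tokens)
    return histories, bigrams, vocab


def perplexity(tokens, model):
    """Perplexity of `tokens` under an add-one smoothed bigram model.

    Assumes every test token also occurs in the training vocabulary.
    """
    histories, bigrams, vocab = model
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to score a bigram")
    vocab_size = len(vocab)
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        p = (bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size)
        log_prob += math.log(p)
    n = len(tokens) - 1                         # number of predicted tokens
    return math.exp(-log_prob / n)


train = "<s> the cat sat on the mat </s>".split()
model = train_bigram(train)
seen = perplexity(train, model)                                    # original word order
scrambled = perplexity("<s> mat on cat the sat </s>".split(), model)  # shuffled order
print(seen, scrambled)
```

Lower perplexity means the model finds the sequence more predictable; in this toy setup the scrambled word order receives a higher perplexity than the training sentence.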