The Russian Drug Reaction Corpus and Neural Models for Drug Reactions
and Effectiveness Detection in User Reviews
- URL: http://arxiv.org/abs/2004.03659v1
- Date: Tue, 7 Apr 2020 19:26:13 GMT
- Authors: Elena Tutubalina, Ilseyar Alimova, Zulfat Miftahutdinov, Andrey
Sakhovskiy, Valentin Malykh and Sergey Nikolenko
- Abstract summary: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products.
The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources.
The labelled part contains 500 consumer reviews about drug therapy with drug- and disease-related information.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus
of consumer reviews in Russian about pharmaceutical products for the detection
of health-related named entities and the effectiveness of pharmaceutical
products. The corpus itself consists of two parts, the raw one and the labelled
one. The raw part includes 1.4 million health-related user-generated texts
collected from various Internet sources, including social media. The labelled
part contains 500 consumer reviews about drug therapy with drug- and
disease-related information. Sentences are labelled for the presence or
absence of health-related issues; sentences containing such issues are
additionally labelled at the expression level to identify fine-grained
subtypes such as drug classes and drug forms, drug indications, and drug
reactions. Further, we
present a baseline model for named entity recognition (NER) and multi-label
sentence classification tasks on this corpus. Our RuDR-BERT model achieves a
macro F1 score of 74.85% on the NER task. On the sentence classification task,
it achieves a macro F1 score of 68.82%, a gain of 7.47% over a BERT model
trained on Russian data. We make the RuDReC
corpus and pretrained weights of domain-specific BERT models freely available
at https://github.com/cimm-kzn/RuDReC
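The macro F1 scores reported above average per-class F1 values with equal weight, so rare entity classes count as much as frequent ones. A minimal sketch of the computation (the class names and counts below are hypothetical, not taken from RuDReC):

```python
# Macro-averaged F1: compute F1 per class, then take the unweighted mean.

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall for a single class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-class counts: (true positives, false positives, false negatives)
counts = {
    "Drugname": (90, 10, 10),  # common, well-recognized class
    "ADR": (40, 20, 40),       # rarer, harder class
}

macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)
print(round(macro_f1, 4))
```

Because each class contributes equally to the mean, a model that does well only on frequent classes is penalized, which is why macro F1 is a common choice for imbalanced NER label sets.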
Related papers
- RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel entity-aware metric, the Radiological Report (Text) Evaluation score (RaTEScore).
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
arXiv Detail & Related papers (2024-06-24T17:49:28Z) - Biomedical Entity Linking for Dutch: Fine-tuning a Self-alignment BERT Model on an Automatically Generated Wikipedia Corpus [2.4686585810894477]
This paper presents the first evaluated biomedical entity linking model for the Dutch language.
We derive a corpus from Wikipedia of ontology-linked Dutch biomedical entities in context.
Our results indicate that biomedical entity linking in a language other than English remains challenging.
arXiv Detail & Related papers (2024-05-20T10:30:36Z) - "Hey..! This medicine made me sick": Sentiment Analysis of User-Generated Drug Reviews using Machine Learning Techniques [2.2874754079405535]
This project proposes a drug review classification system that classifies user reviews on a particular drug into different classes, such as positive, negative, and neutral.
The collected data is manually labeled and verified to ensure that the labels are correct.
arXiv Detail & Related papers (2024-04-09T08:42:34Z) - Contextualized Medication Information Extraction Using Transformer-based
Deep Learning Architectures [35.65283211002216]
We developed NLP systems for medication mention extraction, event classification (indicating whether a medication change is discussed), and context classification.
We explored 6 state-of-the-art pretrained transformer models for the three subtasks, including GatorTron, a large language model pretrained using >90 billion words of text.
Our GatorTron models achieved the best F1-scores of 0.9828 for medication extraction (ranked 3rd), 0.9379 for event classification (ranked 2nd), and the best micro-average accuracy of 0.9126 for context classification.
arXiv Detail & Related papers (2023-03-14T22:22:28Z) - Multimodal Model with Text and Drug Embeddings for Adverse Drug Reaction
Classification [9.339007998235378]
We introduce a multimodal model with two components: state-of-the-art BERT-based models for language understanding and molecular property prediction.
Experiments show that the molecular information obtained from neural networks is more beneficial for ADE classification than traditional molecular descriptors.
arXiv Detail & Related papers (2022-10-21T11:41:45Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of
Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - RuBioRoBERTa: a pre-trained biomedical language model for Russian
language biomedical text mining [117.56261821197741]
We present several BERT-based models for Russian language biomedical text mining.
The models are pre-trained on a corpus of freely available texts in the Russian biomedical domain.
arXiv Detail & Related papers (2022-04-08T09:18:59Z) - RuMedBench: A Russian Medical Language Understanding Benchmark [58.99199480170909]
The paper describes the open Russian medical language understanding benchmark covering several task types.
We prepare the unified format labeling, data split, and evaluation metrics for new tasks.
A single-number metric expresses a model's ability to cope with the benchmark.
arXiv Detail & Related papers (2022-01-17T16:23:33Z) - An analysis of full-size Russian complexly NER labelled corpus of
Internet user reviews on the drugs based on deep learning and language neural
nets [94.37521840642141]
We present the full-size Russian complexly NER-labeled corpus of Internet user reviews.
A set of advanced deep learning neural networks is used to extract pharmacologically meaningful entities from Russian texts.
arXiv Detail & Related papers (2021-04-30T19:46:24Z) - Text Mining to Identify and Extract Novel Disease Treatments From
Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.