Deeper Clinical Document Understanding Using Relation Extraction
- URL: http://arxiv.org/abs/2112.13259v1
- Date: Sat, 25 Dec 2021 17:14:13 GMT
- Title: Deeper Clinical Document Understanding Using Relation Extraction
- Authors: Hasham Ul Haq, Veysel Kocaman, David Talby
- Abstract summary: We propose a text mining framework comprising Named Entity Recognition (NER) and Relation Extraction (RE) models.
We introduce two new RE model architectures -- an accuracy-optimized one based on BioBERT and a speed-optimized one utilizing crafted features over a Fully Connected Neural Network (FCNN).
We show two practical applications of this framework -- for building a biomedical knowledge graph and for improving the accuracy of mapping entities to clinical codes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The surging amount of biomedical literature & digital clinical records
presents a growing need for text mining techniques that can not only identify
but also semantically relate entities in unstructured data. In this paper we
propose a text mining framework comprising Named Entity Recognition (NER)
and Relation Extraction (RE) models, which expands on previous work in three
main ways. First, we introduce two new RE model architectures -- an
accuracy-optimized one based on BioBERT and a speed-optimized one utilizing
crafted features over a Fully Connected Neural Network (FCNN). Second, we
evaluate both models on public benchmark datasets and obtain new
state-of-the-art F1 scores on the 2012 i2b2 Clinical Temporal Relations
challenge (F1 of 73.6, +1.2% over the previous SOTA), the 2010 i2b2 Clinical
Relations challenge (F1 of 69.1, +1.2%), the 2019 Phenotype-Gene Relations
dataset (F1 of 87.9, +8.5%), the 2012 Adverse Drug Events Drug-Reaction dataset
(F1 of 90.0, +6.3%), and the 2018 n2c2 Posology Relations dataset (F1 of 96.7,
+0.6%). Third, we show two practical applications of this framework -- for
building a biomedical knowledge graph and for improving the accuracy of mapping
entities to clinical codes. The system is built using the Spark NLP library
which provides a production-grade, natively scalable, hardware-optimized,
trainable & tunable NLP framework.
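
As a rough illustration of how NER output and RE predictions could feed a knowledge graph, the sketch below pairs recognized entities, passes each pair to a relation classifier, and stores accepted relations as graph edges. It is a minimal sketch under stated assumptions, not the authors' method and not the Spark NLP API: the `Entity` class, the helper functions, the `DRUG`/`ADE` labels, and the distance-based `toy_relation_classifier` are all hypothetical stand-ins for the trained models the abstract describes.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    """A hypothetical NER output: surface text, label, character offset."""
    text: str
    label: str
    start: int


def candidate_pairs(entities, allowed):
    """Yield (head, tail) pairs whose (head.label, tail.label) is allowed."""
    ordered = sorted(entities, key=lambda e: e.start)
    for a, b in combinations(ordered, 2):
        if (a.label, b.label) in allowed:
            yield a, b
        elif (b.label, a.label) in allowed:
            yield b, a


def toy_relation_classifier(sentence, head, tail, max_gap=40):
    """Stand-in for a trained RE model (assumption): relate nearby entities.

    A real model would read `sentence`; this toy rule only uses offsets.
    """
    gap = abs(head.start - tail.start)
    return "DRUG-ADE" if gap <= max_gap else None


def build_graph(sentence, entities, allowed):
    """Collect predicted relations as edges: head text -> {(relation, tail text)}."""
    graph = defaultdict(set)
    for head, tail in candidate_pairs(entities, allowed):
        relation = toy_relation_classifier(sentence, head, tail)
        if relation is not None:
            graph[head.text].add((relation, tail.text))
    return dict(graph)


sentence = "The patient developed a rash after starting amoxicillin."
entities = [
    Entity("rash", "ADE", sentence.index("rash")),
    Entity("amoxicillin", "DRUG", sentence.index("amoxicillin")),
]
graph = build_graph(sentence, entities, allowed={("DRUG", "ADE")})
print(graph)
```

Restricting candidate pairs to allowed label combinations keeps the number of classifier calls small, which matters when the relation model is the expensive step.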
Related papers
- Towards Effective and Efficient Continual Pre-training of Large Language Models [163.34610964970258]
Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks.
This paper presents a technical report for continually pre-training Llama-3 (8B).
It significantly enhances the Chinese language ability and scientific reasoning ability of the backbone model.
arXiv Detail & Related papers (2024-07-26T13:55:21Z)
- Multi-objective Representation for Numbers in Clinical Narratives Using CamemBERT-bio [0.9208007322096533]
This research aims to classify numerical values extracted from medical documents across seven physiological categories.
We introduce two main innovations: integrating keyword embeddings into the model and adopting a number-agnostic strategy.
We show substantial improvements in the effectiveness of CamemBERT-bio, surpassing conventional methods with an F1 score of 0.89.
arXiv Detail & Related papers (2024-05-28T01:15:21Z)
- A Federated Learning Framework for Stenosis Detection [70.27581181445329]
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA).
Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy);
dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature.
arXiv Detail & Related papers (2023-10-30T11:13:40Z)
- PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
- From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-shot Contexts [0.0]
This paper proposes a method for zero- and few-shot NER in the biomedical domain.
We have achieved average F1 scores of 35.44% for zero-shot NER, 50.10% for one-shot NER, 69.94% for 10-shot NER, and 79.51% for 100-shot NER on 9 diverse evaluated biomedical entities.
arXiv Detail & Related papers (2023-05-05T12:14:22Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- A Meta-GNN approach to personalized seizure detection and classification [53.906130332172324]
We propose a personalized seizure detection and classification framework that quickly adapts to a specific patient from limited seizure samples.
We train a Meta-GNN based classifier that learns a global model from a set of training patients.
We show that our method outperforms the baselines by reaching 82.7% on accuracy and 82.08% on F1 score after only 20 iterations on new unseen patients.
arXiv Detail & Related papers (2022-11-01T14:12:58Z)
- Clinical Relation Extraction Using Transformer-based Models [28.237302721228435]
We developed a series of clinical RE models based on three transformer architectures, namely BERT, RoBERTa, and XLNet.
We demonstrated that the RoBERTa-clinical RE model achieved the best performance on the 2018 MADE1.0 dataset with an F1-score of 0.8958.
Our results indicated that the binary classification strategy consistently outperformed the multi-class classification strategy for clinical relation extraction.
arXiv Detail & Related papers (2021-07-19T15:15:51Z)
- Neural Entity Recognition with Gazetteer based Fusion [7.024494879945238]
We propose an auxiliary gazetteer model and fuse it with an NER system, which results in better robustness and interpretability across different clinical datasets.
Our gazetteer based fusion model is data efficient, achieving +1.7 micro-F1 gains on the i2b2 dataset using 20% training data, and brings +4.7 micro-F1 gains on novel entity mentions never presented during training.
arXiv Detail & Related papers (2021-05-27T15:14:15Z)
- Improving Clinical Document Understanding on COVID-19 Research with Spark NLP [0.0]
Following the global COVID-19 pandemic, the number of scientific papers studying the virus has grown massively.
We present a clinical text mining system that improves on previous efforts in three ways.
First, it can recognize over 100 different entity types including social determinants of health, anatomy, risk factors, and adverse events.
Second, the text processing pipeline includes assertion status detection, to distinguish between clinical facts that are present, absent, conditional, or about someone other than the patient.
arXiv Detail & Related papers (2020-12-07T19:17:05Z)
- Ensemble Transfer Learning for the Prediction of Anti-Cancer Drug Response [49.86828302591469]
In this paper, we apply transfer learning to the prediction of anti-cancer drug response.
We apply the classic transfer learning framework that trains a prediction model on the source dataset and refines it on the target dataset.
The ensemble transfer learning pipeline is implemented using LightGBM and two deep neural network (DNN) models with different architectures.
arXiv Detail & Related papers (2020-05-13T20:29:48Z)