Clinical Named Entity Recognition using Contextualized Token
Representations
- URL: http://arxiv.org/abs/2106.12608v1
- Date: Wed, 23 Jun 2021 18:12:58 GMT
- Title: Clinical Named Entity Recognition using Contextualized Token
Representations
- Authors: Yichao Zhou, Chelsea Ju, J. Harry Caufield, Kevin Shih, Calvin Chen,
Yizhou Sun, Kai-Wei Chang, Peipei Ping, Wei Wang
- Abstract summary: This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Experiments show that our models gain dramatic improvements over both static word embeddings and domain-generic language models.
- Score: 49.036805795072645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The clinical named entity recognition (CNER) task seeks to locate and
classify clinical terminologies into predefined categories, such as diagnostic
procedure, disease disorder, severity, medication, medication dosage, and sign
symptom. CNER facilitates the study of medication side effects, including the
identification of novel phenomena and human-focused information extraction.
Existing approaches to extracting the entities of interest focus on using
static word embeddings to represent each word. However, one word can have
different interpretations depending on the context of the sentence.
Evidently, static word embeddings are insufficient to integrate the diverse
interpretations of a word. To overcome this challenge, the technique of
contextualized word embedding has been introduced to better capture the
semantic meaning of each word based on its context. Two of these language
models, ELMo and Flair, have been widely used in the field of Natural Language
Processing to generate contextualized word embeddings from domain-generic
documents. However, these embeddings are usually too general to capture the
proximity among the vocabularies of specific domains. To facilitate various
downstream applications using clinical case reports (CCRs), we pre-train two
deep contextualized language models, Clinical Embeddings from Language Model
(C-ELMo) and Clinical Contextual String Embeddings (C-Flair) using a
clinical corpus from PubMed Central. Experiments show that our models
gain dramatic improvements over both static word embeddings
and domain-generic language models.
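
To make the polysemy argument concrete: with contextual string embeddings, the same surface form receives a different vector in each context. Below is a minimal sketch using the open-source flair library with its domain-generic "news-forward"/"news-backward" character-level language models as stand-ins; the paper's C-Flair applies the same mechanism but is pre-trained on clinical case reports from PubMed Central, and the example sentences are illustrative only.

```python
import torch
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# Forward and backward character-level LMs, stacked per the Flair recipe.
# Domain-generic checkpoints here; C-Flair would swap in models pre-trained
# on clinical case reports.
embeddings = StackedEmbeddings([
    FlairEmbeddings("news-forward"),
    FlairEmbeddings("news-backward"),
])

s1 = Sentence("The patient is ready for discharge tomorrow.")
s2 = Sentence("Purulent discharge was noted at the wound site.")
embeddings.embed(s1)
embeddings.embed(s2)

# The same word "discharge" gets a different vector in each context.
v1 = next(t for t in s1 if t.text == "discharge").embedding
v2 = next(t for t in s2 if t.text == "discharge").embedding
cos = torch.nn.functional.cosine_similarity(v1, v2, dim=0)
print(f"cosine similarity between the two 'discharge' vectors: {cos:.3f}")
```

A static embedding would assign both occurrences an identical vector; the contextual model separates the disposition sense from the wound-drainage sense. For the CNER task itself, such embeddings are typically fed into a BiLSTM-CRF sequence tagger. A sketch of that step, assuming a hypothetical CoNLL-style annotated corpus at data/ccr_ner (the path and data are assumptions, not the paper's release):

```python
from flair.datasets import ColumnCorpus
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Hypothetical corpus directory with train/dev/test splits of clinical case
# reports, one "token<TAB>tag" pair per line.
corpus = ColumnCorpus("data/ccr_ner", {0: "text", 1: "ner"})
tag_dict = corpus.make_label_dictionary(label_type="ner")

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,   # the stacked Flair embeddings from above
    tag_dictionary=tag_dict,
    tag_type="ner",
    use_crf=True,            # BiLSTM-CRF decoder, standard for NER
)

ModelTrainer(tagger, corpus).train("models/c-flair-ner", max_epochs=10)
```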
Related papers
- Efficient Biomedical Entity Linking: Clinical Text Standardization with Low-Resource Techniques [0.0]
Multiple terms can refer to the same core concept, which is referred to as a clinical entity.
Ontologies like the Unified Medical Language System (UMLS) are developed and maintained to store millions of clinical entities.
We propose a suite of context-based and context-less remention techniques for entity disambiguation.
arXiv Detail & Related papers (2024-05-24T01:14:33Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - Applying unsupervised keyphrase methods on concepts extracted from
discharge sheets [7.102620843620572]
To extract meaning from clinical texts, it is necessary to identify the section in which each piece of content is recorded and to identify key concepts.
In this study, these challenges have been addressed by using clinical natural language processing techniques.
A set of popular unsupervised keyphrase extraction methods is verified and evaluated.
arXiv Detail & Related papers (2023-03-15T20:55:25Z) - Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - Drug and Disease Interpretation Learning with Biomedical Entity
Representation Transformer [9.152161078854146]
Concept normalization in free-form texts is a crucial step in every text-mining pipeline.
We propose a simple and effective two-stage neural approach based on fine-tuned BERT architectures.
arXiv Detail & Related papers (2021-01-22T20:01:25Z) - Unifying Relational Sentence Generation and Retrieval for Medical Image
Report Composition [142.42920413017163]
Current methods often generate the most common sentences for individual cases due to dataset bias.
We propose a novel framework that unifies template retrieval and sentence generation to handle both common and rare abnormalities.
arXiv Detail & Related papers (2021-01-09T04:33:27Z) - UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual
Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying two UMLS-driven strategies (connecting words that share the same concept and adding semantic type embeddings), UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z) - Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings [16.136832979324467]
We pretrain deep embedding models (BERT) on medical notes from the MIMIC-III hospital dataset; a minimal sketch of extracting such clinical BERT embeddings follows this list.
We identify dangerous latent relationships that are captured by the contextual word embeddings.
We evaluate performance gaps across different definitions of fairness on over 50 downstream clinical prediction tasks.
arXiv Detail & Related papers (2020-03-11T23:21:14Z)
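
Several entries above (UmlsBERT, Hurtful Words) build on BERT-style contextual embeddings pre-trained on clinical text. As a minimal sketch, assuming the publicly released Bio_ClinicalBERT checkpoint (pre-trained on MIMIC-III notes) as a stand-in for the papers' own models, contextual token vectors can be extracted with the HuggingFace transformers library:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Bio_ClinicalBERT: a BERT model pre-trained on MIMIC-III clinical notes.
# Used here as a stand-in; the papers above train or augment their own checkpoints.
name = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

note = "Patient reports chest pain; started on 81 mg aspirin daily."
inputs = tokenizer(note, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)

# One contextual vector per wordpiece token.
for tok, vec in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), hidden[0]):
    print(f"{tok:15s} {vec[:4].tolist()}")
```

These per-token vectors are what the downstream clinical models consume, whether for entity recognition, phenotyping, or the bias probes studied in Hurtful Words.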