Integration of Domain Knowledge using Medical Knowledge Graph Deep
Learning for Cancer Phenotyping
- URL: http://arxiv.org/abs/2101.01337v1
- Date: Tue, 5 Jan 2021 03:59:43 GMT
- Title: Integration of Domain Knowledge using Medical Knowledge Graph Deep
Learning for Cancer Phenotyping
- Authors: Mohammed Alawad, Shang Gao, Mayanka Chandra Shekar, S.M.Shamimul
Hasan, J. Blair Christian, Xiao-Cheng Wu, Eric B. Durbin, Jennifer Doherty,
Antoinette Stroup, Linda Coyle, Lynne Penberthy, Georgia Tourassi
- Abstract summary: We propose a method to integrate external knowledge from medical terminology into the context captured by word embeddings.
We evaluate the proposed approach using a Multitask Convolutional Neural Network (MT-CNN) to extract six cancer characteristics from a dataset of 900K cancer pathology reports.
- Score: 6.077023952306772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key component of deep learning (DL) for natural language processing (NLP)
is word embeddings. Word embeddings that effectively capture the meaning and
context of the words that they represent can significantly improve the
performance of downstream DL models for various NLP tasks. Many existing word
embedding techniques capture the context of words based on word co-occurrence
in documents and text; however, they often cannot capture broader
domain-specific relationships between concepts that may be crucial for the NLP
task at hand. In this paper, we propose a method to integrate external
knowledge from medical terminology ontologies into the context captured by word
embeddings. Specifically, we use a medical knowledge graph, such as the unified
medical language system (UMLS), to find connections between clinical terms in
cancer pathology reports. This approach aims to minimize the distance between
connected clinical concepts. We evaluate the proposed approach using a
Multitask Convolutional Neural Network (MT-CNN) to extract six cancer
characteristics -- site, subsite, laterality, behavior, histology, and grade --
from a dataset of ~900K cancer pathology reports. The results show that the
MT-CNN model that uses our domain-informed embeddings outperforms the same
MT-CNN using standard word2vec embeddings across all tasks, improving the
overall micro- and macro-F1 scores by 4.97% and 22.5%, respectively.
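The idea of minimizing the distance between clinical concepts connected in a knowledge graph can be sketched with a retrofitting-style update: each vector is pulled toward the mean of its graph neighbors while staying anchored to its pre-trained position. This is a minimal illustration, not the paper's exact procedure; the update rule, hyperparameters, and the toy terms and graph below are assumptions.

```python
import numpy as np

def retrofit(vectors, graph, alpha=1.0, beta=1.0, iters=10):
    """Pull each embedding toward the mean of its graph neighbors while
    staying anchored to its original (pre-trained) position.

    vectors: dict term -> np.ndarray, pre-trained embeddings
    graph:   dict term -> list of connected terms (e.g. UMLS relations)
    """
    new = {t: v.copy() for t, v in vectors.items()}
    for _ in range(iters):
        for term, neighbors in graph.items():
            nbrs = [n for n in neighbors if n in new]
            if not nbrs:
                continue
            # Weighted average of the original vector and neighbor vectors.
            num = alpha * vectors[term] + beta * sum(new[n] for n in nbrs)
            new[term] = num / (alpha + beta * len(nbrs))
    return new

# Toy example: two connected clinical terms drift closer together,
# while an unconnected term keeps its original embedding.
rng = np.random.default_rng(0)
emb = {"carcinoma": rng.normal(size=8),
       "neoplasm": rng.normal(size=8),
       "laterality": rng.normal(size=8)}
g = {"carcinoma": ["neoplasm"], "neoplasm": ["carcinoma"]}

fit = retrofit(emb, g)
before = np.linalg.norm(emb["carcinoma"] - emb["neoplasm"])
after = np.linalg.norm(fit["carcinoma"] - fit["neoplasm"])
assert after < before  # connected concepts end up closer
```

For the two mutually connected terms above, the fixed point of this update shrinks their separation to one third of the original distance, while terms absent from the graph are left untouched.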
Related papers
- UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for
Biomedical Entity Recognition [4.865221751784403]
This work contributes a data-centric paradigm for enriching the language representations of biomedical transformer-encoder LMs by extracting text sequences from the UMLS.
Preliminary results from experiments in the extension of pre-trained LMs as well as training from scratch show that this framework improves downstream performance on multiple biomedical and clinical Named Entity Recognition (NER) tasks.
arXiv Detail & Related papers (2023-07-20T18:08:34Z)
- Combining Contrastive Learning and Knowledge Graph Embeddings to develop
medical word embeddings for the Italian language [0.0]
This paper attempts to improve available embeddings in the uncovered niche of the Italian medical domain.
The main objective is to improve the accuracy of semantic similarity between medical terms.
Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution.
arXiv Detail & Related papers (2022-11-09T17:12:28Z)
- Always Keep your Target in Mind: Studying Semantics and Improving
Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Clinical Named Entity Recognition using Contextualized Token
Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- CogAlign: Learning to Align Textual Neural Representations to Cognitive
Language Processing Signals [60.921888445317705]
We propose a CogAlign approach to integrate cognitive language processing signals into natural language processing models.
We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Knowledge-Base Enriched Word Embeddings for Biomedical Domain [5.086571902225929]
We propose a new word embedding based model for biomedical domain that jointly leverages the information from available corpora and domain knowledge.
Unlike existing approaches, the proposed methodology is simple yet adept at accurately capturing the precise knowledge available in domain resources.
arXiv Detail & Related papers (2021-02-20T18:18:51Z)
- A Multi-Task Deep Learning Framework to Localize the Eloquent Cortex in
Brain Tumor Patients Using Dynamic Functional Connectivity [7.04584289867204]
We present a novel deep learning framework that uses dynamic functional connectivity to simultaneously localize the language and motor areas of the eloquent cortex in brain tumor patients.
Our model achieves higher localization accuracies than conventional deep learning approaches and can identify bilateral language areas even when trained on left-hemisphere lateralized cases.
arXiv Detail & Related papers (2020-11-17T18:18:09Z)
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual
Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z)
- A Comparative Study of Lexical Substitution Approaches based on Neural
Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.