Multi-Ontology Refined Embeddings (MORE): A Hybrid Multi-Ontology and
Corpus-based Semantic Representation for Biomedical Concepts
- URL: http://arxiv.org/abs/2004.06555v1
- Date: Tue, 14 Apr 2020 14:38:41 GMT
- Title: Multi-Ontology Refined Embeddings (MORE): A Hybrid Multi-Ontology and
Corpus-based Semantic Representation for Biomedical Concepts
- Authors: Steven Jiang, Weiyi Wu, Naofumi Tomita, Craig Ganoe, Saeed Hassanpour
- Abstract summary: This paper introduces Multi-Ontology Refined Embeddings (MORE), a framework for incorporating domain knowledge from multiple ontologies into a distributional semantic model.
We use the RadCore and MIMIC-III free-text datasets for the corpus-based component of MORE.
For the ontology-based part, we use the Medical Subject Headings (MeSH) ontology and three state-of-the-art ontology-based similarity measures.
- Score: 0.5812284760539712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Objective: Currently, a major limitation for natural language processing
(NLP) analyses in clinical applications is that a concept can be referenced in
various forms across different texts. This paper introduces Multi-Ontology
Refined Embeddings (MORE), a novel hybrid framework for incorporating domain
knowledge from multiple ontologies into a distributional semantic model,
learned from a corpus of clinical text.
Materials and Methods: We use the RadCore and MIMIC-III free-text datasets
for the corpus-based component of MORE. For the ontology-based part, we use the
Medical Subject Headings (MeSH) ontology and three state-of-the-art
ontology-based similarity measures. In our approach, we propose a new learning
objective, modified from the sigmoid cross-entropy objective function.
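The abstract states only that MORE's objective is modified from sigmoid cross-entropy and that MeSH-based similarity measures supply the ontology knowledge; it does not give the exact formula. As a minimal sketch, assuming the modification replaces the binary skip-gram target (1 for an observed context word, 0 for a negative sample) with an ontology similarity score in [0, 1], one training signal could look like this. The toy hierarchy, the choice of a Wu-Palmer-style measure, and every function name here are hypothetical illustrations, not the paper's data, measures, or code.

```python
import math

# Hypothetical toy hierarchy (child -> parent), loosely MeSH-shaped; NOT real MeSH data.
PARENT = {
    "diseases": None,
    "cardiovascular diseases": "diseases",
    "heart diseases": "cardiovascular diseases",
    "myocardial infarction": "heart diseases",
    "heart failure": "heart diseases",
    "neoplasms": "diseases",
}

def ancestors(term):
    """Path from a term up to the root, starting with the term itself."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path

def depth(term):
    """Depth of a term, with the root at depth 1."""
    return len(ancestors(term))

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    anc_b = set(ancestors(b))
    lcs = next(t for t in ancestors(a) if t in anc_b)  # lowest common subsumer
    return 2 * depth(lcs) / (depth(a) + depth(b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_label_loss(u, v, target):
    """Sigmoid cross-entropy on the dot product u.v with a soft target in [0, 1].

    Plain skip-gram negative sampling uses target 1 or 0; here the target is
    ASSUMED to come from an ontology similarity measure.
    """
    p = sigmoid(sum(x * y for x, y in zip(u, v)))
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

def soft_label_grad(u, v, target):
    """Gradient of soft_label_loss with respect to u: (sigmoid(u.v) - target) * v."""
    p = sigmoid(sum(x * y for x, y in zip(u, v)))
    return [(p - target) * y for y in v]

# Usage: pull a word vector toward the ontology-derived target.
target = wu_palmer("myocardial infarction", "heart failure")  # 0.75 in this toy tree
u, v = [0.1, -0.2], [0.3, 0.4]
for _ in range(500):
    g = soft_label_grad(u, v, target)
    u = [ui - 0.5 * gi for ui, gi in zip(u, g)]  # gradient step on u only; v held fixed
```

Because the gradient is (sigmoid(u·v) − s)·v, it vanishes once the predicted probability equals the ontology score s, so pairs the ontology rates as closely related are pulled together in proportion to that score rather than to a hard 0/1 label.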
Results and Discussion: We evaluate the quality of the generated word
embeddings using two established datasets of semantic similarities among
biomedical concept pairs. On the first dataset, which contains 29 concept pairs
with similarity scores established by physicians and medical coders, MORE's
similarity scores have the highest combined correlation (0.633), which is 5.0%
higher than that of the baseline model and 12.4% higher than that of the best
ontology-based similarity measure. On the second dataset, which contains 449
concept pairs, MORE's similarity scores have a correlation of 0.481 with the
average of four medical residents' similarity ratings, outperforming the
skip-gram model by 8.1% and the best ontology measure by 6.9%.
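The evaluation scores each concept pair with the model, then correlates those scores with expert ratings. A minimal sketch, assuming cosine similarity between concept vectors and Pearson correlation (the abstract names neither the similarity function nor the correlation coefficient, nor how the "combined" correlation is formed); `evaluate` is a hypothetical helper, and the vectors and ratings in the test are made-up placeholders, not data from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def evaluate(embeddings, rated_pairs):
    """Correlate model cosine similarities with human ratings.

    embeddings:  dict mapping concept -> vector
    rated_pairs: list of (concept_a, concept_b, human_score)
    Pairs containing an out-of-vocabulary concept are skipped.
    """
    model, human = [], []
    for a, b, score in rated_pairs:
        if a in embeddings and b in embeddings:
            model.append(cosine(embeddings[a], embeddings[b]))
            human.append(score)
    return pearson(model, human)
```

Skipping out-of-vocabulary pairs is one common convention; a real comparison between models should score them on the same subset of pairs so the correlations are comparable.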
Related papers
- Unified Representation of Genomic and Biomedical Concepts through Multi-Task, Multi-Source Contrastive Learning [45.6771125432388]
We introduce GENomic REpresentation with Language Model (GENEREL), a framework designed to bridge genetic and biomedical knowledge bases.
Our experiments demonstrate GENEREL's ability to effectively capture the nuanced relationships between SNPs and clinical concepts.
Date: 2024-10-14T04:19:52Z
- ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data [24.36545694430613]
This study aimed to construct an Integrated Ontology of Symptom Phenotypes (ISPO) to support data mining of Chinese electronic medical records (EMRs) and real-world studies in the TCM field.
Date: 2024-07-08T15:23:50Z
- RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore).
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
Date: 2024-06-24T17:49:28Z
- Efficient Biomedical Entity Linking: Clinical Text Standardization with Low-Resource Techniques [0.0]
Multiple terms can refer to the same core concept, which can be referred to as a clinical entity.
Ontologies like the Unified Medical Language System (UMLS) are developed and maintained to store millions of clinical entities.
We propose a suite of context-based and context-less remention techniques for performing the entity disambiguation.
Date: 2024-05-24T01:14:33Z
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting to various datasets.
Date: 2024-02-20T11:50:27Z
- Exploring the In-context Learning Ability of Large Language Model for Biomedical Concept Linking [4.8882241537236455]
This research investigates a method that exploits the in-context learning capabilities of large models for biomedical concept linking.
The proposed approach adopts a two-stage retrieve-and-rank framework.
It achieved an accuracy of 90.% in BC5CDR disease entity normalization and 94.7% in chemical entity normalization.
Date: 2023-07-03T16:19:50Z
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates training data based on semantic training data.
The model is evaluated using five real benchmark data sets, and the results show that our approach achieves high performance on both free-text-to-concept and concept-to-concept searching over clinical vocabularies.
Date: 2022-01-01T05:15:42Z
- MIMICause: Defining, identifying and predicting types of causal relationships between biomedical concepts from clinical notes [0.0]
We propose annotation guidelines, develop an annotated corpus and provide baseline scores to identify types and direction of causal relations between a pair of biomedical concepts in clinical notes.
We annotate a total of 2714 de-identified examples sampled from the 2018 n2c2 shared task dataset and train four different language model based architectures.
The high inter-annotator agreement for clinical text shows the quality of our annotation guidelines while the provided baseline F1 score sets the direction for future research towards understanding narratives in clinical texts.
Date: 2021-10-14T00:15:36Z
- Neural sentence embedding models for semantic similarity estimation in the biomedical domain [6.325814141416726]
We trained different neural embedding models on 1.7 million articles from the PubMed Open Access dataset.
We evaluated them based on a biomedical benchmark set containing 100 sentence pairs annotated by human experts.
Date: 2021-10-01T13:27:44Z
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
Date: 2021-01-27T19:28:04Z
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
Date: 2020-05-15T06:57:54Z