PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using
Human Phenotype Ontology
- URL: http://arxiv.org/abs/2009.08478v2
- Date: Mon, 25 Jan 2021 16:08:43 GMT
- Authors: Ling Luo, Shankai Yan, Po-Ting Lai, Daniel Veltri, Andrew Oler,
Sandhya Xirasagar, Rajarshi Ghosh, Morgan Similuk, Peter N. Robinson, Zhiyong
Lu
- Abstract summary: PhenoTagger is a hybrid method that combines both dictionary and machine learning-based methods to recognize concepts in unstructured text.
Our method is validated with two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods.
- Score: 6.165755812152143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic phenotype concept recognition from unstructured text remains a
challenging task in biomedical text mining research. Previous works that
address the task typically use dictionary-based matching methods, which can
achieve high precision but suffer from lower recall. Recently, machine
learning-based methods have been proposed to identify biomedical concepts,
which can recognize more unseen concept synonyms by automatic feature learning.
However, most methods require large corpora of manually annotated data for
model training, which is difficult to obtain due to the high cost of human
annotation. In this paper, we propose PhenoTagger, a hybrid method that
combines both dictionary and machine learning-based methods to recognize Human
Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use
all concepts and synonyms in HPO to construct a dictionary, which is then used
to automatically build a distantly supervised training dataset for machine
learning. Next, a cutting-edge deep learning model is trained to classify each
candidate phrase (n-gram from input sentence) into a corresponding concept
label. Finally, the dictionary and machine learning-based prediction results
are combined for improved performance. Our method is validated with two HPO
corpora, and the results show that PhenoTagger compares favorably to previous
methods. In addition, to demonstrate the generalizability of our method, we
retrained PhenoTagger using the disease ontology MEDIC for disease concept
recognition to investigate the effect of training on different ontologies.
Experimental results on the NCBI disease corpus show that PhenoTagger, without
requiring manually annotated training data, achieves performance competitive
with state-of-the-art supervised methods.
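
The pipeline described in the abstract (dictionary construction, distantly supervised training data, n-gram candidate classification, and a final merge of both result sets) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the concept IDs are placeholders rather than real HPO identifiers, the token-overlap scorer is only a stand-in for the deep learning model, and the merge rule (dictionary hits first, then classifier hits above a threshold) is one plausible choice, not necessarily the one PhenoTagger uses.

```python
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int]  # token offsets, end exclusive
Classifier = Callable[[str], Tuple[Optional[str], float]]

# Toy dictionary: surface form -> concept ID.
# The IDs are placeholders, NOT real HPO identifiers.
TOY_DICTIONARY: Dict[str, str] = {
    "seizure": "HP:TOY0001",
    "seizures": "HP:TOY0001",
    "small head": "HP:TOY0002",
    "microcephaly": "HP:TOY0002",
}


def ngrams(tokens: List[str], max_n: int = 3) -> List[Span]:
    """Enumerate every candidate phrase of up to max_n tokens."""
    return [
        (start, start + n)
        for n in range(1, max_n + 1)
        for start in range(len(tokens) - n + 1)
    ]


def build_training_pairs(dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Distant supervision: each dictionary term becomes a labeled example."""
    return sorted(dictionary.items())


def train_toy_classifier(pairs: List[Tuple[str, str]]) -> Classifier:
    """Stand-in for the trained model: scores a phrase by its best Jaccard
    token overlap with any training term (hypothetical, for illustration)."""

    def classify(phrase: str) -> Tuple[Optional[str], float]:
        words = set(phrase.lower().split())
        best_label, best_score = None, 0.0
        for term, concept in pairs:
            term_words = set(term.split())
            score = len(words & term_words) / len(words | term_words)
            if score > best_score:
                best_label, best_score = concept, score
        return best_label, best_score

    return classify


def dictionary_tag(
    tokens: List[str], dictionary: Dict[str, str], max_n: int = 3
) -> Dict[Span, str]:
    """Exact, case-insensitive dictionary lookup over all n-grams."""
    found: Dict[Span, str] = {}
    for start, end in ngrams(tokens, max_n):
        phrase = " ".join(tokens[start:end]).lower()
        if phrase in dictionary:
            found[(start, end)] = dictionary[phrase]
    return found


def hybrid_tag(
    sentence: str,
    dictionary: Dict[str, str],
    classifier: Classifier,
    threshold: float = 0.7,
    max_n: int = 3,
) -> Dict[Span, str]:
    """Merge dictionary hits with classifier hits scoring at least threshold."""
    tokens = sentence.split()
    results = dictionary_tag(tokens, dictionary, max_n)
    for start, end in ngrams(tokens, max_n):
        if (start, end) in results:
            continue  # the dictionary match takes precedence
        label, score = classifier(" ".join(tokens[start:end]))
        if label is not None and score >= threshold:
            results[(start, end)] = label
    return results


if __name__ == "__main__":
    model = train_toy_classifier(build_training_pairs(TOY_DICTIONARY))
    text = "The patient has seizures and a head small for age"
    print(hybrid_tag(text, TOY_DICTIONARY, model))
```

In this toy run the dictionary alone finds only "seizures", while the stand-in classifier also recovers the reordered phrase "head small", which mirrors the recall gain the abstract attributes to the machine learning component. Resolution of overlapping or nested spans is deliberately left out of the sketch.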
Related papers
- Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to
Rank [0.0]
We present a novel deep learning approach for named entity normalization, treating it as a pair-wise learning to rank problem.
We conduct experiments on species entity types and evaluate our method against state-of-the-art techniques.
arXiv Detail & Related papers (2023-10-22T17:30:16Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym
Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
- Few-Shot Meta Learning for Recognizing Facial Phenotypes of Genetic
Disorders [55.41644538483948]
Automated classification and similarity retrieval aid physicians in decision-making to diagnose possible genetic conditions as early as possible.
Previous work has addressed the problem as a classification problem and used deep learning methods.
In this study, we used a facial recognition model trained on a large corpus of healthy individuals as a pre-task and transferred it to facial phenotype recognition.
arXiv Detail & Related papers (2022-10-23T11:52:57Z)
- Generative Biomedical Entity Linking via Knowledge Base-Guided
Pre-training and Synonyms-Aware Fine-tuning [0.8154691566915505]
We propose a generative approach to model biomedical entity linking (EL).
We propose KB-guided pre-training by constructing synthetic samples with synonyms and definitions from KB.
We also propose synonyms-aware fine-tuning to select concept names for training, and propose decoder prompt and multi-synonyms constrained prefix tree for inference.
arXiv Detail & Related papers (2022-04-11T14:50:51Z)
- BERT WEAVER: Using WEight AVERaging to enable lifelong learning for
transformer-based models in biomedical semantic search engines [49.75878234192369]
We present WEAVER, a simple, yet efficient post-processing method that infuses old knowledge into the new model.
We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all data at once.
arXiv Detail & Related papers (2022-02-21T10:34:41Z)
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates training data based on semantic training data.
The model is evaluated using five real benchmark data sets and the results show that our approach achieves high results on both free text to concept and concept to searching concept vocabularies.
arXiv Detail & Related papers (2022-01-01T05:15:42Z)
- Self-Supervised Detection of Contextual Synonyms in a Multi-Class
Setting: Phenotype Annotation Use Case [11.912581294872767]
Contextualised word embeddings are a powerful tool for detecting contextual synonyms.
We propose a self-supervised pre-training approach that can detect contextual synonyms of concepts when trained on data created by shallow matching.
arXiv Detail & Related papers (2021-09-04T21:35:01Z)
- End-to-end Biomedical Entity Linking with Span-based Dictionary Matching [5.273138059454523]
Disease name recognition and normalization is a fundamental process in biomedical text mining.
This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features.
Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models.
arXiv Detail & Related papers (2021-04-21T12:24:12Z)
- Hierarchical Learning Using Deep Optimum-Path Forest [55.60116686945561]
Bag-of-Visual Words (BoVW) and deep learning techniques have been widely used in several domains, which include computer-assisted medical diagnoses.
In this work, we are interested in developing tools for the automatic identification of Parkinson's disease using machine learning and the concept of BoVW.
arXiv Detail & Related papers (2021-02-18T13:02:40Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.