Improving Broad-Coverage Medical Entity Linking with Semantic Type
Prediction and Large-Scale Datasets
- URL: http://arxiv.org/abs/2005.00460v4
- Date: Sun, 22 Aug 2021 06:53:08 GMT
- Title: Improving Broad-Coverage Medical Entity Linking with Semantic Type
Prediction and Large-Scale Datasets
- Authors: Shikhar Vashishth, Denis Newman-Griffis, Rishabh Joshi, Ritam Dutt,
Carolyn Rose
- Abstract summary: MedType is a fully modular system that prunes out irrelevant candidate concepts based on the predicted semantic type of an entity mention.
We present WikiMed and PubMedDS, two large-scale medical entity linking datasets, and demonstrate that pre-training MedType on these datasets further improves entity linking performance.
- Score: 12.131050765159145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical entity linking is the task of identifying and standardizing medical
concepts referred to in an unstructured text. Most of the existing methods
adopt a three-step approach of (1) detecting mentions, (2) generating a list of
candidate concepts, and finally (3) picking the best concept among them. In
this paper, we probe into alleviating the problem of overgeneration of
candidate concepts in the candidate generation module, the most under-studied
component of medical entity linking. For this, we present MedType, a fully
modular system that prunes out irrelevant candidate concepts based on the
predicted semantic type of an entity mention. We incorporate MedType into five
off-the-shelf toolkits for medical entity linking and demonstrate that it
consistently improves entity linking performance across several benchmark
datasets. To address the dearth of annotated training data for medical entity
linking, we present WikiMed and PubMedDS, two large-scale medical entity
linking datasets, and demonstrate that pre-training MedType on these datasets
further improves entity linking performance. We make our source code and
datasets publicly available for medical entity linking research.
Related papers
- Efficient Biomedical Entity Linking: Clinical Text Standardization with Low-Resource Techniques [0.0]
Multiple terms can refer to the same core concepts which can be referred as a clinical entity.
Ontologies like the Unified Medical Language System (UMLS) are developed and maintained to store millions of clinical entities.
We propose a suite of context-based and context-less remention techniques for performing the entity disambiguation.
arXiv Detail & Related papers (2024-05-24T01:14:33Z) - Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z) - Biomedical Entity Linking as Multiple Choice Question Answering [48.74212158495695]
We present BioELQA, a novel model that treats Biomedical Entity Linking as Multiple Choice Question Answering.
We first obtains candidate entities with a fast retriever, jointly presents the mention and candidate entities to a generator, and then outputs the predicted symbol associated with its chosen entity.
To improve generalization for long-tailed entities, we retrieve similar labeled training instances as clues and the input with retrieved instances for the generator.
arXiv Detail & Related papers (2024-02-23T08:40:38Z) - Zero-Shot Medical Information Retrieval via Knowledge Graph Embedding [27.14794371879541]
This paper introduces MedFusionRank, a novel approach to zero-shot medical information retrieval (MIR)
The proposed approach leverages a pre-trained BERT-style model to extract compact yet informative keywords.
These keywords are then enriched with domain knowledge by linking them to conceptual entities within a medical knowledge graph.
arXiv Detail & Related papers (2023-10-31T16:26:33Z) - EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z) - Fast and Effective Biomedical Entity Linking Using a Dual Encoder [48.86736921025866]
We propose a BERT-based dual encoder model that resolves multiple mentions in a document in one shot.
We show that our proposed model is multiple times faster than existing BERT-based models while being competitive in accuracy for biomedical entity linking.
arXiv Detail & Related papers (2021-03-08T19:32:28Z) - Medical Entity Linking using Triplet Network [7.18342554344254]
We present an approach to rank the candidate Knowledge Base entries based on their similarity with disease mention.
We introduce a robust and portable candidate generation scheme that does not make use of the hand-crafted rules.
Experimental results on the standard benchmark NCBI dataset demonstrate that our system outperforms the prior methods by a significant margin.
arXiv Detail & Related papers (2020-12-21T07:44:37Z) - Clustering-based Inference for Biomedical Entity Linking [40.78384867437563]
We introduce a model in which linking decisions can be made not merely by linking to a knowledge base entity but also by grouping multiple mentions together via clustering and jointly making linking predictions.
In experiments on the largest publicly available biomedical dataset, we improve the best independent prediction for entity linking by 3.0 points of accuracy.
arXiv Detail & Related papers (2020-10-21T19:16:27Z) - MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware
Medical Dialogue Generation [86.38736781043109]
We build and release a large-scale high-quality Medical Dialogue dataset related to 12 types of common Gastrointestinal diseases named MedDG.
We propose two kinds of medical dialogue tasks based on MedDG dataset. One is the next entity prediction and the other is the doctor response generation.
Experimental results show that the pre-train language models and other baselines struggle on both tasks with poor performance in our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z) - COMETA: A Corpus for Medical Entity Linking in the Social Media [27.13349965075764]
We introduce a new corpus called COMETA, consisting of 20k English biomedical entity mentions from Reddit expert-annotated with links to SNOMED CT.
Our corpus satisfies a combination of desirable properties, from scale and coverage to diversity and quality.
We shed light on the ability of these systems to perform complex inference on entities and concepts under 2 challenging evaluation scenarios.
arXiv Detail & Related papers (2020-10-07T09:16:45Z) - Learning Contextualized Document Representations for Healthcare Answer
Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.