Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text
- URL: http://arxiv.org/abs/2110.08228v1
- Date: Fri, 15 Oct 2021 17:38:16 GMT
- Title: Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text
- Authors: Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré
- Abstract summary: We propose a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain.
We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining.
Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR.
- Score: 5.008513565240167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.
Related papers
- LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models
"Knowledge Decomposition" aims to improve performance on specific medical tasks.
We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD).
LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
Date: 2024-09-29
- BioMNER: A Dataset for Biomedical Method Entity Recognition
We propose a novel dataset for biomedical method entity recognition.
We employ an automated BioMethod entity recognition and information retrieval system to assist human annotation.
Our empirical findings reveal that the large parameter counts of language models surprisingly inhibit the effective assimilation of entity extraction patterns.
Date: 2024-06-28
- Efficient Biomedical Entity Linking: Clinical Text Standardization with Low-Resource Techniques
Multiple terms can refer to the same core concept, which can be referred to as a clinical entity.
Ontologies like the Unified Medical Language System (UMLS) are developed and maintained to store millions of clinical entities.
We propose a suite of context-based and context-less remention techniques for performing the entity disambiguation.
Date: 2024-05-24
- Medical Vision-Language Pre-Training for Brain Abnormalities
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
Date: 2024-04-27
- SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation
We propose a novel Structure-Modal Constrained (SMC) UDA framework based on a discriminative paradigm and introduce edge structure as a bridge between domains.
With the structure-constrained self-learning and progressive ROI, our methods segment the kidney by locating the 3D spatial structure of the edge.
Experiments show that our proposed SMC-UDA generalizes well and outperforms generative UDA methods.
Date: 2023-06-14
- EBOCA: Evidences for BiOmedical Concepts Association Ontology
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidence supporting these associations.
Test data from a subset of DISNET and from automatic association extraction from texts have been transformed into a knowledge graph that can be used in real-world scenarios.
Date: 2022-08-01
- Biomedical Entity Linking with Contrastive Context Matching
We introduce BioCoM, a contrastive learning framework for biomedical entity linking.
We build the training instances from raw PubMed articles by dictionary matching.
We predict the normalized biomedical entity at inference time through a nearest-neighbor search.
Date: 2021-06-14
- DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images
We propose leveraging the wealth of annotations in benchmark computer vision datasets to conduct unsupervised instance segmentation for diverse biomedical datasets.
We propose a Domain Adaptive Region-based Convolutional Neural Network (DARCNN), which adapts knowledge of object definition from COCO to multiple biomedical datasets.
We showcase DARCNN's performance for unsupervised instance segmentation on numerous biomedical datasets.
Date: 2021-04-03
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
Date: 2021-02-26
- Towards Cross-modality Medical Image Segmentation with Online Mutual Knowledge Distillation
In this paper, we aim to exploit the prior knowledge learned from one modality to improve the segmentation performance on another modality.
We propose a novel Mutual Knowledge Distillation scheme to thoroughly exploit the modality-shared knowledge.
Experimental results on the public multi-class cardiac segmentation dataset (MMWHS 2017) show that our method achieves large improvements on CT segmentation.
Date: 2020-10-04