BELHD: Improving Biomedical Entity Linking with Homonym Disambiguation
- URL: http://arxiv.org/abs/2401.05125v1
- Date: Wed, 10 Jan 2024 12:45:18 GMT
- Title: BELHD: Improving Biomedical Entity Linking with Homonym Disambiguation
- Authors: Samuele Garda and Ulf Leser
- Abstract summary: Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base (KB)
BELHD builds upon the BioSyn (Sung et al., 2020) model, introducing two crucial extensions.
Experiments with 10 corpora and five entity types show that BELHD improves upon state-of-the-art approaches.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Biomedical entity linking (BEL) is the task of grounding entity mentions to a
knowledge base (KB). A popular approach to the task is name-based methods,
i.e. those identifying the most appropriate name in the KB for a given mention,
either via dense retrieval or autoregressive modeling. However, as these
methods directly return KB names, they cannot cope with homonyms, i.e.
different KB entities sharing the exact same name. This significantly affects
their performance, especially for KBs where homonyms account for a large share
of entity mentions (e.g. UMLS and NCBI Gene). We therefore present BELHD
(Biomedical Entity Linking with Homonym Disambiguation), a new name-based
method that copes with this challenge. Specifically, BELHD builds upon the
BioSyn (Sung et al., 2020) model, introducing two crucial extensions. First, it
performs a preprocessing of the KB in which it expands homonyms with an
automatically chosen disambiguating string, thus enforcing unique linking
decisions. Second, we introduce candidate sharing, a novel strategy to select
candidates for contrastive learning that enhances the overall training signal.
Experiments with 10 corpora and five entity types show that BELHD improves upon
state-of-the-art approaches, achieving the best results in 6 out of 10 corpora
with an average improvement of 4.55pp recall@1. Furthermore, the KB
preprocessing is orthogonal to the core prediction model and thus can also
improve other methods, which we exemplify for GenBioEL (Yuan et al., 2022), a
generative name-based BEL approach. Code is available at: link added upon
publication.
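The abstract describes BELHD's first extension only at a high level: homonymous KB names are expanded with an automatically chosen disambiguating string so that every name maps to exactly one entity. As an illustration only, here is a minimal Python sketch of that idea; the toy `kb` dictionary, the `expand_homonyms` function, and the use of a synonym as the disambiguating string are all hypothetical assumptions, not the paper's actual procedure for choosing the string.

```python
# Hypothetical sketch of homonym expansion in a KB. Any name shared by
# several entities is suffixed with a disambiguating string (here, simply
# another synonym of the entity), enforcing unique linking decisions.
from collections import defaultdict


def expand_homonyms(kb):
    """kb: dict mapping entity_id -> list of names.

    Returns a dict mapping unique (possibly expanded) names -> entity_id.
    """
    # Group entity ids by shared name to detect homonyms.
    owners = defaultdict(list)
    for eid, names in kb.items():
        for name in names:
            owners[name].append(eid)

    linked = {}
    for name, eids in owners.items():
        if len(eids) == 1:
            linked[name] = eids[0]
        else:
            # Homonym: append a per-entity disambiguating string. The paper
            # chooses it automatically; this sketch falls back to another
            # synonym (or the entity id if none exists).
            for eid in eids:
                disamb = next((n for n in kb[eid] if n != name), eid)
                linked[f"{name} ({disamb})"] = eid
    return linked


# Toy example: two genes sharing the name "INS" (identifiers illustrative).
kb = {
    "NCBIGene:3630": ["INS", "insulin"],
    "NCBIGene:288250": ["INS", "insulin (rat)"],
}
names = expand_homonyms(kb)
```

After expansion, the bare homonym "INS" no longer appears as a KB name; a name-based linker retrieving "INS (insulin)" now identifies a single entity unambiguously.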
Related papers
- Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models [70.82728812001807]
A straightforward pipeline for zero-shot out-of-distribution (OOD) detection involves selecting potential OOD labels from an extensive semantic pool.
We theorize that enhancing performance requires expanding the semantic pool.
We show that expanding OOD label candidates with the CSP satisfies the requirements and outperforms existing works by 7.89% in FPR95.
arXiv Detail & Related papers (2024-10-11T08:24:11Z)
- Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to Rank [0.0]
We present a novel deep learning approach for named entity normalization, treating it as a pair-wise learning to rank problem.
We conduct experiments on species entity types and evaluate our method against state-of-the-art techniques.
arXiv Detail & Related papers (2023-10-22T17:30:16Z)
- Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning [80.08139343603956]
In cross-lingual named entity recognition, self-training is commonly used to bridge the linguistic gap.
In this work, we aim to improve self-training for cross-lingual NER by combining representation learning and pseudo label refinement.
Our proposed method, namely ContProto mainly comprises two components: (1) contrastive self-training and (2) prototype-based pseudo-labeling.
arXiv Detail & Related papers (2023-05-23T02:52:16Z)
- Exploring Partial Knowledge Base Inference in Biomedical Entity Linking [0.4798394926736971]
We name this scenario partial knowledge base inference.
We construct benchmarks and observe a catastrophic degradation in EL performance caused by a dramatic drop in precision.
We propose two simple-and-effective redemption methods to combat the NIL issue with little computational overhead.
arXiv Detail & Related papers (2023-03-18T04:31:07Z)
- Reveal the Unknown: Out-of-Knowledge-Base Mention Discovery with Entity Linking [23.01938139604297]
We propose a new BERT-based Entity Linking (EL) method which can identify mentions that do not have corresponding KB entities by matching them to a NIL entity.
Results on five datasets show the advantages of BLINKout over existing methods to identify out-of-KB mentions.
arXiv Detail & Related papers (2023-02-14T17:00:06Z)
- Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge Base and Database [86.03294330305097]
We propose a unified semantic parser for question answering (QA) on both knowledge bases (KB) and databases (DB).
We introduce the primitive (relation and entity in KB, table name, column name and cell value in DB) as an essential element in our framework.
We leverage the generator to predict final logical forms by altering and composing top-ranked primitives with different operations.
arXiv Detail & Related papers (2022-11-09T19:33:27Z)
- Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF [53.55504611255664]
We propose a novel method, referred to as ASAC, to solve the dilemma caused by the nested phenomenon.
The proposed method contains two key modules: the adaptive shared (AS) part and the attentive conditional random field (ACRF) module.
Our model could learn better entity representations by capturing the implicit distinctions and relationships between different categories of entities.
arXiv Detail & Related papers (2022-11-09T09:23:56Z)
- Generative Biomedical Entity Linking via Knowledge Base-Guided Pre-training and Synonyms-Aware Fine-tuning [0.8154691566915505]
We propose a generative approach to model biomedical entity linking (EL).
We propose KB-guided pre-training by constructing synthetic samples with synonyms and definitions from KB.
We also propose synonyms-aware fine-tuning to select concept names for training, and propose decoder prompt and multi-synonyms constrained prefix tree for inference.
arXiv Detail & Related papers (2022-04-11T14:50:51Z)
- Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-RIch Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities, produced without using any labeled information.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
arXiv Detail & Related papers (2021-12-15T05:05:12Z)
- Fast and Effective Biomedical Entity Linking Using a Dual Encoder [48.86736921025866]
We propose a BERT-based dual encoder model that resolves multiple mentions in a document in one shot.
We show that our proposed model is multiple times faster than existing BERT-based models while being competitive in accuracy for biomedical entity linking.
arXiv Detail & Related papers (2021-03-08T19:32:28Z)
- A Lightweight Neural Model for Biomedical Entity Linking [1.8047694351309205]
We propose a lightweight neural method for biomedical entity linking.
Our method uses a simple alignment layer with attention mechanisms to capture the variations between mention and entity names.
Our model is competitive with previous work on standard evaluation benchmarks.
arXiv Detail & Related papers (2020-12-16T10:34:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.