Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to Rank
- URL: http://arxiv.org/abs/2310.14366v1
- Date: Sun, 22 Oct 2023 17:30:16 GMT
- Authors: Zainab Awan, Tim Kahlke, Peter Ralph and Paul Kennedy
- Abstract summary: We present a novel deep learning approach for named entity normalization, treating it as a pair-wise learning to rank problem.
We conduct experiments on species entity types and evaluate our method against state-of-the-art techniques.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Motivation: Biomedical named-entity normalization involves connecting
biomedical entities with distinct database identifiers in order to facilitate
data integration across various fields of biology. Existing systems for
biomedical named entity normalization heavily rely on dictionaries, manually
created rules, and high-quality representative features such as lexical or
morphological characteristics. However, recent research has investigated the
use of neural network-based models to reduce dependence on dictionaries,
manually crafted rules, and features. Despite these advancements, the
performance of these models is still limited due to the lack of sufficiently
large training datasets. These models have a tendency to overfit small training
corpora and exhibit poor generalization when faced with previously unseen
entities, necessitating the redesign of rules and features. Contribution: We
present a novel deep learning approach for named entity normalization, treating
it as a pairwise learning-to-rank problem. Our method uses the widely used
information retrieval algorithm Best Matching 25 (BM25) to generate candidate
concepts, then applies Bidirectional Encoder Representations from
Transformers (BERT) to re-rank the candidate list. Notably, our approach
eliminates the need for feature-engineering or rule creation. We conduct
experiments on species entity types and evaluate our method against
state-of-the-art techniques using LINNAEUS and S800 biomedical corpora. Our
proposed approach surpasses existing methods in linking entities to the NCBI
taxonomy. To the best of our knowledge, there is no existing neural
network-based approach for species normalization in the literature.
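As a rough illustration of the two-stage idea in the abstract (BM25 candidate generation, then pairwise re-ranking), the sketch below scores a species mention against a tiny concept dictionary. It is not the authors' implementation: the whitespace tokenizer, the parameters `k1=1.5` and `b=0.75`, the non-negative IDF variant, the four-entry dictionary, and the function names `bm25_candidates` and `pairwise_hinge_loss` are all assumptions made for this example. The BERT re-ranker itself is omitted; the hinge loss only shows a common pairwise learning-to-rank objective that such a re-ranker could be trained with, and the abstract does not state which loss the paper uses.

```python
import math
from collections import Counter

# Toy subset of NCBI Taxonomy (ID -> names); illustrative only.
CONCEPTS = {
    "9606": "homo sapiens human",
    "10090": "mus musculus house mouse",
    "10116": "rattus norvegicus brown rat",
    "562": "escherichia coli",
}


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def bm25_candidates(mention, concepts, k1=1.5, b=0.75, top_k=3):
    """Return the top_k (concept_id, score) pairs for a mention using BM25."""
    docs = {cid: tokenize(name) for cid, name in concepts.items()}
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs.values()) / n_docs
    # Document frequency: number of concept entries containing each term.
    df = Counter(term for d in docs.values() for term in set(d))

    scores = {}
    for cid, doc in docs.items():
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(mention):
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed here; other variants exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[cid] = score

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


def pairwise_hinge_loss(pos_score, neg_score, margin=1.0):
    """Penalize a correct concept that does not outscore a wrong one by `margin`."""
    return max(0.0, margin - (pos_score - neg_score))


if __name__ == "__main__":
    for cid, score in bm25_candidates("house mouse", CONCEPTS):
        print(cid, round(score, 3))
```

In a full pipeline, the scores passed to `pairwise_hinge_loss` would come from a learned model that encodes the mention and each candidate name, rather than from BM25.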
Related papers
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Multi-level biomedical NER through multi-granularity embeddings and enhanced labeling [3.8599767910528917]
This paper proposes a hybrid approach that integrates the strengths of multiple models.
BERT provides contextualized word embeddings, a pre-trained multi-channel CNN captures character-level information, and a BiLSTM + CRF then performs sequence labelling and models dependencies between the words in the text.
We evaluate our model on the benchmark i2b2/2010 dataset, achieving an F1-score of 90.11.
arXiv Detail & Related papers (2023-12-24T21:45:36Z)
- Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF [53.55504611255664]
We propose a novel method, referred to as ASAC, to solve the dilemma caused by the nested phenomenon.
The proposed method contains two key modules: the adaptive shared (AS) part and the attentive conditional random field (ACRF) module.
Our model could learn better entity representations by capturing the implicit distinctions and relationships between different categories of entities.
arXiv Detail & Related papers (2022-11-09T09:23:56Z)
- Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-RIch Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
Without using any labeled information, our method produces KRISSBERT, a universal entity linker for four million UMLS entities.
arXiv Detail & Related papers (2021-12-15T05:05:12Z)
- End-to-end Biomedical Entity Linking with Span-based Dictionary Matching [5.273138059454523]
Disease name recognition and normalization is a fundamental process in biomedical text mining.
This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features.
Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models.
arXiv Detail & Related papers (2021-04-21T12:24:12Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- A Lightweight Neural Model for Biomedical Entity Linking [1.8047694351309205]
We propose a lightweight neural method for biomedical entity linking.
Our method uses a simple alignment layer with attention mechanisms to capture the variations between mention and entity names.
Our model is competitive with previous work on standard evaluation benchmarks.
arXiv Detail & Related papers (2020-12-16T10:34:37Z)
- Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization [59.5104563755095]
We introduce a simple but effective approach to improve the generalization capability of deep neural networks in the field of medical imaging classification.
Motivated by the observation that the domain variability of the medical images is to some extent compact, we propose to learn a representative feature space through variational encoding.
arXiv Detail & Related papers (2020-09-27T12:30:30Z)
- Exemplar Auditing for Multi-Label Biomedical Text Classification [0.4873362301533824]
We generalize a recently proposed zero-shot sequence labeling method, "supervised labeling via a convolutional decomposition"
The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors.
Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.
arXiv Detail & Related papers (2020-04-07T02:54:20Z)