Improving Biomedical Entity Linking with Retrieval-enhanced Learning
- URL: http://arxiv.org/abs/2312.09806v1
- Date: Fri, 15 Dec 2023 14:04:23 GMT
- Title: Improving Biomedical Entity Linking with Retrieval-enhanced Learning
- Authors: Zhenxi Lin, Ziheng Zhang, Xian Wu, Yefeng Zheng
- Abstract summary: $k$NN-BioEL provides a BioEL model with the ability to reference similar instances from the entire training corpus as clues for prediction.
We show that $k$NN-BioEL outperforms state-of-the-art baselines on several datasets.
- Score: 53.24726622142558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biomedical entity linking (BioEL) has achieved remarkable progress with the help of pre-trained language models. However, existing BioEL methods usually struggle to handle rare and difficult entities due to the long-tailed distribution of entity mentions. To address this limitation, we introduce a new scheme, $k$NN-BioEL, which provides a BioEL model with the ability to reference similar instances from the entire training corpus as clues for prediction, thus improving its generalization capability. Moreover, we design a contrastive learning objective with dynamic hard negative sampling (DHNS) that improves the quality of the retrieved neighbors during inference. Extensive experimental results show that $k$NN-BioEL outperforms state-of-the-art baselines on several datasets.
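As a rough illustration of the retrieval-enhanced prediction described above: at inference time, the mention embedding queries a datastore built from the training corpus, and the retrieved neighbors' gold entities vote on the answer, interpolated with the base model's own distribution. This is a generic $k$NN-augmentation sketch under assumed inputs (a precomputed datastore and base probabilities), not the paper's exact implementation; names like `lam` and the distance/temperature choices are illustrative.

```python
# Minimal sketch of kNN-augmented entity prediction. Assumptions: `keys` is a
# precomputed (N, d) array of training-mention embeddings, `labels` holds each
# mention's gold entity id, and `base_probs` is the base model's distribution
# over candidate entities. All names are illustrative, not the paper's API.
import numpy as np

def knn_probs(query_emb, keys, labels, num_entities, k=8, temp=0.1):
    """Distribution over entities induced by the k nearest training mentions."""
    dists = np.linalg.norm(keys - query_emb, axis=1)   # L2 distance to datastore
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temp)           # closer neighbors weigh more
    probs = np.zeros(num_entities)
    for idx, w in zip(nearest, weights):
        probs[labels[idx]] += w                        # neighbor votes for its entity
    return probs / probs.sum()

def knn_bioel_predict(query_emb, base_probs, keys, labels, lam=0.5, k=8):
    """Interpolate the base model's distribution with the kNN distribution."""
    p_knn = knn_probs(query_emb, keys, labels, len(base_probs), k=k)
    return lam * p_knn + (1.0 - lam) * base_probs
```

The interpolation weight `lam` controls how much the retrieved neighbors can override the parametric model, which is what lets rare, long-tailed entities benefit from explicit training instances.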
Related papers
- Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images [7.5123289730388825]
Genome-informed Hyper-Attention Network (G-HANet) effectively distills histo-genomic knowledge during training.
The network comprises a cross-modal associating branch (CAB) and a hyper-attention survival branch (HSB); a hedged sketch of the general distillation pattern follows.
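A hedged sketch of the general histo-genomic distillation pattern (not G-HANet's actual CAB/HSB architecture): during training, a genomic feature vector serves as a distillation target that the WSI branch learns to reconstruct, so inference can run on histology features alone. All module names and sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HistoGenomicDistill(nn.Module):
    """Toy cross-modal distillation: the WSI branch learns to mimic genomic features."""
    def __init__(self, wsi_dim=1024, gen_dim=256, hidden=512):
        super().__init__()
        self.wsi_encoder = nn.Sequential(nn.Linear(wsi_dim, hidden), nn.ReLU())
        self.to_genomic = nn.Linear(hidden, gen_dim)   # cross-modal head
        self.risk_head = nn.Linear(hidden, 1)          # survival risk score

    def forward(self, wsi_feats, genomic_feats=None):
        h = self.wsi_encoder(wsi_feats)
        risk = self.risk_head(h).squeeze(-1)
        distill_loss = None
        if genomic_feats is not None:                  # genomics used in training only
            distill_loss = nn.functional.mse_loss(self.to_genomic(h), genomic_feats)
        return risk, distill_loss
```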
arXiv Detail & Related papers (2024-03-15T06:20:09Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical knowledge graph OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping computing requirements low.
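The entry describes the standard bottleneck-adapter pattern: a small trainable module inserted into a frozen PLM layer, which is what keeps compute low. A minimal sketch (layer sizes and placement are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # Only the adapter's ~100k parameters train; the PLM itself stays frozen.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```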
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Embracing assay heterogeneity with neural processes for markedly improved bioactivity predictions [0.276240219662896]
Predicting the bioactivity of a ligand is one of the hardest and most important challenges in computer-aided drug discovery.
Despite years of data collection and curation efforts, bioactivity data remains sparse and heterogeneous.
We present a hierarchical meta-learning framework that exploits the information synergy across disparate assays.
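A generic sketch of the context-conditioning idea behind neural-process-style meta-learning across assays: each assay's measured (ligand, activity) pairs are pooled into a context vector that conditions predictions for new ligands. This is the basic conditional-neural-process pattern, not the paper's exact hierarchy; dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class AssayCNP(nn.Module):
    """Regressor conditioned on an assay's context set, CNP-style."""
    def __init__(self, x_dim=128, r_dim=64):
        super().__init__()
        self.context_enc = nn.Sequential(nn.Linear(x_dim + 1, r_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(x_dim + r_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 1))

    def forward(self, ctx_x, ctx_y, target_x):
        # ctx_x: (N, x_dim) ligand features; ctx_y: (N, 1) measured activities.
        r = self.context_enc(torch.cat([ctx_x, ctx_y], dim=-1)).mean(dim=0)
        r = r.expand(target_x.size(0), -1)            # shared assay summary
        return self.decoder(torch.cat([target_x, r], dim=-1)).squeeze(-1)
```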
arXiv Detail & Related papers (2023-08-17T16:26:58Z)
- Incorporating Prior Knowledge in Deep Learning Models via Pathway Activity Autoencoders [5.950889585409067]
We propose a novel prior-knowledge-based deep auto-encoding framework, PAAE, for RNA-seq data in cancer.
We show that, despite having access to a smaller set of features, our PAAE and PAVAE models achieve better out-of-set reconstruction results compared to common methodologies.
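A hedged sketch of a pathway-informed autoencoder in the spirit of PAAE: the encoder's first layer is masked by pathway membership, so each hidden unit aggregates only the genes of one pathway and its activation reads as a "pathway activity". The masking scheme and sizes are illustrative; the paper's exact wiring may differ.

```python
import torch
import torch.nn as nn

class PathwayAutoencoder(nn.Module):
    """Autoencoder whose encoder connectivity follows a gene-pathway mask."""
    def __init__(self, pathway_mask):   # (n_pathways, n_genes) 0/1 membership
        super().__init__()
        n_pathways, n_genes = pathway_mask.shape
        self.register_buffer("mask", pathway_mask.float())
        self.enc_weight = nn.Parameter(torch.randn(n_pathways, n_genes) * 0.01)
        self.decoder = nn.Linear(n_pathways, n_genes)

    def forward(self, x):               # x: (batch, n_genes) expression values
        masked_w = self.enc_weight * self.mask    # zero out off-pathway genes
        z = torch.relu(x @ masked_w.t())          # (batch, n_pathways) activities
        return self.decoder(z), z
```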
arXiv Detail & Related papers (2023-06-09T11:12:55Z)
- No Pairs Left Behind: Improving Metric Learning with Regularized Triplet Objective [19.32706951298244]
We propose a novel formulation of the triplet objective function that improves metric learning without additional sample mining or overhead costs.
We show that our method (called No Pairs Left Behind [NPLB]) improves upon the traditional and current state-of-the-art triplet objective formulations.
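For reference, the classic triplet objective the entry builds on, with a placeholder regularization term; NPLB's actual regularizer is not given in the summary, so the anchor-positive penalty below is purely illustrative.

```python
import torch
import torch.nn.functional as F

def regularized_triplet_loss(anchor, positive, negative, margin=1.0, reg_weight=0.1):
    """Classic triplet hinge plus an illustrative regularizer (not NPLB's exact term)."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    triplet = F.relu(d_ap - d_an + margin)   # push negatives a margin past positives
    reg = d_ap.pow(2)                        # placeholder: also shrink positive pairs
    return (triplet + reg_weight * reg).mean()
```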
arXiv Detail & Related papers (2022-10-18T00:56:01Z)
- Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
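A sketch of the contrastive-regularized classifier pattern the entry describes: a standard classification loss plus an InfoNCE-style term that pulls each EHR embedding toward a designated positive view. The EHR-specific positive-sampling strategies are the paper's contribution and are not reproduced here; `alpha` and `temp` are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def contrastive_regularized_loss(logits, labels, emb, pos_emb, temp=0.1, alpha=0.5):
    """Cross-entropy plus a contrastive term over precomputed positive pairs."""
    ce = F.cross_entropy(logits, labels)
    z = F.normalize(emb, dim=1)
    zp = F.normalize(pos_emb, dim=1)
    sim = z @ zp.t() / temp                  # (batch, batch) cosine similarities
    # Row i's own positive sits on the diagonal; other rows act as negatives.
    targets = torch.arange(len(z), device=z.device)
    return ce + alpha * F.cross_entropy(sim, targets)
```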
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Evolution Is All You Need: Phylogenetic Augmentation for Contrastive Learning [1.7188280334580197]
Self-supervised representation learning of biological sequence embeddings alleviates computational resource constraints on downstream tasks.
We show that contrastive learning using evolutionary phylogenetic augmentation can be used as a representation learning objective.
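The augmentation step can be pictured as pairing homologous sequences as two "views" of the same underlying anchor for a contrastive objective. A minimal pairing sketch, assuming sequences are pre-grouped into phylogenetic families (the grouping and the encoder are outside this snippet):

```python
import random

def sample_phylo_pairs(families, batch_size=32):
    """families: list of lists of homologous sequences; returns positive pairs."""
    eligible = [fam for fam in families if len(fam) >= 2]
    pairs = []
    for _ in range(batch_size):
        fam = random.choice(eligible)
        a, b = random.sample(fam, 2)   # two homologs form one positive pair
        pairs.append((a, b))
    return pairs
```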
arXiv Detail & Related papers (2020-12-25T01:35:06Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUG$^{C}$ is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUG$^{C}$ consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUG$^{C}$ produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
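The generate-then-select pattern behind this kind of augmentation, as a hedged sketch: a generator proposes synthetic examples and a scoring filter keeps the most useful ones. G-DAUG$^{C}$'s actual selection uses richer influence- and diversity-based criteria; `generate_fn` and `score_fn` here are assumed callables.

```python
def generative_augmentation(generate_fn, score_fn, n_candidates=1000, n_keep=100):
    """Generate synthetic training examples, then keep only the highest-scoring ones."""
    candidates = [generate_fn() for _ in range(n_candidates)]
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:n_keep]
```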
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information above) and is not responsible for any consequences of its use.