Evaluating Biomedical BERT Models for Vocabulary Alignment at Scale in
the UMLS Metathesaurus
- URL: http://arxiv.org/abs/2109.13348v1
- Date: Tue, 14 Sep 2021 16:52:16 GMT
- Title: Evaluating Biomedical BERT Models for Vocabulary Alignment at Scale in
the UMLS Metathesaurus
- Authors: Goonmeet Bajaj, Vinh Nguyen, Thilini Wijesiriwardene, Hong Yung Yip,
Vishesh Javangula, Srinivasan Parthasarathy, Amit Sheth, Olivier Bodenreider
- Abstract summary: The current UMLS (Unified Medical Language System) Metathesaurus construction process is expensive and error-prone.
Recent advances in Natural Language Processing, notably Transformer models such as BERT and its biomedical variants, have achieved state-of-the-art (SOTA) performance on downstream tasks.
We aim to validate whether approaches using these BERT models can actually outperform the existing approaches for predicting synonymy in the UMLS Metathesaurus.
- Score: 8.961270657070942
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The current UMLS (Unified Medical Language System) Metathesaurus construction
process for integrating over 200 biomedical source vocabularies is expensive
and error-prone, as it relies on lexical algorithms and human editors to decide
whether two biomedical terms are synonymous. Recent advances in Natural
Language Processing, such as Transformer models like BERT and its biomedical
variants with contextualized word embeddings, have achieved state-of-the-art
(SOTA) performance on downstream tasks. We aim to validate whether these
approaches using the BERT models can actually outperform the existing
approaches for predicting synonymy in the UMLS Metathesaurus. In the existing
Siamese Networks
with LSTM and BioWordVec embeddings, we replace the BioWordVec embeddings with
the biomedical BERT embeddings extracted from each BERT model using different
ways of extraction. In the Transformer architecture, we evaluate the use of the
different biomedical BERT models that have been pre-trained using different
datasets and tasks. Given the SOTA performance of these BERT models for other
downstream tasks, our experiments yield surprising results: (1)
in both model architectures, the approaches employing these biomedical
BERT-based models do not outperform the existing approaches using Siamese
Network with BioWordVec embeddings for the UMLS synonymy prediction task, (2)
the original BioBERT large model that has not been pre-trained with the UMLS
outperforms the SapBERT models that have been pre-trained with the UMLS, and
(3) using the Siamese Networks yields better performance for synonymy
prediction when compared to using the biomedical BERT models.
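As a rough illustration of the Siamese setup described in the abstract, the sketch below extracts biomedical BERT embeddings for a pair of terms using two common extraction strategies (the [CLS] vector and mean pooling over tokens) and scores the pair with cosine similarity. This is a minimal sketch assuming PyTorch and Hugging Face transformers; the checkpoint name, pooling choices, and plain cosine head (in place of the paper's trained Siamese network with LSTM) are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: biomedical BERT embedding extraction + Siamese-style similarity.
# Checkpoint, pooling options, and cosine scoring are illustrative assumptions;
# the paper feeds such embeddings into a trained Siamese network with LSTM.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # any biomedical BERT variant

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(term: str, pooling: str = "mean") -> torch.Tensor:
    """Encode one term and pool token vectors into a single embedding."""
    inputs = tokenizer(term, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state          # (1, seq_len, dim)
    if pooling == "cls":
        return hidden[:, 0, :]                               # [CLS] vector
    mask = inputs["attention_mask"].unsqueeze(-1)            # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)      # mean over tokens

def synonymy_score(term_a: str, term_b: str, pooling: str = "mean") -> float:
    """Cosine similarity between the two term embeddings."""
    a, b = embed(term_a, pooling), embed(term_b, pooling)
    return torch.nn.functional.cosine_similarity(a, b).item()

print(synonymy_score("myocardial infarction", "heart attack"))
```

In the approach described above, the extracted vectors would replace the BioWordVec inputs of the existing Siamese LSTM, which is then trained on labeled UMLS synonym pairs rather than relying on raw cosine similarity.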
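For the second architecture, in which a biomedical BERT model is evaluated directly on the synonymy task, a standard formulation is sequence-pair classification. The sketch below is an assumed, minimal version of that setup; the checkpoint and two-label head are placeholders, and the head is randomly initialized until the model is fine-tuned on UMLS synonym pairs.

```python
# Sketch: pair-classification formulation of UMLS synonymy prediction.
# The checkpoint and label layout are assumptions; the classification head
# must be fine-tuned on labeled synonym pairs before predictions are useful.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # or another biomedical BERT

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def predict_synonymy(term_a: str, term_b: str) -> int:
    """Return 1 if the classifier predicts the pair is synonymous, else 0."""
    inputs = tokenizer(term_a, term_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits                      # (1, 2)
    return int(logits.argmax(dim=-1).item())

print(predict_synonymy("renal failure", "kidney failure"))
```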
Related papers
- Multi-level biomedical NER through multi-granularity embeddings and
enhanced labeling [3.8599767910528917]
This paper proposes a hybrid approach that integrates the strengths of multiple models.
BERT provides contextualized word embeddings, a pre-trained multi-channel CNN captures character-level information, and a BiLSTM + CRF performs sequence labelling and models dependencies between the words in the text.
We evaluate our model on the benchmark i2b2/2010 dataset, achieving an F1-score of 90.11.
arXiv Detail & Related papers (2023-12-24T21:45:36Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - Improving Biomedical Entity Linking with Retrieval-enhanced Learning [53.24726622142558]
$k$NN-BioEL provides a BioEL model with the ability to reference similar instances from the entire training corpus as clues for prediction.
We show that $k$NN-BioEL outperforms state-of-the-art baselines on several datasets.
arXiv Detail & Related papers (2023-12-15T14:04:23Z) - Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z) - BioGPT: Generative Pre-trained Transformer for Biomedical Text
Generation and Mining [140.61707108174247]
We propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature.
We get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks respectively, and 78.2% accuracy on PubMedQA.
arXiv Detail & Related papers (2022-10-19T07:17:39Z) - Fine-Tuning Large Neural Language Models for Biomedical Natural Language
Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z) - A Hybrid Approach to Measure Semantic Relatedness in Biomedical Concepts [0.0]
We generated concept vectors by encoding concept preferred terms using ELMo, BERT, and Sentence BERT models.
We trained all the BERT models using Siamese network on SNLI and STSb datasets to allow the models to learn more semantic information.
Injecting ontology knowledge into concept vectors further enhances their quality and contributes to better relatedness scores.
arXiv Detail & Related papers (2021-01-25T16:01:27Z) - UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual
Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z) - BioALBERT: A Simple and Effective Pre-trained Language Model for
Biomedical Named Entity Recognition [9.05154470433578]
Existing BioNER approaches often neglect these issues and directly adopt the state-of-the-art (SOTA) models.
We propose biomedical ALBERT, an effective domain-specific language model trained on large-scale biomedical corpora.
arXiv Detail & Related papers (2020-09-19T12:58:47Z) - Pre-training technique to localize medical BERT and enhance biomedical
BERT [0.0]
It is difficult to train specific BERT models that perform well for domains in which there are few publicly available databases of high quality and large size.
We propose a single intervention with one option: simultaneous pre-training after up-sampling and amplified vocabulary.
Our Japanese medical BERT outperformed conventional baselines and the other BERT models on the medical document classification task.
arXiv Detail & Related papers (2020-05-14T18:00:01Z) - An Empirical Study of Multi-Task Learning on BERT for Biomedical Text
Mining [17.10823632511911]
We study a multi-task learning model with multiple decoders on varieties of biomedical and clinical natural language processing tasks.
Our empirical results demonstrate that the MTL fine-tuned models outperform state-of-the-art transformer models.
arXiv Detail & Related papers (2020-05-06T13:25:21Z)