Hierarchical Pretraining for Biomedical Term Embeddings
- URL: http://arxiv.org/abs/2307.00266v1
- Date: Sat, 1 Jul 2023 08:16:00 GMT
- Title: Hierarchical Pretraining for Biomedical Term Embeddings
- Authors: Bryan Cai, Sihang Zeng, Yucong Lin, Zheng Yuan, Doudou Zhou, and Lu
Tian
- Abstract summary: We propose HiPrBERT, a novel biomedical term representation model trained on hierarchical data.
We show that HiPrBERT effectively learns the pair-wise distance from hierarchical information, resulting in substantially more informative embeddings for further biomedical applications.
- Score: 4.69793648771741
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic health records (EHR) contain narrative notes that provide
extensive details on the medical condition and management of patients. Natural
language processing (NLP) of clinical notes can use observed frequencies of
clinical terms as predictive features for downstream applications such as
clinical decision making and patient trajectory prediction. However, due to the
vast number of highly similar and related clinical concepts, a more effective
modeling strategy is to represent clinical terms as semantic embeddings via
representation learning and use the low dimensional embeddings as feature
vectors for predictive modeling. To achieve efficient representation,
fine-tuning pretrained language models with biomedical knowledge graphs may
generate better embeddings for biomedical terms than those from standard
language models alone. These embeddings can effectively discriminate synonymous
pairs from those that are unrelated. However, they often fail to capture
different degrees of similarity or relatedness for concepts that are
hierarchical in nature. To overcome this limitation, we propose HiPrBERT, a
novel biomedical term representation model trained on additionally compiled
data that contains hierarchical structures for various biomedical terms. We
modify an existing contrastive loss function to extract information from these
hierarchies. Our numerical experiments demonstrate that HiPrBERT effectively
learns the pair-wise distance from hierarchical information, resulting in
substantially more informative embeddings for further biomedical applications.
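The hierarchy-aware contrastive idea can be illustrated with a minimal sketch. This is not the paper's actual loss: the function names (`tree_distance`, `hier_contrastive_loss`), the root-to-node path representation, and the choice of turning tree distance into soft targets via `exp(-alpha * distance)` are all our own assumptions.

```python
import math

def tree_distance(path_a, path_b):
    # Number of edges between two ontology nodes, given their
    # root-to-node paths (hypothetical representation).
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def hier_contrastive_loss(sim, dist, tau=0.07, alpha=0.5):
    """InfoNCE-style loss for one anchor term.

    sim[i]  : model's cosine similarity between the anchor and candidate i
    dist[i] : tree distance between the anchor and candidate i

    Instead of a single hard positive, hierarchy closeness defines soft
    targets: nearer nodes get larger weight exp(-alpha * dist), so the
    loss rewards graded similarity rather than a binary synonym split.
    """
    weights = [math.exp(-alpha * d) for d in dist]
    z = sum(weights)
    targets = [w / z for w in weights]
    # Temperature-scaled log-softmax over the candidates.
    logits = [s / tau for s in sim]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    log_probs = [l - log_z for l in logits]
    # Cross-entropy between hierarchy-derived targets and the model.
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

Under this sketch, an embedding whose similarities decrease with tree distance yields a lower loss than one that inverts the hierarchy, which is the graded behavior the abstract describes.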
Related papers
- Unified Representation of Genomic and Biomedical Concepts through Multi-Task, Multi-Source Contrastive Learning [45.6771125432388]
We introduce GENomic REpresentation with Language Model (GENEREL).
GENEREL is a framework designed to bridge genetic and biomedical knowledge bases.
Our experiments demonstrate GENEREL's ability to effectively capture the nuanced relationships between SNPs and clinical concepts.
arXiv Detail & Related papers (2024-10-14T04:19:52Z) - Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study [2.0884301753594334]
This study performs a comparative analysis of various natural language models for medical text classification.
BERT outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% for recall of the minority classes.
arXiv Detail & Related papers (2024-08-30T10:28:49Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
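The bottleneck-adapter idea behind such lightweight knowledge injection can be sketched as follows. This is a rough illustration, not the paper's code: the class name `BottleneckAdapter`, the dimensions, and the zero-initialized up-projection are our own assumptions.

```python
import math
import random

class BottleneckAdapter:
    """Minimal bottleneck adapter: down-project, ReLU, up-project, residual.

    The up-projection is zero-initialized so the module starts as an
    identity map; only these small matrices would be trained while the
    pre-trained language model's weights stay frozen.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        # dim x bottleneck down-projection with small random init.
        self.down = [[rng.uniform(-scale, scale) for _ in range(bottleneck)]
                     for _ in range(dim)]
        # bottleneck x dim up-projection, zero init -> identity at start.
        self.up = [[0.0] * dim for _ in range(bottleneck)]

    def __call__(self, h):
        d, b = len(h), len(self.down[0])
        # Down-project the hidden state and apply ReLU.
        z = [max(0.0, sum(h[i] * self.down[i][j] for i in range(d)))
             for j in range(b)]
        # Up-project back to the model dimension.
        out = [sum(z[j] * self.up[j][k] for j in range(b)) for k in range(d)]
        # Residual connection around the adapter.
        return [hi + oi for hi, oi in zip(h, out)]
```

Because the adapter begins as an identity, inserting it cannot degrade the frozen model before training, which is one reason adapters keep the computing requirements low.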
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - Knowledge Graph Embeddings for Multi-Lingual Structured Representations
of Radiology Reports [40.606143019674654]
We introduce a novel light-weight graph-based embedding method specifically catering to radiology reports.
It takes into account the structure and composition of the report, while also connecting medical terms in the report.
We show the use of this embedding on two tasks namely disease classification of X-ray reports and image classification.
arXiv Detail & Related papers (2023-09-02T11:46:41Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed to speed up patient recruitment by automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - Assessing mortality prediction through different representation models
based on concepts extracted from clinical notes [2.707154152696381]
Embedding learning is a method for converting notes into a format that makes them comparable.
Transformer-based representation models have recently made a great leap forward.
We performed experiments to measure the usefulness of the learned embedding vectors in the task of hospital mortality prediction.
arXiv Detail & Related papers (2022-07-22T04:34:33Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - Fine-Tuning Large Neural Language Models for Biomedical Natural Language
Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z) - Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative
Study [2.871614744079523]
It is not clear if pretrained models are useful for medical code prediction without further architecture engineering.
We propose a hierarchical fine-tuning architecture to capture interactions between distant words and adopt label-wise attention to exploit label information.
Contrary to current trends, we demonstrate that a carefully trained classical CNN outperforms attention-based models on a MIMIC-III subset with frequent codes.
arXiv Detail & Related papers (2021-03-11T07:23:45Z) - UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual
Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.