BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs
- URL: http://arxiv.org/abs/2310.03320v4
- Date: Fri, 19 Jan 2024 02:47:51 GMT
- Title: BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs
- Authors: Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N.
Ioannidis, Huzefa Rangwala, Rishita Anubhai
- Abstract summary: We present BioBridge, a novel parameter-efficient learning framework to bridge independently trained unimodal FMs to establish multimodal behavior.
Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods.
We also identify BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations.
- Score: 27.32543389443672
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models (FMs) are able to leverage large volumes of unlabeled data
to demonstrate superior performance across a wide range of tasks. However, FMs
developed for biomedical domains have largely remained unimodal, i.e.,
independently trained and used for tasks on protein sequences alone, small
molecule structures alone, or clinical data alone. To overcome this limitation
of biomedical FMs, we present BioBridge, a novel parameter-efficient learning
framework, to bridge independently trained unimodal FMs to establish multimodal
behavior. BioBridge achieves it by utilizing Knowledge Graphs (KG) to learn
transformations between one unimodal FM and another without fine-tuning any
underlying unimodal FMs. Our empirical results demonstrate that BioBridge can
beat the best baseline KG embedding methods (on average by around 76.3%) in
cross-modal retrieval tasks. We also identify BioBridge demonstrates
out-of-domain generalization ability by extrapolating to unseen modalities or
relations. Additionally, we also show that BioBridge presents itself as a
general purpose retriever that can aid biomedical multimodal question answering
as well as enhance the guided generation of novel drugs.
Related papers
- BSM: Small but Powerful Biological Sequence Model for Genes and Proteins [6.6055625629542085]
We introduce BSM, a small but powerful mixed-modal biological sequence foundation model.
It is trained on three types of data: RefSeq, Gene Related Sequences, and interleaved biological sequences from the web.
It significantly enhances learning efficiency and cross-modal representation, outperforming models trained solely on unimodal data.
arXiv Detail & Related papers (2024-10-15T11:12:28Z) - CryoFM: A Flow-based Foundation Model for Cryo-EM Densities [50.291974465864364]
We present CryoFM, a foundation model designed as a generative model, learning the distribution of high-quality density maps.
Built on flow matching, CryoFM is trained to accurately capture the prior distribution of biomolecular density maps.
arXiv Detail & Related papers (2024-10-11T08:53:58Z) - Progress and Opportunities of Foundation Models in Bioinformatics [77.74411726471439]
Foundations models (FMs) have ushered in a new era in computational biology, especially in the realm of deep learning.
Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs.
Review analyses challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases.
arXiv Detail & Related papers (2024-02-06T02:29:17Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
arXiv Detail & Related papers (2023-11-01T14:44:01Z) - Towards Generalist Biomedical AI [28.68106423175678]
We introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system.
Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data.
We conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales.
arXiv Detail & Related papers (2023-07-26T17:52:22Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - BiomedCLIP: a multimodal biomedical foundation model pretrained from
fifteen million scientific image-text pairs [48.376109878173956]
We present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets.
PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles.
Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing.
arXiv Detail & Related papers (2023-03-02T02:20:04Z) - BioADAPT-MRC: Adversarial Learning-based Domain Adaptation Improves
Biomedical Machine Reading Comprehension Task [4.837365865245979]
We present an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task.
BioADAPT-MRC is a neural network-based method to address the discrepancies in the marginal distributions between the general and biomedical domain datasets.
arXiv Detail & Related papers (2022-02-26T16:14:27Z) - BioALBERT: A Simple and Effective Pre-trained Language Model for
Biomedical Named Entity Recognition [9.05154470433578]
Existing BioNER approaches often neglect these issues and directly adopt the state-of-the-art (SOTA) models.
We propose biomedical ALBERT, an effective domain-specific language model trained on large-scale biomedical corpora.
arXiv Detail & Related papers (2020-09-19T12:58:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.