From Large Language Models to Knowledge Graphs for Biomarker Discovery
in Cancer
- URL: http://arxiv.org/abs/2310.08365v2
- Date: Sun, 19 Nov 2023 17:27:48 GMT
- Title: From Large Language Models to Knowledge Graphs for Biomarker Discovery
in Cancer
- Authors: Md. Rezaul Karim and Lina Molinas Comet and Md Shajalal and Oya Deniz
Beyan and Dietrich Rebholz-Schuhmann and Stefan Decker
- Abstract summary: A challenging scenarios for artificial intelligence (AI) is using biomedical data to provide diagnosis and treatment recommendations for cancerous conditions.
A large-scale knowledge graph (KG) can be constructed by integrating and extracting facts about semantically interrelated entities and relations.
In this paper, we develop a domain KG to leverage cancer-specific biomarker discovery and interactive QA.
- Score: 0.9437165725355702
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Domain experts often rely on most recent knowledge for apprehending and
disseminating specific biological processes that help them design strategies
for developing prevention and therapeutic decision-making in various disease
scenarios. A challenging scenarios for artificial intelligence (AI) is using
biomedical data (e.g., texts, imaging, omics, and clinical) to provide
diagnosis and treatment recommendations for cancerous conditions.~Data and
knowledge about biomedical entities like cancer, drugs, genes, proteins, and
their mechanism is spread across structured (knowledge bases (KBs)) and
unstructured (e.g., scientific articles) sources. A large-scale knowledge graph
(KG) can be constructed by integrating and extracting facts about semantically
interrelated entities and relations. Such a KG not only allows exploration and
question answering (QA) but also enables domain experts to deduce new
knowledge. However, exploring and querying large-scale KGs is tedious for
non-domain users due to their lack of understanding of the data assets and
semantic technologies. In this paper, we develop a domain KG to leverage
cancer-specific biomarker discovery and interactive QA. For this, we
constructed a domain ontology called OncoNet Ontology (ONO), which enables
semantic reasoning for validating gene-disease (different types of cancer)
relations. The KG is further enriched by harmonizing the ONO, metadata,
controlled vocabularies, and biomedical concepts from scientific articles by
employing BioBERT- and SciBERT-based information extractors. Further, since the
biomedical domain is evolving, where new findings often replace old ones,
without having access to up-to-date scientific findings, there is a high chance
an AI system exhibits concept drift while providing diagnosis and treatment.
Therefore, we fine-tune the KG using large language models (LLMs) based on more
recent articles and KBs.
Related papers
- Unified Representation of Genomic and Biomedical Concepts through Multi-Task, Multi-Source Contrastive Learning [45.6771125432388]
We introduce GENomic REpresentation with Language Model (GENEREL)
GENEREL is a framework designed to bridge genetic and biomedical knowledge bases.
Our experiments demonstrate GENEREL's ability to effectively capture the nuanced relationships between SNPs and clinical concepts.
arXiv Detail & Related papers (2024-10-14T04:19:52Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - Applying BioBERT to Extract Germline Gene-Disease Associations for Building a Knowledge Graph from the Biomedical Literature [0.0]
This paper presents SimpleGermKG, an automatic knowledge graph construction approach that connects germline genes and diseases.
For the extraction of genes and diseases, we employ BioBERT, a pre-trained BERT model on biomedical corpora.
For semantic relationships between articles, genes, and diseases, we implemented a part-whole relation approach.
Our knowledge graph contains 297 genes, 130 diseases, and 46,747 triples.
arXiv Detail & Related papers (2023-09-11T18:05:12Z) - Neurosymbolic AI for Reasoning on Biomedical Knowledge Graphs [0.9085310904484414]
Biomedical datasets are often modeled as knowledge graphs (KGs) because they capture the multi-relational, heterogeneous, and dynamic natures of biomedical systems.
KG completion (KGC), can, therefore, help researchers make predictions to inform tasks like drug repositioning.
arXiv Detail & Related papers (2023-07-17T11:47:05Z) - BERT Based Clinical Knowledge Extraction for Biomedical Knowledge Graph
Construction and Analysis [0.4893345190925178]
We propose an end-to-end approach for knowledge extraction and analysis from biomedical clinical notes.
The proposed framework can successfully extract relevant structured information with high accuracy.
arXiv Detail & Related papers (2023-04-21T14:45:33Z) - A Biomedical Knowledge Graph for Biomarker Discovery in Cancer [1.7860709946876898]
A domain-specific knowledge graph(KG) is an explicit conceptualization of a specific subject-matter domain.
The KG is constructed by integrating cancer-related knowledge and facts from multiple sources.
We listed down some queries and some examples of QA and deducing knowledge based on the KG.
arXiv Detail & Related papers (2023-02-09T16:17:57Z) - EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z) - Discovering Drug-Target Interaction Knowledge from Biomedical Literature [107.98712673387031]
The Interaction between Drugs and Targets (DTI) in human body plays a crucial role in biomedical science and applications.
As millions of papers come out every year in the biomedical domain, automatically discovering DTI knowledge from literature becomes an urgent demand in the industry.
We explore the first end-to-end solution for this task by using generative approaches.
We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.
arXiv Detail & Related papers (2021-09-27T17:00:14Z) - Scientific Language Models for Biomedical Knowledge Base Completion: An
Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z) - Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into the three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
arXiv Detail & Related papers (2020-08-05T15:45:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.