An Open-Source Knowledge Graph Ecosystem for the Life Sciences
- URL: http://arxiv.org/abs/2307.05727v2
- Date: Tue, 30 Jan 2024 07:27:32 GMT
- Title: An Open-Source Knowledge Graph Ecosystem for the Life Sciences
- Authors: Tiffany J. Callahan, Ignacio J. Tripodi, Adrianne L. Stefanski, Luca
Cappelletti, Sanya B. Taneja, Jordan M. Wyrwa, Elena Casiraghi, Nicolas A.
Matentzoglu, Justin Reese, Jonathan C. Silverstein, Charles Tapley Hoyt,
Richard D. Boyce, Scott A. Malec, Deepak R. Unni, Marcin P. Joachimiak, Peter
N. Robinson, Christopher J. Mungall, Emanuele Cavalleri, Tommaso Fontana,
Giorgio Valentini, Marco Mesiti, Lucas A. Gillenwater, Brook Santangelo,
Nicole A. Vasilevsky, Robert Hoehndorf, Tellen D. Bennett, Patrick B. Ryan,
George Hripcsak, Michael G. Kahn, Michael Bada, William A. Baumgartner Jr,
Lawrence E. Hunter
- Abstract summary: PheKnowLator is a semantic ecosystem for automating the construction of ontologically grounded knowledge graphs.
The ecosystem includes KG construction resources, analysis tools, and benchmarks.
PheKnowLator enables fully customizable KGs without compromising performance or usability.
- Score: 5.665519167428707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Translational research requires data at multiple scales of biological
organization. Advancements in sequencing and multi-omics technologies have
increased the availability of these data, but researchers face significant
integration challenges. Knowledge graphs (KGs) are used to model complex
phenomena, and methods exist to construct them automatically. However, tackling
complex biomedical integration problems requires flexibility in the way
knowledge is modeled. Moreover, existing KG construction methods provide robust
tooling at the cost of fixed or limited choices among knowledge representation
models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem
for automating the FAIR (Findable, Accessible, Interoperable, and Reusable)
construction of ontologically grounded KGs with fully customizable knowledge
representation. The ecosystem includes KG construction resources (e.g., data
preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction
algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluated
the ecosystem by systematically comparing it to existing open-source KG
construction methods and by analyzing its computational performance when used
to construct 12 large-scale KGs. With flexible knowledge representation,
PheKnowLator enables fully customizable KGs without compromising performance or
usability.
Related papers
- Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency [59.6772484292295]
Knowledge graphs (KGs) generated by large language models (LLMs) are increasingly valuable for Retrieval-Augmented Generation (RAG) applications.
Existing KG extraction methods rely on prompt-based approaches, which are inefficient for processing large-scale corpora.
We propose SynthKG, a multi-step, document-level synthesis KG workflow based on LLMs.
We also design a novel graph-based retrieval framework for RAG.
arXiv Detail & Related papers (2024-10-22T00:47:54Z) - ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models [47.27645876623092]
We present ConvKGYarn, a scalable method for generating up-to-date and conversational KGQA datasets.
We showcase its utility by testing LLMs on diverse conversations - exploring model behavior on conversational KGQA sets with different configurations grounded in the same KG fact set.
arXiv Detail & Related papers (2024-08-12T06:48:43Z) - KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge [63.19837262782962]
Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph.
This study introduces KG-FIT, which builds a semantically coherent hierarchical structure of entity clusters.
Experiments on the benchmark datasets FB15K-237, YAGO3-10, and PrimeKG demonstrate the superiority of KG-FIT over state-of-the-art pre-trained language model-based methods.
arXiv Detail & Related papers (2024-05-26T03:04:26Z) - Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph [1.7418328181959968]
The proposed research aims to develop an innovative semantic query processing system.
It enables users to obtain comprehensive information about research works produced by Computer Science (CS) researchers at the Australian National University.
arXiv Detail & Related papers (2024-05-24T09:19:45Z) - From human experts to machines: An LLM supported approach to ontology
and knowledge graph construction [0.0]
Large Language Models (LLMs) have recently gained popularity for their ability to understand and generate human-like natural language.
This work explores the (semi-)automatic construction of KGs facilitated by open-source LLMs.
arXiv Detail & Related papers (2024-03-13T08:50:15Z) - Contextualization Distillation from Large Language Model for Knowledge
Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z) - KG-Hub -- Building and Exchanging Biological Knowledge Graphs [0.5369297590461578]
KG-Hub is a platform that enables standardized construction, exchange, and reuse of knowledge graphs.
Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research.
arXiv Detail & Related papers (2023-01-31T21:29:35Z) - BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from
Pretrained Language Models [65.51390418485207]
We propose a new approach of harvesting massive KGs of arbitrary relations from pretrained LMs.
With minimal input of a relation definition, the approach efficiently searches in the vast entity pair space to extract diverse accurate knowledge.
We deploy the approach to harvest KGs of over 400 new relations from different LMs.
arXiv Detail & Related papers (2022-06-28T19:46:29Z) - Meta-Learning Based Knowledge Extrapolation for Knowledge Graphs in the
Federated Setting [43.85991094675398]
We study the knowledge extrapolation problem to embed new components (i.e., entities and relations) that come with emerging knowledge graphs (KGs) in the federated setting.
In this problem, a model trained on an existing KG needs to embed an emerging KG with unseen entities and relations.
We introduce the meta-learning setting, where a set of tasks are sampled on the existing KG to mimic the link prediction task on the emerging KG.
Based on sampled tasks, we meta-train a graph neural network framework that can construct features for unseen components based on structural information and output embeddings for them.
arXiv Detail & Related papers (2022-05-10T06:27:32Z) - Scientific Language Models for Biomedical Knowledge Base Completion: An
Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z) - KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis [9.141014703209494]
KGTK is a data science-centric toolkit designed to represent, create, transform, enhance and analyze KGs.
We illustrate the framework with real-world scenarios where we have used KGTK to integrate and manipulate large KGs, such as Wikidata, DBpedia and ConceptNet.
arXiv Detail & Related papers (2020-05-29T21:29:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.