An Open-Source Knowledge Graph Ecosystem for the Life Sciences
- URL: http://arxiv.org/abs/2307.05727v2
- Date: Tue, 30 Jan 2024 07:27:32 GMT
- Title: An Open-Source Knowledge Graph Ecosystem for the Life Sciences
- Authors: Tiffany J. Callahan, Ignacio J. Tripodi, Adrianne L. Stefanski, Luca
Cappelletti, Sanya B. Taneja, Jordan M. Wyrwa, Elena Casiraghi, Nicolas A.
Matentzoglu, Justin Reese, Jonathan C. Silverstein, Charles Tapley Hoyt,
Richard D. Boyce, Scott A. Malec, Deepak R. Unni, Marcin P. Joachimiak, Peter
N. Robinson, Christopher J. Mungall, Emanuele Cavalleri, Tommaso Fontana,
Giorgio Valentini, Marco Mesiti, Lucas A. Gillenwater, Brook Santangelo,
Nicole A. Vasilevsky, Robert Hoehndorf, Tellen D. Bennett, Patrick B. Ryan,
George Hripcsak, Michael G. Kahn, Michael Bada, William A. Baumgartner Jr,
Lawrence E. Hunter
- Abstract summary: PheKnowLator is a semantic ecosystem for automating the construction of ontologically grounded knowledge graphs.
The ecosystem includes KG construction resources, analysis tools, and benchmarks.
PheKnowLator enables fully customizable KGs without compromising performance or usability.
- Score: 5.665519167428707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Translational research requires data at multiple scales of biological
organization. Advancements in sequencing and multi-omics technologies have
increased the availability of these data, but researchers face significant
integration challenges. Knowledge graphs (KGs) are used to model complex
phenomena, and methods exist to construct them automatically. However, tackling
complex biomedical integration problems requires flexibility in the way
knowledge is modeled. Moreover, existing KG construction methods provide robust
tooling at the cost of fixed or limited choices among knowledge representation
models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem
for automating the FAIR (Findable, Accessible, Interoperable, and Reusable)
construction of ontologically grounded KGs with fully customizable knowledge
representation. The ecosystem includes KG construction resources (e.g., data
preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction
algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluated
the ecosystem by systematically comparing it to existing open-source KG
construction methods and by analyzing its computational performance when used
to construct 12 large-scale KGs. With flexible knowledge representation,
PheKnowLator enables fully customizable KGs without compromising performance or
usability.
Related papers
- Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema [60.42231674887294]
We propose an ontology-grounded approach to Knowledge Graph (KG) construction using Large Language Models (LLMs) on a knowledge base.
We ground KG generation in the authored ontology, based on extracted relations, to ensure consistency and interpretability.
Our work presents a promising direction for scalable KG construction pipelines that require minimal human intervention and yield high-quality, human-interpretable KGs.
arXiv Detail & Related papers (2024-12-30T13:36:05Z)
- Automated Extraction and Creation of FBS Design Reasoning Knowledge Graphs from Structured Data in Product Catalogues Lacking Contextual Information [0.10840985826142427]
Ontology-based knowledge graphs (KGs) are desirable for effective knowledge management and reuse in various decision-making scenarios.
Most research on the automated extraction and creation of KGs is based on extensive unstructured data sets.
This research reports a method and digital workflow developed to address this gap.
arXiv Detail & Related papers (2024-12-08T09:20:25Z)
- Leveraging LLM for Automated Ontology Extraction and Knowledge Graph Generation [3.2513035377783717]
OntoKGen is a genuine pipeline for ontology extraction and Knowledge Graph generation.
OntoKGen enables seamless integration into schemaless, non-relational databases like Neo4j.
arXiv Detail & Related papers (2024-11-30T23:11:44Z)
- Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency [59.6772484292295]
Knowledge graphs (KGs) generated by large language models (LLMs) are increasingly valuable for Retrieval-Augmented Generation (RAG) applications.
Existing KG extraction methods rely on prompt-based approaches, which are inefficient for processing large-scale corpora.
We propose SynthKG, a multi-step, document-level synthesis KG workflow based on LLMs.
We also design a novel graph-based retrieval framework for RAG.
arXiv Detail & Related papers (2024-10-22T00:47:54Z)
- ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models [47.27645876623092]
We present ConvKGYarn, a scalable method for generating up-to-date and conversational KGQA datasets.
We showcase its utility by testing LLMs on diverse conversations, exploring model behavior on conversational KGQA sets with different configurations grounded in the same KG fact set.
arXiv Detail & Related papers (2024-08-12T06:48:43Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
- KG-Hub -- Building and Exchanging Biological Knowledge Graphs [0.5369297590461578]
KG-Hub is a platform that enables standardized construction, exchange, and reuse of knowledge graphs.
Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research.
arXiv Detail & Related papers (2023-01-31T21:29:35Z)
- BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models [65.51390418485207]
We propose a new approach for harvesting massive KGs of arbitrary relations from pretrained LMs.
With minimal input of a relation definition, the approach efficiently searches the vast entity-pair space to extract diverse, accurate knowledge.
We deploy the approach to harvest KGs of over 400 new relations from different LMs.
arXiv Detail & Related papers (2022-06-28T19:46:29Z)
- Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z)
- KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis [9.141014703209494]
KGTK is a data science-centric toolkit designed to represent, create, transform, enhance and analyze KGs.
We illustrate the framework with real-world scenarios where we have used KGTK to integrate and manipulate large KGs, such as Wikidata, DBpedia and ConceptNet.
arXiv Detail & Related papers (2020-05-29T21:29:14Z)