Natural Language Processing for Drug Discovery Knowledge Graphs:
promises and pitfalls
- URL: http://arxiv.org/abs/2310.15572v1
- Date: Tue, 24 Oct 2023 07:35:24 GMT
- Title: Natural Language Processing for Drug Discovery Knowledge Graphs:
promises and pitfalls
- Authors: J. Charles G. Jeynes, Tim James, Matthew Corney
- Abstract summary: Building and analysing knowledge graphs (KGs) to aid drug discovery is a topical area of research.
We discuss promises and pitfalls of using natural language processing (NLP) to mine unstructured text as a data source for KGs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building and analysing knowledge graphs (KGs) to aid drug discovery is a
topical area of research. A salient feature of KGs is their ability to combine
many heterogeneous data sources in a format that facilitates discovering
connections. The utility of KGs has been exemplified in areas such as drug
repurposing, with insights made through manual exploration and modelling of the
data. In this article, we discuss promises and pitfalls of using natural
language processing (NLP) to mine unstructured text, typically from scientific
literature, as a data source for KGs. This draws on our experience of initially
parsing structured data sources such as ChEMBL as the basis for data within a
KG, and then enriching or expanding upon them using NLP. The fundamental
promise of NLP for KGs is the automated extraction of data from millions of
documents, a task that is practically impossible via human curation alone.
However, there are many potential pitfalls in NLP-KG pipelines, such as
incorrect named entity recognition and ontology linking, all of which could
ultimately lead to erroneous inferences and conclusions.
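To make the enrichment step concrete, the following sketch shows one way a KG seeded from a structured source could be expanded with NLP-derived edges. It assumes spaCy, a scispaCy biomedical NER model (en_core_sci_sm), and networkx are installed; the entity identifiers, the small ontology lookup table, and the co-occurrence relation are illustrative assumptions, not the pipeline used in the paper.

# Minimal sketch of an NLP-to-KG enrichment step (assumptions noted above).
import networkx as nx
import spacy

# Seed the graph from a structured source; identifiers here are placeholders,
# not real ChEMBL or MeSH accessions.
kg = nx.MultiDiGraph()
kg.add_edge("CHEMBL:omeprazole", "MESH:heartburn",
            relation="indicated_for", source="structured")

# Hypothetical ontology-linking table: surface form -> canonical identifier.
# In practice this would be a full entity linker against MeSH, ChEMBL, etc.
ONTOLOGY = {
    "omeprazole": "CHEMBL:omeprazole",
    "heartburn": "MESH:heartburn",
}

nlp = spacy.load("en_core_sci_sm")  # assumption: scispaCy model is installed

def enrich_from_text(text):
    """Run NER over free text and add edges for mentions that can be linked."""
    doc = nlp(text)
    linked = []
    for ent in doc.ents:
        identifier = ONTOLOGY.get(ent.text.lower())
        if identifier is None:
            continue  # unlinked or mis-recognised mention: a typical pitfall
        linked.append(identifier)
    # Naive co-occurrence edges between all pairs of linked entities.
    for i, a in enumerate(linked):
        for b in linked[i + 1:]:
            kg.add_edge(a, b, relation="co-occurs_with", source="NLP")

enrich_from_text("Omeprazole is widely used to treat heartburn.")
print(kg.number_of_edges())  # seed edge plus any NLP-derived edges

As the abstract notes, each stage of such a pipeline can fail silently: a wrong NER span or an incorrect ontology link simply adds a plausible-looking but erroneous edge to the graph.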
Related papers
- Can LLMs be Good Graph Judger for Knowledge Graph Construction? (2024-11-26)
  The paper proposes GraphJudger, a knowledge graph construction framework designed to address these challenges.
  It introduces three modules: entity-centric iterative text denoising, knowledge-aware instruction tuning, and graph judgement.
  Experiments on two general text-graph pair datasets and one domain-specific text-graph pair dataset show superior performance compared to baseline methods.
- Ontology Population using LLMs (2024-11-03)
  Knowledge graphs (KGs) are increasingly utilized for data integration, representation, and visualization.
  LLMs offer promising capabilities for such tasks, excelling in natural language understanding and content generation.
  This study investigates LLM effectiveness for KG population, focusing on the Enslaved.org Hub Ontology.
- Knowledge Graph-Enhanced Large Language Models via Path Selection (2024-06-19)
  Large Language Models (LLMs) have shown unprecedented performance in various real-world applications, but are known to generate factually inaccurate outputs, a.k.a. the hallucination problem.
  The paper proposes KELP, a principled three-stage framework to handle these problems.
- Relevant Entity Selection: Knowledge Graph Bootstrapping via Zero-Shot Analogical Pruning (2023-06-28)
  The authors propose an analogy-based approach that starts from seed entities of interest in a generic KG and keeps or prunes their neighboring entities.
  The approach is evaluated on Wikidata using two manually labeled datasets containing either domain-homogeneous or domain-heterogeneous seed entities.
- Deep Bidirectional Language-Knowledge Graph Pretraining (2022-10-17)
  DRAGON is a self-supervised approach to pretraining a deeply joint language-knowledge foundation model from text and KGs at scale.
  The model takes pairs of text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities.
- BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models (2022-06-28)
  The paper proposes a new approach for harvesting massive KGs of arbitrary relations from pretrained LMs.
  With minimal input of a relation definition, the approach efficiently searches the vast entity-pair space to extract diverse, accurate knowledge, and is used to harvest KGs of over 400 new relations from different LMs.
- Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study (2021-06-17)
  The authors study scientific LMs for KG completion, exploring whether their latent knowledge can be tapped to enhance biomedical link prediction.
  LM-based models are integrated with KG embedding models using a router that learns to assign each input example to either type of model, providing a substantial boost in performance.
- Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training (2020-10-23)
  The authors verbalize the entire English Wikidata KG, showing that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora.
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets (2020-10-22)
  The authors use Google Cloud to transcribe podcast episodes of an NPR radio show and build a pipeline for systematically pre-processing the text.
  Their model successfully identified that omeprazole can help treat heartburn.
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph (2020-04-30)
  In practice, the input knowledge can be more than is needed, since the output description may cover only the most significant knowledge.
  The paper introduces a large-scale, challenging dataset to facilitate study of this practical scenario in KG-to-text generation and proposes a multi-graph structure that represents the original graph information more comprehensively.
This list is automatically generated from the titles and abstracts of the papers on this site.