Understanding the Performance of Knowledge Graph Embeddings in Drug
Discovery
- URL: http://arxiv.org/abs/2105.10488v1
- Date: Mon, 17 May 2021 11:39:54 GMT
- Title: Understanding the Performance of Knowledge Graph Embeddings in Drug
Discovery
- Authors: Stephen Bonner and Ian P Barrett and Cheng Ye and Rowan Swiers and Ola
Engkvist and William L Hamilton
- Abstract summary: Knowledge Graphs (KGs) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery.
In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs.
Our results highlight that these factors have a significant impact on performance and can even affect the ranking of models.
- Score: 14.839673015887275
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models
have recently begun to be explored in the context of drug discovery and have
the potential to assist in key challenges such as target identification. In the
drug discovery domain, KGs can be employed as part of a process which can
result in lab-based experiments being performed, or impact on other decisions,
incurring significant time and financial costs and most importantly, ultimately
influencing patient healthcare. For KGE models to have impact in this domain, a
better understanding not only of performance, but also of the various factors
which determine it, is required.
In this study we investigate, over the course of many thousands of
experiments, the predictive performance of five KGE models on two public drug
discovery-oriented KGs. Our goal is not to focus on the best overall model or
configuration, instead we take a deeper look at how performance can be affected
by changes in the training setup, choice of hyperparameters, model parameter
initialisation seed and different splits of the datasets. Our results highlight
that these factors have a significant impact on performance and can even affect
the ranking of models. Indeed, these factors should be reported along with model
architectures to ensure complete reproducibility and fair comparisons of future
work, and we argue this is critical for the acceptance, use, and impact of KGEs
in a biomedical setting. To aid reproducibility of our own work, we
release all experimentation code.
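To make the experimental setup concrete, below is a minimal sketch of the kind of sensitivity study the abstract describes, written against the open-source PyKEEN toolkit. This is not the authors' released code: the choice of dataset (Hetionet), model (ComplEx), epoch budget, and metric name are illustrative assumptions, and only the model parameter initialisation seed is varied here; the same loop could equally sweep hyperparameters (e.g. via model_kwargs) or regenerate the train/test split before training.

```python
# Illustrative sketch (assumes the PyKEEN toolkit; not the paper's released code).
# Train the same KGE model several times, changing only the initialisation seed,
# and record a ranking metric to see how much performance moves between runs.
from pykeen.pipeline import pipeline

hits_by_seed = {}
for seed in range(5):                      # model parameter initialisation seeds
    result = pipeline(
        dataset="Hetionet",                # a public drug discovery-oriented KG
        model="ComplEx",                   # one of several KGE architectures
        random_seed=seed,                  # controls parameter initialisation
        training_kwargs=dict(num_epochs=100),
    )
    # Hits@10 on the held-out test triples for this run
    hits_by_seed[seed] = result.metric_results.get_metric("hits_at_10")

# The spread across seeds indicates sensitivity to initialisation alone.
print(hits_by_seed)
```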
Related papers
- Explaining Human Activity Recognition with SHAP: Validating Insights with Perturbation and Quantitative Measures [0.1398098625978622]
This study uses SHapley Additive exPlanations to explain the decision-making process of Graph Convolution Networks (GCNs).
We employ SHAP to explain two real-world datasets: one for cerebral palsy (CP) classification and the widely used NTU RGB+D 60 action recognition dataset.
Results on both datasets show that body key points identified as important through SHAP have the largest influence on the accuracy, specificity, and sensitivity metrics.
arXiv Detail & Related papers (2024-11-06T07:28:57Z) - Data-Centric Long-Tailed Image Recognition [49.90107582624604]
Long-tail models exhibit a strong demand for high-quality data.
Data-centric approaches aim to enhance both the quantity and quality of data to improve model performance.
There is currently a lack of research into the underlying mechanisms explaining the effectiveness of information augmentation.
arXiv Detail & Related papers (2023-11-03T06:34:37Z) - Sensitivity, Performance, Robustness: Deconstructing the Effect of
Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of
Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - Data Augmentation for Electrocardiograms [2.8498944632323755]
We study whether data augmentation methods can be used to improve performance on data-scarce ECG prediction problems.
We introduce a new method, TaskAug, which defines a flexible augmentation policy that is optimized on a per-task basis.
In experiments, we find that TaskAug is competitive with or improves on prior work, and the learned policies shed light on what transformations are most effective for different tasks.
arXiv Detail & Related papers (2022-04-09T02:19:55Z) - Variational Auto-Encoder Architectures that Excel at Causal Inference [26.731576721694648]
Estimating causal effects from observational data is critical for making many types of decisions.
One approach to address this task is to learn decomposed representations of the underlying factors of data.
In this paper, we take a generative approach that builds on the recent advances in Variational Auto-Encoders.
arXiv Detail & Related papers (2021-11-11T22:37:43Z) - An Empirical Study on Neural Keyphrase Generation [32.98420137439619]
Recent years have seen a flourishing of neural keyphrase generation (KPG) works.
Model performance on KPG tasks has increased significantly with evolving deep learning research.
arXiv Detail & Related papers (2020-09-22T00:11:32Z) - Causal Inference using Gaussian Processes with Structured Latent
Confounders [9.8164690355257]
This paper shows how to semiparametrically model latent confounders that have this structure and thereby improve estimates of causal effects.
The key innovations are a hierarchical Bayesian model, Gaussian processes with structured latent confounders (GP-SLC), and a Monte Carlo inference algorithm for this model based on elliptical slice sampling.
arXiv Detail & Related papers (2020-07-14T15:45:28Z) - Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important to get high-quality influence estimates.
arXiv Detail & Related papers (2020-06-25T18:25:59Z) - Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost
Functions [80.12620331438052]
Deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
arXiv Detail & Related papers (2020-06-25T08:46:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.