Implications of Topological Imbalance for Representation Learning on
Biomedical Knowledge Graphs
- URL: http://arxiv.org/abs/2112.06567v1
- Date: Mon, 13 Dec 2021 11:20:36 GMT
- Title: Implications of Topological Imbalance for Representation Learning on
Biomedical Knowledge Graphs
- Authors: Stephen Bonner, Ufuk Kirik, Ola Engkvist, Jian Tang, Ian P Barrett
- Abstract summary: We show how knowledge graph embedding models can be affected by structural imbalance.
We show how the graph topology can be perturbed to artificially alter the rank of a gene via random, biologically meaningless information.
- Score: 16.566710222582618
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving on the standard of care for diseases is predicated on better
treatments, which in turn relies on finding and developing new drugs. However,
drug discovery is a complex and costly process. Adoption of methods from
machine learning has given rise to the creation of drug discovery knowledge
graphs, which utilize the inherently interconnected nature of the domain.
Graph-based data modelling, combined with knowledge graph embeddings, provides
a more intuitive representation of the domain and is suitable for inference tasks
such as predicting missing links. One such example would be producing ranked
lists of likely associated genes for a given disease, often referred to as
target discovery. It is thus critical that these predictions are not only
pertinent but also biologically meaningful. However, knowledge graphs can be
biased either directly due to the underlying data sources that are integrated
or due to modeling choices in the construction of the graph, one consequence of
which is that certain entities can get topologically overrepresented. We show
how knowledge graph embedding models can be affected by this structural
imbalance, resulting in densely connected entities being highly ranked no
matter the context. We provide support for this observation across different
datasets, models and predictive tasks. Further, we show how the graph topology
can be perturbed to artificially alter the rank of a gene via random,
biologically meaningless information. This suggests that such models can be
more influenced by the frequency of entities than by the biological information
encoded in the relations, creating issues when entity frequency is not a true
reflection of the underlying data. Our results highlight the importance of data
modeling choices and emphasize the need for practitioners to be mindful of
these issues when interpreting model outputs and during knowledge graph
composition.
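The degree-versus-rank effect described in the abstract can be checked with a small diagnostic. The following is a minimal sketch, not the authors' method: it counts how often each entity appears in a set of triples and computes the Spearman correlation between that degree and a model's score for each candidate gene. The toy triples, the gene and disease names, the scores, and the helper functions `entity_degree`, `average_ranks` and `spearman` are all hypothetical and exist only for illustration.

```python
from collections import Counter


def entity_degree(triples):
    """Count how many triples each entity appears in (as head or tail)."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return degree


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy graph: GENE_A is a hub, GENE_C is sparsely connected.
triples = [
    ("GENE_A", "associated_with", "DISEASE_1"),
    ("GENE_A", "associated_with", "DISEASE_2"),
    ("GENE_A", "interacts_with", "GENE_B"),
    ("GENE_A", "interacts_with", "GENE_C"),
    ("GENE_B", "associated_with", "DISEASE_1"),
    ("GENE_B", "interacts_with", "GENE_D"),
    ("GENE_D", "associated_with", "DISEASE_2"),
]

# Hypothetical model scores for the query (?, associated_with, DISEASE_3).
# These numbers are made up; a real check would use a trained model's output.
scores = {"GENE_A": 0.91, "GENE_B": 0.64, "GENE_C": 0.12, "GENE_D": 0.35}

degree = entity_degree(triples)
genes = sorted(scores)
rho = spearman([degree[g] for g in genes], [scores[g] for g in genes])
print(f"Spearman correlation between degree and score: {rho:.2f}")
```

A correlation close to 1 for a disease that shares no triples with the candidate genes would be one warning sign that predictions follow entity frequency rather than relation content. On a real graph, the same check could be repeated per relation type and per model.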
Related papers
- Learning to refine domain knowledge for biological network inference [2.209921757303168]
Perturbation experiments allow biologists to discover causal relationships between variables of interest.
The sparsity and high dimensionality of these data pose significant challenges for causal structure learning algorithms.
We propose an amortized algorithm for refining domain knowledge, based on data observations.
arXiv Detail & Related papers (2024-10-18T12:53:23Z)
- Tertiary Lymphoid Structures Generation through Graph-based Diffusion [54.37503714313661]
In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs.
We show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structures (TLS) content.
arXiv Detail & Related papers (2023-10-10T14:37:17Z) - A Comparative Study of Population-Graph Construction Methods and Graph
Neural Networks for Brain Age Regression [48.97251676778599]
In medical imaging, population graphs have demonstrated promising results, mostly for classification tasks.
Extracting population graphs is a non-trivial task and can significantly impact the performance of Graph Neural Networks (GNNs).
In this work, we highlight the importance of a meaningful graph construction and experiment with different population-graph construction methods.
arXiv Detail & Related papers (2023-09-26T10:30:45Z) - Probing Graph Representations [77.7361299039905]
We use a probing framework to quantify the amount of meaningful information captured in graph representations.
Our findings on molecular datasets show the potential of probing for understanding the inductive biases of graph-based models.
We advocate for probing as a useful diagnostic tool for evaluating graph-based models.
arXiv Detail & Related papers (2023-03-07T14:58:18Z) - Analysis of Drug repurposing Knowledge graphs for Covid-19 [0.0]
This study proposes a set of candidate drugs for COVID-19 using the Drug Repurposing Knowledge Graph (DRKG).
DRKG is a biological knowledge graph constructed using a vast amount of open-source biomedical knowledge.
Node and relation embeddings are learned using knowledge graph embedding models as well as neural network and attention-related models.
arXiv Detail & Related papers (2022-12-07T19:14:17Z) - Graph-in-Graph (GiG): Learning interpretable latent graphs in
non-Euclidean domain for biological and healthcare applications [52.65389473899139]
Graphs are a powerful tool for representing and analyzing unstructured, non-Euclidean data ubiquitous in the healthcare domain.
Recent works have shown that considering relationships between input data samples has a positive regularizing effect for the downstream task.
We propose Graph-in-Graph (GiG), a neural network architecture for protein classification and brain imaging applications.
arXiv Detail & Related papers (2022-04-01T10:01:37Z) - Biomedical Knowledge Graph Refinement and Completion using Graph
Representation Learning and Top-K Similarity Measure [1.4660617536303606]
This work demonstrates learning discrete representations of the integrated biomedical knowledge graph Chem2Bio2RD.
We perform a knowledge graph completion and refinement task using a simple top-K cosine similarity measure between the learned embedding vectors.
arXiv Detail & Related papers (2020-12-18T22:19:57Z) - Out-of-Sample Representation Learning for Multi-Relational Graphs [8.956321788625894]
We study the out-of-sample representation learning problem for non-attributed knowledge graphs.
We create benchmark datasets for this task, develop several models and baselines, and provide empirical analyses and comparisons of the proposed models and baselines.
arXiv Detail & Related papers (2020-04-28T00:53:01Z) - A Heterogeneous Graph with Factual, Temporal and Logical Knowledge for
Question Answering Over Dynamic Contexts [81.4757750425247]
We study question answering over a dynamic textual environment.
We develop a graph neural network over the constructed graph, and train the model in an end-to-end manner.
arXiv Detail & Related papers (2020-04-25T04:53:54Z) - Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete
Labels [66.57101219176275]
Disease diagnosis on chest X-ray images is a challenging multi-label classification task.
We propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases.
Our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning.
arXiv Detail & Related papers (2020-02-26T17:10:48Z)