Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction
- URL: http://arxiv.org/abs/2404.14970v1
- Date: Tue, 23 Apr 2024 12:24:53 GMT
- Title: Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction
- Authors: Rita T. Sousa, Heiko Paulheim
- Abstract summary: We propose a novel approach to integrate multiple gene expression datasets and domain-specific knowledge.
KG embedding methods are then employed to generate vector representations, serving as inputs for a classifier.
- Score: 1.8722948221596285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of diverse data types, namely gene expression data. While gene expression data can provide valuable insights, challenges arise from the fact that the sample sizes in expression datasets are usually limited, and the data from different datasets with different gene expressions cannot be easily combined. This work proposes a novel approach to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration. KG embedding methods are then employed to generate vector representations, serving as inputs for a classifier. Experiments demonstrated the efficacy of our approach, revealing improvements in diabetes prediction when integrating multiple gene expression datasets and domain-specific knowledge about protein functions and interactions.
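The pipeline described in the abstract (datasets and domain knowledge merged into one knowledge graph, graph nodes turned into vectors, vectors fed to a classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy triples, the relation names `expresses` and `interacts_with`, the patient and gene identifiers, the random-walk proximity vectors used as a stand-in for a KG embedding method, and the 1-nearest-neighbour classifier. The abstract does not state which embedding methods or classifier the authors used.

```python
import numpy as np

def build_graph(triples):
    """Map node names to indices and build an undirected adjacency matrix."""
    nodes = sorted({n for h, _, t in triples for n in (h, t)})
    index = {n: i for i, n in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)))
    for h, _, t in triples:
        adj[index[h], index[t]] = adj[index[t], index[h]] = 1.0
    return index, adj

def walk_embeddings(adj, steps=3):
    """Stand-in for a KG embedding: sum of 1..steps random-walk transition matrices."""
    p = adj / adj.sum(axis=1, keepdims=True)  # row-normalised transitions
    total = np.zeros_like(p)
    power = np.eye(len(p))
    for _ in range(steps):
        power = power @ p
        total += power
    return total  # row i is the vector for node i

def predict(emb, index, train_labels, patient):
    """1-nearest-neighbour: return the label of the closest training patient."""
    query = emb[index[patient]]
    nearest = min(train_labels, key=lambda p: np.linalg.norm(query - emb[index[p]]))
    return train_labels[nearest]

# Hypothetical toy data: two datasets that measure disjoint gene sets.
triples = [
    ("p1", "expresses", "G1"), ("p1", "expresses", "G2"),  # dataset A
    ("p2", "expresses", "G3"),                             # dataset A
    ("p3", "expresses", "G4"),                             # dataset B
    ("p4", "expresses", "G5"),                             # dataset B
    ("G1", "interacts_with", "G4"),                        # domain knowledge
    ("G3", "interacts_with", "G5"),                        # domain knowledge
]
train_labels = {"p1": "diabetic", "p2": "control"}  # labels known for dataset A only

index, adj = build_graph(triples)
emb = walk_embeddings(adj)
pred_p3 = predict(emb, index, train_labels, "p3")
pred_p4 = predict(emb, index, train_labels, "p4")
```

In this toy graph the dataset B patients share no measured gene with the dataset A patients, so the only paths between them run through the `interacts_with` edges. That is the role the abstract assigns to domain-specific knowledge when datasets with different genes cannot be combined directly.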
Related papers
- Multi-dataset and Transfer Learning Using Gene Expression Knowledge Graphs [1.8722948221596285]
Gene expression datasets offer insights into gene regulation mechanisms, biochemical pathways, and cellular functions.
Gene expression data can provide valuable insights, but challenges arise because the number of patients in expression datasets is limited.
This work proposes a novel methodology to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge.
arXiv Detail & Related papers (2025-03-26T10:23:27Z) - Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
arXiv Detail & Related papers (2024-10-21T03:35:23Z) - Robust Multi-view Co-expression Network Inference [8.697303234009528]
Inferring gene co-expression networks from transcriptome data presents many challenges.
We introduce a robust method for high-dimensional graph inference from multiple independent studies.
arXiv Detail & Related papers (2024-09-30T06:30:09Z) - From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis [50.80532910808962]
We present GluFormer, a generative foundation model on biomedical temporal data based on a transformer architecture.
GluFormer generalizes to 15 different external datasets, including 4936 individuals across 5 different geographical regions.
It can also predict onset of future health outcomes even 4 years in advance.
arXiv Detail & Related papers (2024-08-20T13:19:06Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z) - Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z) - Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction [102.23901690661916]
Low-dimensional embedding spaces can be derived from the UK Biobank population dataset to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics.
Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
arXiv Detail & Related papers (2021-10-12T16:25:50Z) - SimpleChrome: Encoding of Combinatorial Effects for Predicting Gene Expression [8.326669256957352]
We present SimpleChrome, a deep learning model that learns the histone modification representations of genes.
The features learned from the model allow us to better understand the latent effects of cross-gene interactions and direct gene regulation on the target gene expression.
arXiv Detail & Related papers (2020-12-15T23:30:36Z) - Using ontology embeddings for structural inductive bias in gene expression data analysis [6.587739898387445]
Stratifying cancer patients based on their gene expression levels allows improving diagnosis, survival analysis and treatment planning.
We propose to incorporate biological knowledge about genes into the machine learning system for the task of patient classification given their gene expression data.
arXiv Detail & Related papers (2020-11-22T12:13:29Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - A Semi-Supervised Generative Adversarial Network for Prediction of Genetic Disease Outcomes [0.0]
We introduce genetic Generative Adversarial Networks (gGAN) to create large synthetic genetic data sets.
Our goal is to determine the propensity of a new individual to develop the severe form of the illness from their genetic profile alone.
The proposed model is self-aware and capable of determining whether a new genetic profile has enough compatibility with the data on which the network was trained.
arXiv Detail & Related papers (2020-07-02T15:35:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.