Graph Neural Networks for Breast Cancer Data Integration
- URL: http://arxiv.org/abs/2211.15561v1
- Date: Mon, 28 Nov 2022 17:10:19 GMT
- Title: Graph Neural Networks for Breast Cancer Data Integration
- Authors: Teodora Reu
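- Abstract summary: We propose a novel learning pipeline comprising three steps: integrating cancer data modalities as graphs, applying Graph Neural Networks in an unsupervised setting to generate lower-dimensional embeddings, and feeding these representations into a cancer sub-type classification model for evaluation.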
This project has the potential to improve cancer data understanding and encourages the transition of regular data sets to graph-shaped data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: International initiatives such as METABRIC (Molecular Taxonomy of Breast
Cancer International Consortium) have collected several multigenomic and
clinical data sets to identify the molecular processes taking place
throughout the evolution of various cancers. Numerous Machine Learning and
statistical models have been designed and trained to analyze these types of
data independently; however, the integration of such differently shaped and
sourced information streams has not been extensively studied. Better
integrating these data sets and generating meaningful representations that can
ultimately be leveraged for cancer detection tasks could lead to well-suited
treatments for patients. Hence, we propose a novel learning pipeline
comprising three steps - the integration of cancer data modalities as graphs,
followed by the application of Graph Neural Networks in an unsupervised setting
to generate lower-dimensional embeddings from the combined data, and finally
feeding the new representations into a cancer sub-type classification model for
evaluation. The graph construction algorithms are described in-depth as
METABRIC does not store relationships between the patient modalities, with a
discussion of their influence over the quality of the generated embeddings. We
also present the models used to generate the lower-dimensional latent
representations: Graph Neural Networks, Variational Graph Autoencoders and Deep
Graph Infomax. In parallel, the pipeline is tested on a synthetic dataset to
demonstrate that the characteristics of the underlying data, such as homophily
levels, greatly influence the performance of the pipeline, which ranges from
51% to 98% accuracy on artificial data, and from 13% to 80% on METABRIC. This
project has the potential to improve cancer data understanding and encourages
the transition of regular data sets to graph-shaped data.
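The abstract notes that METABRIC stores no relationships between patients, so a graph has to be constructed, and that homophily levels strongly affect the results. The following is a minimal sketch of that idea, not the paper's construction algorithm: it assumes a k-nearest-neighbour rule over Euclidean distance and uses a synthetic two-class stand-in for patient features. The names `knn_edges` and `edge_homophily` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def knn_edges(features: np.ndarray, k: int) -> set:
    """Link each row to its k nearest rows (Euclidean); edges are undirected."""
    # Pairwise squared Euclidean distances between all rows (patients).
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.sum(diffs ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbour
    edges = set()
    for i, row in enumerate(dist):
        for j in np.argsort(row)[:k]:
            # Store each undirected edge once as (smaller index, larger index).
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges


def edge_homophily(edges, labels) -> float:
    """Fraction of edges whose two endpoints share the same label."""
    if not edges:
        return 0.0
    same = sum(1 for i, j in edges if labels[i] == labels[j])
    return same / len(edges)


# Synthetic stand-in for patient data (an assumption, not METABRIC):
# two classes whose feature means are far apart.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
features = rng.normal(size=(100, 8)) + labels[:, None] * 6.0

edges = knn_edges(features, k=5)
homophily = edge_homophily(edges, labels)
print(f"{len(edges)} edges, edge homophily = {homophily:.2f}")
```

Shrinking the offset between the two class means (here `6.0`) lowers the measured homophily, which is one simple way to probe how graph quality might affect downstream embeddings.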
Related papers
- Comparative Analysis of Multi-Omics Integration Using Advanced Graph Neural Networks for Cancer Classification [40.45049709820343]
Multi-omics data integration poses significant challenges due to the high dimensionality, data complexity, and distinct characteristics of various omics types.
This study evaluates three graph neural network architectures for multi-omics (MO) integration based on graph-convolutional networks (GCN), graph-attention networks (GAT), and graph-transformer networks (GTN).
arXiv Detail & Related papers (2024-10-05T16:17:44Z)
- Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes [0.0]
PARADIGM is a framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction.
We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data.
Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.
arXiv Detail & Related papers (2024-06-11T22:19:14Z)
- SELECTOR: Heterogeneous graph network with convolutional masked autoencoder for multimodal robust prediction of cancer survival [8.403756148610269]
Multimodal prediction of cancer patient survival offers a more comprehensive and precise approach.
This paper introduces SELECTOR, a heterogeneous graph-aware network based on convolutional mask encoders.
Our method significantly outperforms state-of-the-art methods in both modality-missing and intra-modality information-confirmed cases.
arXiv Detail & Related papers (2024-03-14T11:23:39Z)
- An end-to-end framework for gene expression classification by integrating a background knowledge graph: application to cancer prognosis prediction [1.5484595752241122]
We proposed an end-to-end framework to handle secondary data to construct a classification model for primary data.
We applied this framework to cancer prognosis prediction using gene expression data and a biological network.
arXiv Detail & Related papers (2023-06-29T11:20:47Z)
- Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- A Pipeline for Integrated Theory and Data-Driven Modeling of Genomic and Clinical Data [5.921993992338802]
We propose a pipeline for knowledge discovery from integrated genomic and clinical data.
We demonstrate how this pipeline can improve breast cancer outcome prediction models, and can provide a biologically interpretable view of sequencing data.
arXiv Detail & Related papers (2020-05-05T22:23:27Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.