Integrate Any Omics: Towards genome-wide data integration for patient
stratification
- URL: http://arxiv.org/abs/2401.07937v1
- Date: Mon, 15 Jan 2024 19:57:07 GMT
- Title: Integrate Any Omics: Towards genome-wide data integration for patient
stratification
- Authors: Shihao Ma, Andy G.X. Zeng, Benjamin Haibe-Kains, Anna Goldenberg, John
E Dick and Bo Wang
- Abstract summary: IntegrAO is an unsupervised framework for integrating incomplete multi-omics data and classifying new samples.
IntegrAO's ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology.
- Score: 6.893309898200498
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: High-throughput omics profiling advancements have greatly enhanced cancer
patient stratification. However, incomplete data in multi-omics integration
presents a significant challenge, as traditional methods like sample exclusion
or imputation often compromise biological diversity and dependencies.
Furthermore, the critical task of accurately classifying new patients with
partial omics data into existing subtypes is commonly overlooked. To address
these issues, we introduce IntegrAO (Integrate Any Omics), an unsupervised
framework for integrating incomplete multi-omics data and classifying new
samples. IntegrAO first combines partially overlapping patient graphs from
diverse omics sources and utilizes graph neural networks to produce unified
patient embeddings. Our systematic evaluation across five cancer cohorts
involving six omics modalities demonstrates IntegrAO's robustness to missing
data and its accuracy in classifying new samples with partial profiles. An
acute myeloid leukemia case study further validates its capability to uncover
biological and clinical heterogeneity in incomplete datasets. IntegrAO's
ability to handle heterogeneous and incomplete data makes it an essential tool
for precision oncology, offering a holistic approach to patient
characterization.
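The graph-combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Gaussian-kernel patient similarity per omics type and a plain average over the modalities in which both patients were profiled, and it omits the graph neural network embedding step entirely. The function names `similarity_graph` and `fuse_partial_graphs` and the toy data are hypothetical.

```python
import numpy as np


def similarity_graph(features):
    """Gaussian-kernel similarity between patients (rows of `features`).

    Assumption: the kernel width is the median squared distance; the paper
    may use a different similarity measure.
    """
    sq = np.sum(features ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * features @ features.T, 0.0)
    np.fill_diagonal(d2, 0.0)  # remove floating-point noise on the diagonal
    positive = d2[d2 > 0]
    sigma2 = np.median(positive) if positive.size else 1.0
    return np.exp(-d2 / sigma2)


def fuse_partial_graphs(modalities, all_patients):
    """Average per-modality similarities over the union of patients.

    `modalities` maps a modality name to (patient_ids, feature_matrix).
    Patient ids within one modality are assumed to be unique.
    A pair of patients never profiled together keeps similarity 0.
    """
    index = {p: i for i, p in enumerate(all_patients)}
    n = len(all_patients)
    total = np.zeros((n, n))
    counts = np.zeros((n, n))
    for ids, feats in modalities.values():
        rows = np.array([index[p] for p in ids])
        sim = similarity_graph(np.asarray(feats, dtype=float))
        total[np.ix_(rows, rows)] += sim
        counts[np.ix_(rows, rows)] += 1
    fused = np.divide(total, counts, out=np.zeros_like(total), where=counts > 0)
    return fused, counts


# Toy example: two omics types with partially overlapping patients.
rng = np.random.default_rng(0)
patients = ["p1", "p2", "p3", "p4"]
modalities = {
    "rna": (["p1", "p2", "p3"], rng.normal(size=(3, 5))),
    "methylation": (["p2", "p3", "p4"], rng.normal(size=(3, 4))),
}
fused, counts = fuse_partial_graphs(modalities, patients)
```

Here `counts` records how many modalities support each patient pair, so pairs such as p1 and p4, which share no modality, stay unconnected in `fused` rather than being imputed.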
Related papers
- Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
arXiv Detail & Related papers (2024-10-21T03:35:23Z)
- Heterogeneous graph attention network improves cancer multiomics integration [8.729516996214537]
We introduce a Heterogeneous Graph ATtention network for omics integration (HeteroGATomics) to improve cancer diagnosis.
Experiments on three cancer multiomics datasets demonstrate HeteroGATomics' superior performance in cancer diagnosis.
arXiv Detail & Related papers (2024-08-05T22:01:13Z)
- Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes [0.0]
PARADIGM is a framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction.
We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data.
Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.
arXiv Detail & Related papers (2024-06-11T22:19:14Z)
- IGCN: Integrative Graph Convolution Networks for patient level insights and biomarker discovery in multi-omics integration [2.0971479389679337]
We introduce a novel integrative neural network approach for cancer molecular subtype and biomedical classification applications.
IGCN can identify which types of omics receive more emphasis for each patient to predict a certain class.
IGCN has the capability to pinpoint significant biomarkers from a range of omics data types.
arXiv Detail & Related papers (2024-01-31T05:52:11Z)
- Unlocking the Power of Multi-institutional Data: Integrating and Harmonizing Genomic Data Across Institutions [3.5489676012585236]
We introduce the Bridge model to derive integrated features to preserve information beyond common genes.
The model consistently excels in predicting patient survival across six cancer types in GENIE BPC data.
arXiv Detail & Related papers (2024-01-30T23:25:05Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
- CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data [47.2764293508916]
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data.
One obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost.
We propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention.
arXiv Detail & Related papers (2023-04-12T00:22:18Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.