Unlocking the Power of Multi-institutional Data: Integrating and
Harmonizing Genomic Data Across Institutions
- URL: http://arxiv.org/abs/2402.00077v1
- Date: Tue, 30 Jan 2024 23:25:05 GMT
- Authors: Yuan Chen, Ronglai Shen, Xiwen Feng, Katherine Panageas
- Abstract summary: We introduce the Bridge model to derive integrated features to preserve information beyond common genes.
The model consistently excels in predicting patient survival across six cancer types in GENIE BPC data.
- Score: 3.8769921482808116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cancer is a complex disease driven by genomic alterations, and tumor
sequencing is becoming a mainstay of clinical care for cancer patients. The
emergence of multi-institution sequencing data presents a powerful resource for
learning real-world evidence to enhance precision oncology. GENIE BPC, led by
the American Association for Cancer Research, establishes a unique database
linking genomic data with clinical information for patients treated at multiple
cancer centers. However, leveraging such multi-institutional sequencing data
presents significant challenges. Variations in gene panels result in loss of
information when the analysis is conducted on common gene sets. Additionally,
differences in sequencing techniques and patient heterogeneity across
institutions add complexity. High data dimensionality, sparse gene mutation
patterns, and weak signals at the individual gene level further complicate
matters. Motivated by these real-world challenges, we introduce the Bridge
model. It uses a quantile-matched latent variable approach to derive integrated
features to preserve information beyond common genes and maximize the
utilization of all available data while leveraging information sharing to
enhance both learning efficiency and the model's capacity to generalize. By
extracting harmonized and noise-reduced lower-dimensional latent variables, the
true mutation pattern unique to each individual is captured. We assess the
model's performance and parameter estimation through extensive simulation
studies. The extracted latent features from the Bridge model consistently excel
in predicting patient survival across six cancer types in GENIE BPC data.
Related papers
- Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes [0.0]
PARADIGM is a framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction.
We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data.
Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.
arXiv Detail & Related papers (2024-06-11T22:19:14Z)
- SELECTOR: Heterogeneous graph network with convolutional masked autoencoder for multimodal robust prediction of cancer survival [8.403756148610269]
Multimodal prediction of cancer patient survival offers a more comprehensive and precise approach.
This paper introduces SELECTOR, a heterogeneous graph-aware network based on convolutional mask encoders.
Our method significantly outperforms state-of-the-art methods in both modality-missing and intra-modality information-confirmed cases.
arXiv Detail & Related papers (2024-03-14T11:23:39Z)
- Integrate Any Omics: Towards genome-wide data integration for patient stratification [6.893309898200498]
IntegrAO is an unsupervised framework for integrating incomplete multi-omics data and classifying new samples.
IntegrAO's ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology.
arXiv Detail & Related papers (2024-01-15T19:57:07Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies [50.11449968854487]
Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits.
Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS.
We introduce a trans-modal learning framework Genetic InfoMax (GIM) to address the specific challenges of GWAS.
arXiv Detail & Related papers (2023-09-26T03:59:21Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
- Deep Biological Pathway Informed Pathology-Genomic Multimodal Survival Prediction [7.133948707208067]
We propose PONET, a novel biological pathway-informed pathology-genomic deep model.
Our proposed method achieves superior predictive performance and reveals meaningful biological interpretations.
arXiv Detail & Related papers (2023-01-06T05:24:41Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)