Unlocking the Power of Multi-institutional Data: Integrating and Harmonizing Genomic Data Across Institutions
- URL: http://arxiv.org/abs/2402.00077v2
- Date: Tue, 29 Oct 2024 18:48:56 GMT
- Title: Unlocking the Power of Multi-institutional Data: Integrating and Harmonizing Genomic Data Across Institutions
- Authors: Yuan Chen, Ronglai Shen, Xiwen Feng, Katherine Panageas,
- Abstract summary: We introduce the Bridge model to derive integrated features to preserve information beyond common genes.
The model consistently excels in predicting patient survival across six cancer types in GENIE BPC data.
- Score: 3.5489676012585236
- Abstract: Cancer is a complex disease driven by genomic alterations, and tumor sequencing is becoming a mainstay of clinical care for cancer patients. The emergence of multi-institution sequencing data presents a powerful resource for learning real-world evidence to enhance precision oncology. GENIE BPC, led by the American Association for Cancer Research, establishes a unique database linking genomic data with clinical information for patients treated at multiple cancer centers. However, leveraging such multi-institutional sequencing data presents significant challenges. Variations in gene panels result in loss of information when the analysis is conducted on common gene sets. Additionally, differences in sequencing techniques and patient heterogeneity across institutions add complexity. High data dimensionality, sparse gene mutation patterns, and weak signals at the individual gene level further complicate matters. Motivated by these real-world challenges, we introduce the Bridge model. It uses a quantile-matched latent variable approach to derive integrated features to preserve information beyond common genes and maximize the utilization of all available data while leveraging information sharing to enhance both learning efficiency and the model's capacity to generalize. By extracting harmonized and noise-reduced lower-dimensional latent variables, the true mutation pattern unique to each individual is captured. We assess the model's performance and parameter estimation through extensive simulation studies. The extracted latent features from the Bridge model consistently excel in predicting patient survival across six cancer types in GENIE BPC data.
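The abstract describes the Bridge model only at a high level. The sketch below is not the Bridge model, which fits a quantile-matched latent variable model. It is a hypothetical toy in Python with NumPy that illustrates two ideas the abstract names. First, restricting the analysis to common genes discards panel-specific genes. Second, quantile matching can put a feature computed on different panels onto a shared scale. The gene panels, the `mutation_burden` and `quantile_match` helpers, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical gene panels for two institutions (illustrative only, not GENIE BPC panels).
PANEL_A = ["TP53", "KRAS", "EGFR", "PIK3CA", "APC", "BRAF"]
PANEL_B = ["TP53", "KRAS", "EGFR", "STK11", "KEAP1"]


def common_genes(panel_a, panel_b):
    """Genes sequenced by both institutions, kept in panel A's order."""
    shared = set(panel_b)
    return [g for g in panel_a if g in shared]


def mutation_burden(mutations, panel):
    """Fraction of panel genes mutated per patient (rows = patients, cols = genes)."""
    return mutations.sum(axis=1) / len(panel)


def quantile_match(source, reference):
    """Map each source value to the reference value at the same empirical quantile.

    Tied source values receive the same mid-rank quantile, so ties stay tied
    and the mapping is monotone (order preserving).
    """
    s = np.sort(source)
    left = np.searchsorted(s, source, side="left")
    right = np.searchsorted(s, source, side="right")
    q = (left + right) / (2.0 * len(source))  # mid-rank quantile in (0, 1)
    return np.quantile(reference, q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated binary mutation matrices (patients x panel genes); not real data.
    mut_a = rng.binomial(1, 0.30, size=(200, len(PANEL_A)))
    mut_b = rng.binomial(1, 0.15, size=(150, len(PANEL_B)))

    shared = common_genes(PANEL_A, PANEL_B)
    print("common genes:", shared)
    print("genes dropped from A:", len(PANEL_A) - len(shared))
    print("genes dropped from B:", len(PANEL_B) - len(shared))

    # Features use each institution's full panel rather than only the shared genes.
    burden_a = mutation_burden(mut_a, PANEL_A)
    burden_b = mutation_burden(mut_b, PANEL_B)

    # Place institution B's feature on institution A's empirical scale.
    burden_b_matched = quantile_match(burden_b, burden_a)
    print("B mean before/after matching:", burden_b.mean(), burden_b_matched.mean())
```

Here only three of the six genes on panel A survive the intersection, which is the information loss the abstract attributes to common-gene analysis. Quantile matching is one simple, order-preserving way to align marginal distributions across sites. It does not model patient heterogeneity or shared latent structure, which the Bridge model is described as addressing.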
Related papers
- Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes over the dataset.
Date: 2024-10-21T03:35:23Z
- Multi-Omic and Quantum Machine Learning Integration for Lung Subtypes Classification [0.0]
The fusion of quantum computing and machine learning holds promise for unraveling complex patterns within multi-omics datasets.
We developed a method for finding the best differentiating features between LUAD and LUSC datasets, which has the potential for biomarker discovery.
Date: 2024-10-02T23:16:31Z
- Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes [0.0]
PARADIGM is a framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction.
We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data.
Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.
Date: 2024-06-11T22:19:14Z
- Integrate Any Omics: Towards genome-wide data integration for patient stratification [6.893309898200498]
IntegrAO is an unsupervised framework for integrating incomplete multi-omics data and classifying new samples.
IntegrAO's ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology.
Date: 2024-01-15T19:57:07Z
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of causal machine learning techniques to single-cell genomics and the challenges involved.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
Date: 2023-10-23T13:35:24Z
- Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies [50.11449968854487]
Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits.
Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS.
We introduce a trans-modal learning framework Genetic InfoMax (GIM) to address the specific challenges of GWAS.
Date: 2023-09-26T03:59:21Z
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using multimodal imaging genetic data from the Alzheimer's Disease Neuroimaging Initiative cohort.
Date: 2023-05-25T16:29:16Z
- Deep Biological Pathway Informed Pathology-Genomic Multimodal Survival Prediction [7.133948707208067]
We propose PONET, a novel biological pathway-informed pathology-genomic deep model.
Our proposed method achieves superior predictive performance and reveals meaningful biological interpretations.
Date: 2023-01-06T05:24:41Z
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential of a novel topology-based approach to capture the information contained in cancer genomic data.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
Date: 2020-11-22T17:31:23Z
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, which involves identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built on a carefully designed meta-learner, the Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
Date: 2020-09-02T02:50:30Z