A Pipeline for Integrated Theory and Data-Driven Modeling of Genomic and
Clinical Data
- URL: http://arxiv.org/abs/2005.02521v1
- Date: Tue, 5 May 2020 22:23:27 GMT
- Authors: Vineet K Raghu, Xiaoyu Ge, Arun Balajee, Daniel J. Shirer, Isha Das,
Panayiotis V. Benos, and Panos K. Chrysanthis
- Abstract summary: We propose a pipeline for knowledge discovery from integrated genomic and clinical data.
We demonstrate how this pipeline can improve breast cancer outcome prediction models, and can provide a biologically interpretable view of sequencing data.
- Score: 5.921993992338802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-throughput genome sequencing technologies such as RNA-Seq and microarrays
have the potential to transform clinical decision making and biomedical
research by enabling high-throughput measurements of the genome at a granular
level. However, to truly understand causes of disease and the effects of
medical interventions, this data must be integrated with phenotypic,
environmental, and behavioral data from individuals. Further, effective
knowledge discovery methods that can infer relationships between these data
types are required. In this work, we propose a pipeline for knowledge discovery
from integrated genomic and clinical data. The pipeline begins with a novel
variable selection method, and uses a probabilistic graphical model to
understand the relationships between features in the data. We demonstrate how
this pipeline can improve breast cancer outcome prediction models, and can
provide a biologically interpretable view of sequencing data.
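
The two-stage shape described in the abstract (variable selection first, then a model of relationships among the retained features) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify either algorithm, so a simple correlation filter stands in for the variable selection step and a thresholded correlation graph stands in, crudely, for the probabilistic graphical model. The function names, the feature names, the threshold, and the synthetic data are all hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_variables(features, outcome, k):
    """Stage 1 (stand-in): keep the k features most correlated with the outcome."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], outcome)),
        reverse=True,
    )
    return ranked[:k]


def association_graph(features, selected, threshold):
    """Stage 2 (stand-in): undirected edges between selected features with |r| >= threshold."""
    edges = set()
    for a, b in combinations(selected, 2):
        if abs(pearson(features[a], features[b])) >= threshold:
            edges.add(frozenset((a, b)))
    return edges


# Synthetic toy data (hypothetical): two "genes" share a hidden driver that
# also determines the outcome; the other two variables are pure noise.
rng = random.Random(0)
n_samples = 200
driver = [rng.gauss(0, 1) for _ in range(n_samples)]
features = {
    "gene_a": [d + rng.gauss(0, 0.3) for d in driver],
    "gene_b": [d + rng.gauss(0, 0.3) for d in driver],
    "gene_noise": [rng.gauss(0, 1) for _ in range(n_samples)],
    "age_noise": [rng.gauss(0, 1) for _ in range(n_samples)],
}
outcome = [d + rng.gauss(0, 0.5) for d in driver]

selected = select_variables(features, outcome, k=2)
edges = association_graph(features, selected, threshold=0.5)
print("selected:", sorted(selected))
print("edges:", [sorted(e) for e in edges])
```

A real pipeline of this kind would replace both stand-ins with methods suited to mixed genomic and clinical data; the sketch only shows how the output of the selection stage feeds the structure-learning stage.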
Related papers
- Simplicity within biological complexity [0.0]
We survey the literature and argue for the development of a comprehensive framework for embedding of multi-scale molecular network data.
Network embedding methods map nodes to points in low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships.
We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to efficient and scalable software implementation.
arXiv Detail & Related papers (2024-05-15T13:32:45Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Unlocking the Power of Multi-institutional Data: Integrating and Harmonizing Genomic Data Across Institutions [3.5489676012585236]
We introduce the Bridge model to derive integrated features to preserve information beyond common genes.
The model consistently excels in predicting patient survival across six cancer types in GENIE BPC data.
arXiv Detail & Related papers (2024-01-30T23:25:05Z)
- A deep learning pipeline for cross-sectional and longitudinal multiview data integration [7.424942475653412]
We have developed a pipeline to integrate cross-sectional and longitudinal data from multiple sources.
It includes variable selection/ranking using linear and nonlinear methods, feature extraction using functional principal component analysis and Euler characteristics, and joint integration and classification using dense feed-forward networks and recurrent neural networks.
We applied this pipeline to cross-sectional and longitudinal multi-omics data (metagenomics, transcriptomics, and metabolomics) from an inflammatory bowel disease (IBD) study and we identified microbial pathways, metabolites, and genes that discriminate by IBD status.
arXiv Detail & Related papers (2023-12-02T22:24:35Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using multimodal imaging genetic data from the Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
- Deep Biological Pathway Informed Pathology-Genomic Multimodal Survival Prediction [7.133948707208067]
We propose PONET, a novel biological pathway-informed pathology-genomic deep model.
Our proposed method achieves superior predictive performance and reveals meaningful biological interpretations.
arXiv Detail & Related papers (2023-01-06T05:24:41Z)
- Graph Neural Networks for Breast Cancer Data Integration [0.0]
We propose a novel learning pipeline comprising three steps - the integration of cancer data modalities as graphs, followed by the application of Graph Neural Networks.
This project has the potential to improve cancer data understanding and encourages the transition of regular data sets to graph-shaped data.
arXiv Detail & Related papers (2022-11-28T17:10:19Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on the application of elastic principal graphs, which can simultaneously address dimensionality reduction, data visualization, clustering, feature selection, and the quantification of geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)