A deep learning pipeline for cross-sectional and longitudinal multiview
data integration
- URL: http://arxiv.org/abs/2312.01238v1
- Date: Sat, 2 Dec 2023 22:24:35 GMT
- Authors: Sarthak Jain and Sandra E. Safo
- Abstract summary: We have developed a pipeline to integrate cross-sectional and longitudinal data from multiple sources.
It includes variable selection/ranking using linear and nonlinear methods, feature extraction using functional principal component analysis and Euler characteristics, and joint integration and classification using dense feed-forward networks and recurrent neural networks.
We applied this pipeline to cross-sectional and longitudinal multi-omics data (metagenomics, transcriptomics, and metabolomics) from an inflammatory bowel disease (IBD) study and we identified microbial pathways, metabolites, and genes that discriminate by IBD status.
- Score: 7.424942475653412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biomedical research now commonly integrates diverse data types or views from
the same individuals to better understand the pathobiology of complex diseases,
but the challenge lies in meaningfully integrating these diverse views.
Existing methods often require the same type of data from all views
(cross-sectional data only or longitudinal data only) or do not consider any
class outcome in the integration method, presenting limitations. To overcome
these limitations, we have developed a pipeline that harnesses the power of
statistical and deep learning methods to integrate cross-sectional and
longitudinal data from multiple sources. Additionally, it identifies key
variables contributing to the association between views and the separation
among classes, providing deeper biological insights. This pipeline includes
variable selection/ranking using linear and nonlinear methods, feature
extraction using functional principal component analysis and Euler
characteristics, and joint integration and classification using dense
feed-forward networks and recurrent neural networks. We applied this pipeline
to cross-sectional and longitudinal multi-omics data (metagenomics,
transcriptomics, and metabolomics) from an inflammatory bowel disease (IBD)
study and we identified microbial pathways, metabolites, and genes that
discriminate by IBD status, providing information on the etiology of IBD. We
conducted simulations to compare the two feature extraction methods. The
proposed pipeline is available from the following GitHub repository:
https://github.com/lasandrall/DeepIDA-GRU.
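The Euler characteristic feature extraction named in the abstract can be illustrated with a minimal sketch. It assumes one common construction: for a single subject, correlate each pair of variables across time points, keep an edge when the absolute correlation reaches a threshold, and record the number of vertices minus the number of edges at each threshold. The function names `pearson` and `euler_characteristic_curve`, the filtration direction, the restriction to vertices and edges, and the toy numbers are assumptions for illustration; this is not the API of the DeepIDA-GRU repository.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def euler_characteristic_curve(series, thresholds):
    """Return one Euler characteristic (vertices minus edges) per threshold.

    series: list of p time series (one per variable) for a single subject.
    An edge joins variables i and j when |corr(i, j)| >= threshold
    (assumed filtration direction; higher-order simplices are ignored).
    """
    p = len(series)
    weights = [abs(pearson(series[i], series[j]))
               for i, j in combinations(range(p), 2)]
    return [p - sum(w >= t for w in weights) for t in thresholds]


# Toy subject: 4 variables measured at 4 time points (made-up numbers).
subject = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [1, -1, 1, -1],
]
curve = euler_characteristic_curve(subject, thresholds=[0.0, 0.3, 0.9, 1.1])
print(curve)  # [-2, -2, 1, 4]
```

The curve has one value per threshold regardless of how many time points a subject has, so a longitudinal view summarized this way becomes a fixed-length vector per subject.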
Related papers
- FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival [3.4686401890974197]
We propose a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information.
Cross-fusion transformer effectively utilizes features at the cellular level, tissue level, and tumor heterogeneity level to correlate prognosis.
The hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features.
We also propose an asymmetrically masked triplet masked autoencoder to reconstruct lost information within modalities.
(2024-05-13T12:39:08Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
(2024-03-02T00:56:05Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
(2023-11-28T09:14:55Z)
- GeoTop: Advancing Image Classification with Geometric-Topological Analysis [0.0]
Topological Data Analysis and Lipschitz-Killing Curvatures are used as powerful tools for feature extraction and classification.
We investigate the potential of combining both methods to improve classification accuracy.
This approach has the potential to advance our understanding of complex biological processes in various biomedical applications.
(2023-11-08T23:38:32Z)
- CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data [47.2764293508916]
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data.
One obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost.
We propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention.
(2023-04-12T00:22:18Z)
- Interpretable Deep Learning Methods for Multiview Learning [7.369639553849422]
iDeepViewLearn is a method for learning nonlinear relationships in data from multiple views.
Deep neural networks are used to learn a view-independent low-dimensional embedding.
iDeepViewLearn is tested on simulated data and two real-world datasets, including breast cancer-related gene expression and methylation data.
(2023-02-15T20:11:25Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
(2021-01-27T19:28:04Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
(2020-09-02T02:50:30Z)
- A Pipeline for Integrated Theory and Data-Driven Modeling of Genomic and Clinical Data [5.921993992338802]
We propose a pipeline for knowledge discovery from integrated genomic and clinical data.
We demonstrate how this pipeline can improve breast cancer outcome prediction models, and can provide a biologically interpretable view of sequencing data.
(2020-05-05T22:23:27Z)
- MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
(2020-02-09T14:11:50Z)