SubOmiEmbed: Self-supervised Representation Learning of Multi-omics Data for Cancer Type Classification
- URL: http://arxiv.org/abs/2202.01672v1
- Date: Thu, 3 Feb 2022 16:39:09 GMT
- Title: SubOmiEmbed: Self-supervised Representation Learning of Multi-omics Data for Cancer Type Classification
- Authors: Sayed Hashim, Muhammad Ali, Karthik Nandakumar, Mohammad Yaqub
- Abstract summary: Integration and analysis of multi-omics data give us a broad view of tumours, which can improve clinical decision making.
SubOmiEmbed produces comparable results to the baseline OmiEmbed with a much smaller network and by using just a subset of the data.
This work could be extended to integrate mutation-based genomic data as well.
- Score: 4.992154875028543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For personalized medicine, crucial intrinsic information is present in
high-dimensional omics data, but it is difficult to capture because of the large
number of molecular features and the small number of available samples. Different
types of omics data reveal different aspects of samples. Integration and analysis
of multi-omics data give us a broad view of tumours, which can improve clinical
decision making. Omics data, mainly DNA methylation and gene expression
profiles, are usually high-dimensional, with many molecular features. In
recent years, variational autoencoders (VAEs) have been used extensively to
embed image and text data into lower-dimensional latent spaces. In our
project, we extend the idea of using a VAE model for low-dimensional latent
space extraction with the self-supervised learning technique of feature
subsetting. With VAEs, the key idea is to make the model learn meaningful
representations from different types of omics data, which can then be used
for downstream tasks such as cancer type classification. The main goals are to
overcome the curse of dimensionality, to integrate methylation and expression
data so as to combine information about different aspects of the same tissue
samples, and hopefully to extract biologically relevant features. Our extension
involves training the encoder and decoder to reconstruct the data from just a
subset of it.
By doing this, we force the model to encode the most important information in the
latent representation. We also added an identity to each subset so that the
model knows which subset is being fed into it during training and testing. We
experimented with our approach and found that SubOmiEmbed produces results
comparable to the baseline OmiEmbed with a much smaller network and using only
a subset of the data. This work could be extended to integrate mutation-based
genomic data as well.
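The following is a minimal, plain-Python sketch of one SubOmiEmbed-style training step as the abstract describes it. It is meant as an illustration only. Several details are assumptions that the abstract does not state:

- Features are split into `n_subsets` contiguous chunks.
- The subset identity is a one-hot vector appended to the zero-padded chunk.
- The reconstruction target is the full feature vector.
- The loss is an unweighted sum of squared error and the standard Gaussian VAE KL term.

All function names are hypothetical. The encoder and decoder are replaced by fixed placeholder values rather than trained networks.

```python
import math
import random


def split_into_subsets(features, n_subsets):
    """Split features into n_subsets contiguous chunks of at most ceil(len / n_subsets) values (assumed scheme)."""
    size = math.ceil(len(features) / n_subsets)
    return [features[i * size:(i + 1) * size] for i in range(n_subsets)]


def make_subset_input(features, n_subsets, subset_idx):
    """Encoder input: the chosen chunk, zero-padded to a fixed width, plus a one-hot subset identity."""
    width = math.ceil(len(features) / n_subsets)
    chunk = split_into_subsets(features, n_subsets)[subset_idx]
    padded = chunk + [0.0] * (width - len(chunk))
    identity = [1.0 if i == subset_idx else 0.0 for i in range(n_subsets)]
    return padded + identity


def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + exp(0.5 * logvar) * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_divergence(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def subset_vae_loss(x_full, x_hat, mu, logvar):
    """Hypothetical loss: squared error against the FULL vector plus the KL term (unweighted sum)."""
    recon = sum((a - b) ** 2 for a, b in zip(x_full, x_hat))
    return recon + kl_divergence(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [float(v) for v in range(10)]  # toy stand-in for one sample's omics features
    k = rng.randrange(3)  # draw a random subset for this training step
    x_in = make_subset_input(sample, n_subsets=3, subset_idx=k)
    # A real encoder/decoder would map x_in -> (mu, logvar) -> z -> x_hat;
    # fixed placeholder values are used here instead of trained networks.
    mu, logvar = [0.0, 0.5], [0.0, -1.0]
    z = reparameterize(mu, logvar, rng)
    x_hat = [v + 0.1 for v in sample]
    print(k, x_in, len(z), subset_vae_loss(sample, x_hat, mu, logvar))
```

In this sketch, appending the one-hot identity lets a single shared encoder handle every subset while still being told which features it is seeing. Zero-padding keeps the encoder input width constant when the number of features does not divide evenly into subsets.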
Related papers
- Multi-Domain Data Aggregation for Axon and Myelin Segmentation in Histology Images [0.5825410941577593]
Quantifying axon and myelin properties in histology images can provide useful information about microstructural changes caused by neurodegenerative diseases.
Advances in deep learning have made this task quick and reliable with minimal overhead, but a deep learning model trained by one research group will hardly ever be usable by other groups.
There is a pressing need to make AI accessible to researchers to facilitate and accelerate their workflow, but publicly available models are scarce and poorly maintained.
Our approach is to aggregate data from multiple imaging modalities to create an open-source, durable tool for axon and myelin segmentation.
Date: 2024-09-17T20:47:32Z
- An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification [2.2940141855172036]
In molecular biology, there has been an explosion of data generated from multi-omics sequencing.
Traditional statistical methods face challenges when dealing with such high-dimensional data.
This study focused on tackling these challenges with a neural network that incorporates an autoencoder to extract a latent space of the features.
Date: 2024-05-16T01:45:55Z
- Data-Efficient Learning via Minimizing Hyperspherical Energy [48.47217827782576]
This paper considers the problem of data-efficient learning from scratch using a small amount of representative data.
We propose an MHE-based (minimum hyperspherical energy) active learning (MHEAL) algorithm, and provide comprehensive theoretical guarantees for MHEAL.
Date: 2022-06-30T11:39:12Z
- Relational Subsets Knowledge Distillation for Long-tailed Retinal Diseases Recognition [65.77962788209103]
We propose class subset learning by dividing the long-tailed data into multiple class subsets according to prior knowledge.
This forces the model to focus on learning subset-specific knowledge.
The proposed framework proved to be effective for the long-tailed retinal diseases recognition task.
Date: 2021-04-22T13:39:33Z
- OmiEmbed: reconstruct comprehensive phenotypic information from multi-omics data using multi-task deep learning [19.889861433855053]
High-dimensional omics data contains intrinsic biomedical information crucial for personalised medicine.
It is challenging to capture them from genome-wide data due to the large number of molecular features and small number of available samples.
We proposed a unified multi-task deep learning framework called OmiEmbed to capture a holistic and relatively precise profile of phenotype from high-dimensional omics data.
Date: 2021-02-03T07:34:29Z
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
Date: 2021-01-27T19:28:04Z
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
Date: 2020-09-02T02:50:30Z
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
Date: 2020-06-11T17:29:53Z
- Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's Disease Classification of Gait Patterns [3.5939555573102857]
We show how to extract features relevant to accelerometer gait data for Parkinson's disease classification.
Our pre-trained source model consists of a convolutional autoencoder, and the target classification model is a simple multi-layer perceptron model.
We explore two different pre-trained source models, trained using different activity groups, and analyze the influence the choice of pre-trained model has over the task of Parkinson's disease classification.
Date: 2020-05-06T04:08:19Z
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found RNA-seq data to be highly redundant and informative even with subsets larger than 128 features.
Date: 2020-04-30T20:42:17Z
This list is automatically generated from the titles and abstracts of the papers in this site.