Cancer Subtyping by Improved Transcriptomic Features Using Vector
Quantized Variational Autoencoder
- URL: http://arxiv.org/abs/2207.09783v1
- Date: Wed, 20 Jul 2022 09:47:53 GMT
- Authors: Zheng Chen, Ziwei Yang, Lingwei Zhu, Guang Shi, Kun Yue, Takashi
Matsubara, Shigehiko Kanaya, MD Altaf-Ul-Amin
- Abstract summary: We propose Vector Quantized Variational AutoEncoder (VQ-VAE) to tackle the data issues and extract informative latent features that are crucial to the quality of subsequent clustering.
VQ-VAE does not impose strict assumptions and hence its latent features are better representations of the input, capable of yielding superior clustering performance with any mainstream clustering method.
- Score: 10.835673227875615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Defining and separating cancer subtypes is essential for facilitating
personalized therapy modality and prognosis of patients. The definition of
subtypes has been constantly recalibrated as a result of our deepened
understanding. During this recalibration, researchers often rely on clustering
of cancer data to provide an intuitive visual reference that could reveal the
intrinsic characteristics of subtypes. The data being clustered are often omics
data such as transcriptomics that have strong correlations to the underlying
biological mechanism. However, while existing studies have shown promising
results, they suffer from issues associated with omics data: sample scarcity
and high dimensionality. As such, existing methods often impose unrealistic
assumptions to extract useful features from the data while avoiding overfitting
to spurious correlations. In this paper, we propose to leverage a recent strong
generative model, Vector Quantized Variational AutoEncoder (VQ-VAE), to tackle
the data issues and extract informative latent features that are crucial to the
quality of subsequent clustering by retaining only information relevant to
reconstructing the input. VQ-VAE does not impose strict assumptions and hence
its latent features are better representations of the input, capable of
yielding superior clustering performance with any mainstream clustering method.
Extensive experiments and medical analysis on multiple datasets comprising 10
distinct cancers demonstrate the VQ-VAE clustering results can significantly
and robustly improve prognosis over prevalent subtyping systems.
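The quantization step that gives VQ-VAE its name can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder and decoder are omitted, the codebook and encoder outputs are hypothetical toy values, and `beta=0.25` is an assumed commitment weight. Each encoder output is snapped to its nearest codebook vector, and the resulting discrete latent features are what a downstream clustering method would then consume.

```python
# Toy sketch of VQ-VAE vector quantization (pure Python, no training loop).
# All values below are hypothetical and chosen only for illustration.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(z_e, codebook):
    """Return (index, vector) of the codebook entry nearest to z_e."""
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(z_e, codebook[k]))
    return index, codebook[index]

def vq_penalty(z_e, z_q, beta=0.25):
    """Forward value of the codebook loss plus the commitment loss.

    In a real model the two terms differ only in where the stop-gradient
    is placed, so their forward values are both the squared distance.
    beta is an assumed commitment weight, not a value from the paper.
    """
    d = squared_distance(z_e, z_q)
    return d + beta * d

# Hypothetical codebook with three 2-D code vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]

# Hypothetical encoder outputs for four samples.
encoder_outputs = [[0.1, -0.2], [0.9, 1.2], [3.7, 4.1], [0.8, 0.9]]

# Discrete code index assigned to each sample.
assignments = [quantize(z, codebook)[0] for z in encoder_outputs]
print(assignments)
```

In a trained model the codebook is learned jointly with the encoder and decoder, and the quantized latent vectors, rather than the raw indices alone, would typically be passed to the clustering method.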
Related papers
- DEDUCE: Multi-head attention decoupled contrastive learning to discover cancer subtypes based on multi-omics data [7.049723871585993]
We propose a model, named DEDUCE, for unsupervised contrastive learning to analyze multi-omics cancer data.
This model adopts an unsupervised SMAE that can deeply extract contextual features and long-range dependencies from multi-omics data.
Subtypes are clustered by calculating the similarity between samples in both the feature space and sample space of multi-omics data.
arXiv Detail & Related papers (2023-07-09T00:53:23Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- A cost-based multi-layer network approach for the discovery of patient phenotypes [2.816539638885011]
We propose a cost-based layer selector model for detecting phenotypes using a community detection approach.
Our goal is to minimize the number of features used to build these phenotypes while preserving their quality.
For some post-treatment variables, predictors using phenotypes from COBALT as features outperformed those using phenotypes detected by traditional clustering methods.
arXiv Detail & Related papers (2022-09-19T14:07:10Z)
- Cancer Subtyping via Embedded Unsupervised Learning on Transcriptomics Data [5.232428469965068]
We propose to investigate automatic subtyping from an unsupervised learning perspective.
Specifically, we bypass the strong Gaussianity assumption that is commonly made, yet often fails to hold, in the unsupervised subtyping literature.
Our proposed method better captures the latent space features and models the cancer subtype manifestation on a molecular basis.
arXiv Detail & Related papers (2022-04-02T11:44:58Z)
- Multiple Organ Failure Prediction with Classifier-Guided Generative Adversarial Imputation Networks [4.040013871160853]
Multiple organ failure (MOF) is a severe syndrome with a high mortality rate among Intensive Care Unit (ICU) patients.
Applying machine learning models to electronic health records is a challenge due to the pervasiveness of missing values.
arXiv Detail & Related papers (2021-06-22T15:49:01Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Temporal Phenotyping using Deep Predictive Clustering of Disease Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
arXiv Detail & Related papers (2020-06-15T20:48:43Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.