Optimize Deep Learning Models for Prediction of Gene Mutations Using Unsupervised Clustering
- URL: http://arxiv.org/abs/2204.01593v1
- Date: Thu, 31 Mar 2022 11:48:21 GMT
- Title: Optimize Deep Learning Models for Prediction of Gene Mutations Using Unsupervised Clustering
- Authors: Zihan Chen, Xingyu Li, Miaomiao Yang, Hong Zhang, Xu Steven Xu
- Abstract summary: Deep learning has become the mainstream methodological choice for analyzing and interpreting whole-slide digital pathology images (WSIs).
In this paper, we proposed an unsupervised clustering-based multiple-instance learning method and applied it to develop deep-learning models for prediction of gene mutations using WSIs from three cancer types.
We showed that unsupervised clustering of image patches could help identify predictive patches, exclude patches lacking predictive information, and therefore improve prediction of gene mutations in all three cancer types.
- Score: 6.494144125433731
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has become the mainstream methodological choice for analyzing
and interpreting whole-slide digital pathology images (WSIs). It is commonly
assumed that tumor regions carry most of the predictive information. In this
paper, we proposed an unsupervised clustering-based multiple-instance learning
method and applied it to develop deep-learning models for prediction of gene
mutations using WSIs from three cancer types in The Cancer Genome Atlas (TCGA)
studies (CRC, LUAD, and HNSCC). We showed that unsupervised clustering of image
patches could help identify predictive patches, exclude patches lacking
predictive information, and therefore improve prediction of gene mutations in
all three cancer types, compared with the WSI-based method without selection of
image patches and with models based only on tumor regions. Additionally, our
proposed algorithm outperformed two recently published baseline algorithms
leveraging unsupervised clustering to assist model prediction. The
unsupervised-clustering-based approach for mutation prediction allows
identification of the spatial regions related to mutation of a specific gene
via the resolved probability scores, highlighting the heterogeneity of a
predicted genotype in the tumor microenvironment. Finally, our study also
demonstrated that selecting tumor regions of WSIs is not always the best way
to identify patches for prediction of gene mutations, and that other tissue
types in the tumor microenvironment may predict gene mutations better than
tumor tissue.
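
The abstract does not specify the clustering algorithm, the patch features, or how predictive clusters are chosen, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each patch already has a feature vector and a patch-level mutation score from some upstream model (both hypothetical here), uses plain k-means as the clustering step, mean pooling as the slide-level aggregation, and AUC as the ranking criterion. The toy slides are synthetic and exist only to exercise the code.

```python
from statistics import mean

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(mean(dim) for dim in zip(*members))
    return labels

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def slide_scores(slides, keep):
    """Mean patch score per slide over the patches selected by keep(i, j)."""
    out = []
    for i, slide in enumerate(slides):
        kept = [s for j, (_, s) in enumerate(slide) if keep(i, j)]
        out.append(mean(kept) if kept else 0.5)  # neutral score if no patch kept
    return out

def rank_clusters(slides, labels, k):
    """Cluster all patches jointly, then score each cluster by slide-level AUC."""
    flat = [feat for slide in slides for feat, _ in slide]
    assign = iter(kmeans(flat, k))
    per_slide = [[next(assign) for _ in slide] for slide in slides]
    return {
        c: auc(slide_scores(slides, lambda i, j, c=c: per_slide[i][j] == c), labels)
        for c in range(k)
    }

# Synthetic toy data: each patch is (feature_vector, patch_score).
# Patches near (0, 0) carry misleading scores; patches near (10, 10) are informative.
slides = [
    [((0.0, 0.1), 0.2), ((0.2, 0.0), 0.3), ((10.0, 10.1), 0.9)],
    [((0.1, 0.0), 0.4), ((0.0, 0.2), 0.4), ((10.2, 10.0), 0.8)],
    [((0.1, 0.1), 0.6), ((0.2, 0.2), 0.7), ((10.1, 10.2), 0.2)],
    [((0.0, 0.0), 0.5), ((0.1, 0.2), 0.6), ((10.0, 10.0), 0.1)],
]
mutated = [1, 1, 0, 0]  # slide-level labels (1 = mutated, 0 = wild type)

baseline_auc = auc(slide_scores(slides, lambda i, j: True), mutated)
cluster_auc = rank_clusters(slides, mutated, k=2)
best = max(cluster_auc, key=cluster_auc.get)
print(baseline_auc, cluster_auc, best)
```

On this toy input, pooling only the patches of the best-ranked cluster separates the two slide groups better than pooling all patches. In practice the cluster ranking would have to be done on a training split and evaluated on held-out slides, since ranking and scoring on the same slides gives an optimistic estimate.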
Related papers
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
- Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner [20.07173196364489]
This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data.
Experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types.
arXiv Detail & Related papers (2024-04-06T08:07:48Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
- Multi-modal learning for predicting the genotype of glioma [14.93152817415408]
The isocitrate dehydrogenase (IDH) gene mutation is an essential biomarker for the diagnosis and prognosis of glioma.
It is promising to better predict glioma genotype by integrating focal tumor image and geometric features with brain network features derived from MRI.
We propose a multi-modal learning framework using three separate encoders to extract features of focal tumor image, tumor geometrics and global brain networks.
arXiv Detail & Related papers (2022-03-21T10:20:04Z)
- Collaborative learning of images and geometrics for predicting isocitrate dehydrogenase status of glioma [8.262398325144774]
The gold standard for IDH mutation detection requires tumour tissue obtained via invasive approaches and is usually expensive.
Recent advancement in radiogenomics provides a non-invasive approach for predicting IDH mutation based on MRI.
Here we propose a collaborative learning framework that learns both tumor images and tumor geometrics using convolutional neural networks (CNN) and graph neural networks (GNN).
Our results show that the proposed model outperforms the baseline model of 3D-DenseNet121.
arXiv Detail & Related papers (2022-01-14T15:58:07Z)
- DeepGene Transformer: Transformer for the gene expression-based classification of cancer subtypes [5.179504118679301]
Cancer and its subtypes constitute approximately 30% of all causes of death globally.
DeepGene Transformer is proposed to address the complexity of high-dimensional gene expression with a multi-head self-attention module.
arXiv Detail & Related papers (2021-08-26T15:02:55Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high-dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
- Unsupervised Feature Selection for Tumor Profiles using Autoencoders and Kernel Methods [1.9078991171384014]
This work aims to learn meaningful, low-dimensional representations of tumor samples and to find tumor subtype clusters.
The proposed method, named Latent Kernel Feature Selection (LKFS), is an unsupervised approach for gene selection in tumor gene expression profiles.
arXiv Detail & Related papers (2020-07-12T21:59:05Z)