A Deep Embedded Refined Clustering Approach for Breast Cancer
Distinction based on DNA Methylation
- URL: http://arxiv.org/abs/2102.09563v1
- Date: Thu, 18 Feb 2021 16:46:25 GMT
- Title: A Deep Embedded Refined Clustering Approach for Breast Cancer
Distinction based on DNA Methylation
- Authors: del Amor Rocío, Colomer Adrián, Monteagudo Carlos, Naranjo Valery
- Abstract summary: We propose a deep embedded refined clustering method for breast cancer differentiation based on DNA methylation.
The proposed approach is composed of two main stages. The first stage consists of the dimensionality reduction of the methylation data based on an autoencoder.
The second stage is a clustering algorithm based on the soft assignment of the latent space provided by the autoencoder.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Epigenetic alterations have an important role in the development of several
types of cancer. Epigenetic studies generate a large amount of data, which
makes it essential to develop novel models capable of dealing with large-scale
data. In this work, we propose a deep embedded refined clustering method for
breast cancer differentiation based on DNA methylation. Specifically, the deep
learning system presented here uses CpG island methylation levels, which range
between 0 and 1. The proposed approach is composed of two main stages. The
first stage consists of the dimensionality reduction of the methylation data
based on an autoencoder. The second stage is a clustering algorithm based on
the soft assignment of the latent space provided by the autoencoder. The whole
method is optimized through a weighted loss function composed of two terms:
a reconstruction term and a classification term. To the best of the authors'
knowledge, no previous study has focused on dimensionality reduction algorithms
linked to classification and trained end-to-end for DNA methylation analysis. The
proposed method achieves an unsupervised clustering accuracy of 0.9927 and an
error rate (%) of 0.73 on 137 breast tissue samples. In a second test of the
deep-learning-based method on a different methylation database, an accuracy
of 0.9343 and an error rate (%) of 6.57 on 45 breast tissue samples are
obtained. Based on these results, the proposed algorithm outperforms other
state-of-the-art methods evaluated under the same conditions for breast cancer
classification based on DNA methylation data.
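
The abstract names a soft assignment over the autoencoder's latent space and a weighted two-term loss, but does not give their exact form. The sketch below is one plausible reading, assuming the formulation of Deep Embedded Clustering (DEC): a Student's t kernel for the soft assignment, a sharpened target distribution, and a KL-divergence clustering term added to a mean-squared reconstruction error. The function names, the weight `gamma`, and the toy data are hypothetical and are not taken from the paper.

```python
import math

def soft_assign(z, centroids, alpha=1.0):
    """DEC-style soft assignment q[i][j] of latent point i to cluster j (assumed kernel)."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))  # squared Euclidean distance
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # normalise so each row sums to 1
    return q

def target_distribution(q):
    """Sharpened targets: p[i][j] proportional to q[i][j]^2 / f[j], with f[j] = sum_i q[i][j]."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        s = sum(w)
        p.append([v / s for v in w])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all samples and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0.0
    )

def mse(x, x_hat):
    """Mean squared reconstruction error over all entries."""
    n = sum(len(row) for row in x)
    return sum((a - b) ** 2 for xr, hr in zip(x, x_hat) for a, b in zip(xr, hr)) / n

def weighted_loss(x, x_hat, q, gamma=0.1):
    """Reconstruction term plus gamma times the clustering term (gamma is hypothetical)."""
    return mse(x, x_hat) + gamma * kl_divergence(target_distribution(q), q)

# Toy latent codes and centroids, for illustration only.
z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(z, centroids)

# Toy methylation values in [0, 1] and a slightly imperfect reconstruction.
x = [[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]
x_hat = [[0.25, 0.75], [0.3, 0.7], [0.85, 0.15]]
print(weighted_loss(x, x_hat, q))
```

In an end-to-end setup of this kind, `z` and `x_hat` would come from the encoder and decoder, and the centroids would be trained jointly with the network weights; the pure-Python loops here only make the arithmetic explicit.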
Related papers
- Prediction by Machine Learning Analysis of Genomic Data Phenotypic Frost Tolerance in Perccottus glenii [7.412214379486083]
We will employ machine learning techniques to analyze the gene sequences of Perccottus glenii.
We constructed four classification models: Random Forest, LightGBM, XGBoost, and Decision Tree.
The dataset used by these classification models was extracted from the National Center for Biotechnology Information database.
arXiv Detail & Related papers (2024-10-11T14:45:47Z)
- Breast Cancer Image Classification Method Based on Deep Transfer Learning [40.392772795903795]
A breast cancer image classification model algorithm combining deep learning and transfer learning is proposed.
Experimental results demonstrate that the algorithm achieves an efficiency of over 84.0% in the test set, with a significantly improved classification accuracy compared to previous models.
arXiv Detail & Related papers (2024-04-14T12:09:47Z)
- Fuzzy Gene Selection and Cancer Classification Based on Deep Learning
Model [1.3072222152900117]
We developed a new fuzzy gene selection technique (FGS) to identify informative genes to facilitate cancer classification.
With our FGS-enhanced method, the cancer classification model achieved 96.5%, 96.2%, 96%, and 95.9% for accuracy, precision, recall, and F1-score, respectively.
Across the six datasets that were used, the proposed model demonstrates its capacity to classify cancer effectively.
arXiv Detail & Related papers (2023-05-04T21:52:57Z)
- ReCasNet: Improving consistency within the two-stage mitosis detection
framework [5.263015177621435]
Existing approaches utilize a two-stage pipeline: the detection stage for identifying the locations of potential mitotic cells and the classification stage for refining prediction confidences.
This pipeline formulation can lead to inconsistencies in the classification stage due to the poor prediction quality of the detection stage and the mismatches in training data distributions.
We propose a Refine Cascade Network (ReCasNet), an enhanced deep learning pipeline that mitigates the aforementioned problems with three improvements.
arXiv Detail & Related papers (2022-02-28T16:03:14Z)
- EMT-NET: Efficient multitask network for computer-aided diagnosis of
breast cancer [58.720142291102135]
We propose an efficient and light-weighted learning architecture to classify and segment breast tumors simultaneously.
We incorporate a segmentation task into a tumor classification network, which makes the backbone network learn representations focused on tumor regions.
The accuracy, sensitivity, and specificity of tumor classification are 88.6%, 94.1%, and 85.3%, respectively.
arXiv Detail & Related papers (2022-01-13T05:24:40Z)
- Deep Learning Based Model for Breast Cancer Subtype Classification [3.419451872918847]
This paper focuses on the use of gene expression data for the classification of breast cancer into four subtypes, Basal, Her2, LumA, and LumB.
The size of the feature set is reduced from 20,530 gene expression values to 500 by using an autoencoder.
By deploying the combined network of stages 1 and 2, we have been able to attain a mean 10-fold test accuracy of 0.907 on the TCGA breast cancer dataset.
arXiv Detail & Related papers (2021-11-06T17:15:35Z)
- Cross-Site Severity Assessment of COVID-19 from CT Images via Domain
Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images offers a great help to the estimation of intensive care unit event.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z)
- Deep Semi-supervised Metric Learning with Dual Alignment for Cervical
Cancer Cell Detection [49.78612417406883]
We propose a novel semi-supervised deep metric learning method for cervical cancer cell detection.
Our model learns an embedding metric space and conducts dual alignment of semantic features on both the proposal and prototype levels.
We construct a large-scale dataset for semi-supervised cervical cancer cell detection for the first time, consisting of 240,860 cervical cell images.
arXiv Detail & Related papers (2021-04-07T17:11:27Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for
Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Robust Deep AUC Maximization: A New Surrogate Loss and Empirical Studies
on Medical Image Classification [63.44396343014749]
We propose a new margin-based surrogate loss function for the AUC score.
It is more robust than the commonly used square loss while enjoying the same advantage in terms of large-scale optimization.
To the best of our knowledge, this is the first work that makes DAM succeed on large-scale medical image datasets.
arXiv Detail & Related papers (2020-12-06T03:41:51Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity
Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.