MS-ConTab: Multi-Scale Contrastive Learning of Mutation Signatures for Pan Cancer Representation and Stratification
- URL: http://arxiv.org/abs/2508.19424v1
- Date: Tue, 26 Aug 2025 20:42:20 GMT
- Authors: Yifan Dou, Adam Khadre, Ruben C Petreaca, Golrokh Mirzaei
- Abstract summary: We introduce a novel unsupervised contrastive learning framework to cluster 43 cancer types. For each cancer type, we construct two complementary mutation signatures. We demonstrate that the resulting latent representations yield biologically meaningful clusters of cancer types.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivation. Understanding the pan-cancer mutational landscape offers critical insights into the molecular mechanisms underlying tumorigenesis. While patient-level machine learning techniques have been widely employed to identify tumor subtypes, cohort-level clustering, where entire cancer types are grouped based on shared molecular features, has largely relied on classical statistical methods. Results. In this study, we introduce a novel unsupervised contrastive learning framework to cluster 43 cancer types based on coding mutation data derived from the COSMIC database. For each cancer type, we construct two complementary mutation signatures: a gene-level profile capturing nucleotide substitution patterns across the most frequently mutated genes, and a chromosome-level profile representing normalized substitution frequencies across chromosomes. These dual views are encoded using TabNet encoders and optimized via a multi-scale contrastive learning objective (NT-Xent loss) to learn unified cancer-type embeddings. We demonstrate that the resulting latent representations yield biologically meaningful clusters of cancer types, aligning with known mutational processes and tissue origins. Our work represents the first application of contrastive learning to cohort-level cancer clustering, offering a scalable and interpretable framework for mutation-driven cancer subtyping.
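The abstract names an NT-Xent contrastive objective over two views of each cancer type (gene-level and chromosome-level). The following is a minimal NumPy sketch of a standard NT-Xent loss under stated assumptions, not the authors' implementation: the function name `nt_xent_loss`, the argument names, the temperature value, and the random stand-in embeddings are all hypothetical, and the TabNet encoders are not modeled. Row `i` of each input is assumed to be the embedding of the same cancer type in the two views.

```python
import numpy as np


def nt_xent_loss(z_gene, z_chrom, temperature=0.5):
    """Standard NT-Xent loss for two views of the same N items.

    z_gene, z_chrom: arrays of shape (N, d); row i of each is assumed to be
    the embedding of the same cancer type (hypothetical inputs).
    """
    n = z_gene.shape[0]
    # Stack both views and L2-normalize so dot products are cosine similarities.
    z = np.concatenate([z_gene, z_chrom], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    # Exclude self-similarity from every softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    # The positive for row i is the same item in the other view.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos_idx] - log_denom
    return float(-log_prob_pos.mean())


# Usage with random stand-in embeddings (43 cancer types, 16 dimensions).
rng = np.random.default_rng(0)
gene_view = rng.normal(size=(43, 16))
chrom_view = gene_view + 0.05 * rng.normal(size=(43, 16))
loss = nt_xent_loss(gene_view, chrom_view)
```

The loss is lowest when each cancer type's two embeddings are more similar to each other than to any other embedding in the batch, which is what pulls the two views toward a unified representation.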
Related papers
- DLSOM: A Deep learning-based strategy for liver cancer subtyping [0.0]
Liver cancer is a leading cause of cancer-related mortality worldwide. This study introduces DLSOM, a deep learning framework utilizing stacked autoencoders to analyze the complete somatic mutation landscape of 1,139 liver cancer samples.
arXiv Detail & Related papers (2024-12-15T23:13:29Z)
- Multimodal Prototyping for cancer survival prediction [45.61869793509184]
Multimodal survival methods combining gigapixel histology whole-slide images (WSIs) and transcriptomic profiles are particularly promising for patient prognostication and stratification.
Current approaches involve tokenizing the WSIs into smaller patches (>10,000 patches) and transcriptomics into gene groups, which are then integrated using a Transformer for predicting outcomes.
This process generates many tokens, which leads to high memory requirements for computing attention and complicates post-hoc interpretability analyses.
Our framework outperforms state-of-the-art methods with much less computation while unlocking new interpretability analyses.
arXiv Detail & Related papers (2024-06-28T20:37:01Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling [9.900594964709116]
We develop MoCLIM, a representation learning framework for cancer subtyping.
We show that our approach significantly improves data fit and subtyping performance in fewer high-dimensional cancer instances.
Our framework incorporates various medical evaluations as the final component, providing high interpretability in medical analysis.
arXiv Detail & Related papers (2023-08-17T10:49:48Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- Biomarker Gene Identification for Breast Cancer Classification [2.403531305046943]
The present work uses interpretable predictions made by the deep neural network employed for subtype classification to identify biomarkers.
The proposed algorithm led to the discovery of a set of 43 differentially expressed gene signatures.
arXiv Detail & Related papers (2021-11-10T06:38:50Z)
- DeepGene Transformer: Transformer for the gene expression-based classification of cancer subtypes [5.179504118679301]
Cancer and its subtypes constitute approximately 30% of all causes of death globally.
DeepGene Transformer is proposed which addresses the complexity of high-dimensional gene expression with a multi-head self-attention module.
arXiv Detail & Related papers (2021-08-26T15:02:55Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)