UnPaSt: unsupervised patient stratification by differentially expressed biclusters in omics data
- URL: http://arxiv.org/abs/2408.00200v1
- Date: Wed, 31 Jul 2024 23:50:27 GMT
- Title: UnPaSt: unsupervised patient stratification by differentially expressed biclusters in omics data
- Authors: Michael Hartung, Andreas Maier, Fernando Delgado-Chaves, Yuliya Burankova, Olga I. Isaeva, Fábio Malta de Sá Patroni, Daniel He, Casey Shannon, Katharina Kaufmann, Jens Lohmann, Alexey Savchik, Anne Hartebrodt, Zoe Chervontseva, Farzaneh Firoozbakht, Niklas Probul, Evgenia Zotova, Olga Tsoy, David B. Blumenthal, Martin Ester, Tanja Laske, Jan Baumbach, Olga Zolotareva
- Abstract summary: UnPaSt can detect many biologically insightful and reproducible patterns in omic datasets.
Although many methods reliably detected major breast cancer subtypes, only a few identified Th2-high asthma, and UnPaSt significantly outperformed its closest competitors in both test datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most complex diseases, including cancer and non-malignant diseases like asthma, have distinct molecular subtypes that require distinct clinical approaches. However, existing computational patient stratification methods have been benchmarked almost exclusively on cancer omics data and perform well only when mutually exclusive subtypes can be characterized by many biomarkers. Here, we contribute a large-scale evaluation that quantitatively explores the power of 22 unsupervised patient stratification methods on both simulated and real transcriptome data. Building on this experience, we developed UnPaSt (https://apps.cosy.bio/unpast/), which optimizes unsupervised patient stratification and works even when only a limited number of subtype-predictive biomarkers is available. We evaluated all 23 methods on real-world breast cancer and asthma transcriptomics data. Although many methods reliably detected major breast cancer subtypes, only a few identified Th2-high asthma, and UnPaSt significantly outperformed its closest competitors in both test datasets. In summary, we show that UnPaSt can detect many biologically insightful and reproducible patterns in omics datasets.
Related papers
- ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease
We introduce ADPv2, a novel dataset focused on gastrointestinal histopathology.
Our dataset comprises 20,004 image patches derived from healthy colon biopsy slides, annotated according to a hierarchical taxonomy of 32 distinct HTTs across 3 levels.
We show that our dataset enables an organ-specific, in-depth study for potential biomarker discovery.
Date: 2025-07-08T04:19:10Z
- Biomarker based Cancer Classification using an Ensemble with Pre-trained Models
We leverage a meta-trained Hyperfast model for classifying cancer, achieving the highest AUC of 0.9929.
We also propose a novel ensemble model combining the pre-trained Hyperfast model, XGBoost, and LightGBM for multi-class classification tasks, achieving an incremental increase in accuracy (0.9464).
Date: 2024-06-14T14:43:59Z
- MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling
We develop MoCLIM, a representation learning framework for cancer subtyping.
We show that our approach significantly improves data fit and subtyping performance in fewer high-dimensional cancer instances.
Our framework incorporates various medical evaluations as the final component, providing high interpretability in medical analysis.
Date: 2023-08-17T10:49:48Z
- Federated Learning Enables Big Data for Rare Cancer Boundary Detection
We present findings from the largest federated ML study to date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease glioblastoma.
We demonstrate a 33% improvement over a publicly trained model in delineating the surgically targetable tumor, and a 23% improvement in delineating the tumor's entire extent.
Date: 2022-04-22T17:27:00Z
- Cancer Subtyping via Embedded Unsupervised Learning on Transcriptomics Data
We propose to investigate automatic subtyping from an unsupervised learning perspective.
Specifically, we bypass the strong Gaussianity assumption that is common in the unsupervised subtyping literature but often fails to hold.
Our proposed method better captures the latent space features and models the cancer subtype manifestation on a molecular basis.
Date: 2022-04-02T11:44:58Z
- Metastatic Cancer Outcome Prediction with Injective Multiple Instance Pooling
We process two public datasets to set up a benchmark cohort of 341 patients in total for studying outcome prediction of metastatic cancer.
We propose two injective multiple instance pooling functions that are better suited to outcome prediction.
Our results show that multiple instance learning with injective pooling functions can achieve state-of-the-art performance in the non-small-cell lung cancer CT and head and neck CT outcome prediction benchmarking tasks.
Date: 2022-03-09T16:58:03Z
- Weakly-supervised learning for image-based classification of primary melanomas into genomic immune subgroups
We develop deep learning models to classify gigapixel H&E stained pathology slides into immune subgroups.
We leverage a multiple-instance learning approach, which only requires slide-level labels and uses an attention mechanism to highlight regions of high importance to the classification.
Date: 2022-02-23T13:57:35Z
- Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained with hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operating characteristic (ROC).
Date: 2021-10-27T19:28:36Z
- Cancer Gene Profiling through Unsupervised Discovery
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high-dimensional, center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
Date: 2021-02-11T09:04:45Z
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
Date: 2021-01-27T19:28:04Z
- Topological Data Analysis of copy number alterations in cancer
We explore the potential to capture the information contained in cancer genomic data using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
Date: 2020-11-22T17:31:23Z
- Personalized pathology test for Cardio-vascular disease: Approximate Bayesian computation with discriminative summary statistics learning
We propose a platelet deposition model and an inferential scheme to estimate the biologically meaningful parameters using approximate Bayesian computation.
This work opens up an unprecedented opportunity for personalized pathology tests for CVD detection and medical treatment.
Date: 2020-10-13T15:20:21Z