CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation
- URL: http://arxiv.org/abs/2506.22963v1
- Date: Sat, 28 Jun 2025 17:45:45 GMT
- Title: CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation
- Authors: Kevin Lam, William Daniels, J Maxwell Douglas, Daniel Lai, Samuel Aparicio, Benjamin Bloem-Reddy, Yongjin Park
- Abstract summary: We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states. CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data.
- Score: 1.7590081165362783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.
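The bipartite categorical block structure described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the uniform cluster assignments, and the flat Dirichlet draw for block probabilities are all hypothetical choices made for illustration, and the paper's variational inference and primary/residual decomposition are not reproduced here. Each sample belongs to a row block, each genomic region to a column block, and every entry is a discrete copy number state drawn from the categorical distribution of its block pair.

```python
import numpy as np


def sample_block_matrix(n_samples, n_regions, n_row_blocks, n_col_blocks,
                        n_states, seed=0):
    """Draw a categorical matrix from a bipartite block model (illustrative)."""
    rng = np.random.default_rng(seed)
    # Assumed for illustration: cluster labels are drawn uniformly at random.
    z = rng.integers(n_row_blocks, size=n_samples)   # sample (row) blocks
    w = rng.integers(n_col_blocks, size=n_regions)   # region (column) blocks
    # One categorical distribution over copy number states per block pair.
    theta = rng.dirichlet(np.ones(n_states), size=(n_row_blocks, n_col_blocks))
    # Per-entry state probabilities, shape (n_samples, n_regions, n_states).
    probs = theta[z[:, None], w[None, :]]
    # Inverse-CDF sampling of one state per entry.
    cdf = probs.cumsum(axis=-1)
    u = rng.random((n_samples, n_regions, 1))
    x = np.minimum((u > cdf).sum(axis=-1), n_states - 1)
    return x, z, w, theta


def block_state_frequencies(x, z, w, n_row_blocks, n_col_blocks, n_states):
    """Empirical state frequencies per block pair, given known assignments."""
    counts = np.zeros((n_row_blocks, n_col_blocks, n_states))
    np.add.at(counts, (z[:, None], w[None, :], x), 1)
    totals = counts.sum(axis=-1, keepdims=True)
    return counts / np.maximum(totals, 1)


x, z, w, theta = sample_block_matrix(200, 150, 3, 2, 4)
freq = block_state_frequencies(x, z, w, 3, 2, 4)
print(np.abs(freq - theta).max())  # small when blocks contain many entries
```

With the assignments known, the per-block frequencies approximate the block probabilities; the inference problem addressed in the paper is the harder one of recovering the assignments and block distributions jointly from the matrix alone.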
Related papers
- Denoising diffusion networks for normative modeling in neuroimaging [1.0195618602298684]
Most neuroimaging pipelines fit one model per imaging-derived phenotype (IDP). We propose denoising diffusion probabilistic models (DDPMs) as a unified conditional density estimator for IDPs. We evaluate on a synthetic benchmark with heteroscedastic and multimodal age effects and on UK Biobank FreeSurfer phenotypes, scaling from dimension of 2 to 200.
arXiv Detail & Related papers (2026-01-24T06:19:10Z) - Survival Modeling from Whole Slide Images via Patch-Level Graph Clustering and Mixture Density Experts [7.0624785659308165]
We propose a modular framework for predicting cancer specific survival from whole slide pathology images. The framework consists of four key stages designed to capture prognostic morphological and heterogeneity.
arXiv Detail & Related papers (2025-07-22T11:32:36Z) - ReDiSC: A Reparameterized Masked Diffusion Model for Scalable Node Classification with Structured Predictions [64.17845687013434]
We propose ReDiSC, a structured diffusion model for structured node classification. We show that ReDiSC achieves superior or highly competitive performance compared to state-of-the-art GNN, label propagation, and diffusion-based baselines. Notably, ReDiSC scales effectively to large-scale datasets on which previous structured diffusion methods fail due to computational constraints.
arXiv Detail & Related papers (2025-07-19T04:46:53Z) - STG: Spatiotemporal Graph Neural Network with Fusion and Spatiotemporal Decoupling Learning for Prognostic Prediction of Colorectal Cancer Liver Metastasis [9.511932098831322]
We propose a multimodal spatiotemporal graph neural network (STG) framework to predict colorectal cancer liver metastasis (KCCM). Our STG framework combines CT imaging and clinical data into a heterogeneous graph structure, enabling joint modeling of tumor distribution edges and temporal evolution. A lightweight version of the model reduces parameter count by 78.55%, maintaining near-state-of-the-art performance.
arXiv Detail & Related papers (2025-05-06T02:41:34Z) - Identifiable Multi-View Causal Discovery Without Non-Gaussianity [63.217175519436125]
We propose a novel approach to linear causal discovery in the framework of multi-view Structural Equation Models (SEM). We prove the identifiability of all the parameters of the model without any further assumptions on the structure of the SEM other than it being acyclic. The proposed methodology is validated through simulations and application on real data, where it enables the estimation of causal graphs between brain regions.
arXiv Detail & Related papers (2025-02-27T14:06:14Z) - Predictive Analytics of Varieties of Potatoes [2.336821989135698]
We explore the application of machine learning algorithms specifically to enhance the selection process of Russet potato clones in breeding trials.
This study addresses the challenge of efficiently identifying high-yield, disease-resistant, and climate-resilient potato varieties.
arXiv Detail & Related papers (2024-04-04T00:49:05Z) - S3M: Scalable Statistical Shape Modeling through Unsupervised Correspondences [91.48841778012782]
We propose an unsupervised method to simultaneously learn local and global shape structures across population anatomies.
Our pipeline significantly improves unsupervised correspondence estimation for SSMs compared to baseline methods.
Our method is robust enough to learn from noisy neural network predictions, potentially enabling scaling SSMs to larger patient populations.
arXiv Detail & Related papers (2023-04-15T09:39:52Z) - A Novel Unified Conditional Score-based Generative Framework for Multi-modal Medical Image Completion [54.512440195060584]
We propose the Unified Multi-Modal Conditional Score-based Generative Model (UMM-CSGM) to take advantage of the Score-based Generative Model (SGM).
UMM-CSGM employs a novel multi-in multi-out Conditional Score Network (mm-CSN) to learn a comprehensive set of cross-modal conditional distributions.
Experiments on BraTS19 dataset show that the UMM-CSGM can more reliably synthesize the heterogeneous enhancement and irregular area in tumor-induced lesions.
arXiv Detail & Related papers (2022-07-07T16:57:21Z) - Optimize Deep Learning Models for Prediction of Gene Mutations Using Unsupervised Clustering [6.494144125433731]
Deep learning has become the mainstream methodological choice for analyzing and interpreting whole-slide digital pathology images.
In this paper, we propose an unsupervised clustering-based multiple-instance learning method and apply it to develop deep-learning models for prediction of gene mutations using WSIs from three cancer types.
We show that unsupervised clustering of image patches can help identify predictive patches, exclude patches lacking predictive information, and therefore improve prediction of gene mutations in all three cancer types.
arXiv Detail & Related papers (2022-03-31T11:48:21Z) - A new perspective on probabilistic image modeling [92.89846887298852]
We present a new probabilistic approach for image modeling capable of density estimation, sampling and tractable inference.
DCGMMs can be trained end-to-end by SGD from random initial conditions, much like CNNs.
We show that DCGMMs compare favorably to several recent PC and SPN models in terms of inference, classification and sampling.
arXiv Detail & Related papers (2022-03-21T14:53:57Z) - Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z) - Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics [16.537929113715432]
We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction.
arXiv Detail & Related papers (2021-11-27T21:18:01Z) - G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z) - Community Detection and Stochastic Block Models [20.058330327502503]
The stochastic block model (SBM) is widely employed as a canonical model to study clustering and community detection.
It provides a fertile ground to study the information-theoretic and computational tradeoffs that arise in statistics and data science.
This book surveys the recent developments that establish the fundamental limits for community detection in the SBM.
arXiv Detail & Related papers (2017-03-29T17:21:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.