Disentangling Shared and Target-Enriched Topics via Background-Contrastive Non-negative Matrix Factorization
- URL: http://arxiv.org/abs/2602.22387v1
- Date: Wed, 25 Feb 2026 20:34:07 GMT
- Title: Disentangling Shared and Target-Enriched Topics via Background-Contrastive Non-negative Matrix Factorization
- Authors: Yixuan Li, Archer Y. Yang, Yue Li
- Abstract summary: We introduce background contrastive Non-negative Matrix Factorization. The model extracts target-enriched latent topics by jointly factorizing a target dataset and a matched background. It reveals signals obscured by conventional methods, including disease-associated programs in postmortem depressive brain single-cell RNA-seq.
- Score: 19.996419408387478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biological signals of interest in high-dimensional data are often masked by dominant variation shared across conditions. This variation, arising from baseline biological structure or technical effects, can prevent standard dimensionality reduction methods from resolving condition-specific structure. The challenge is that these confounding topics are often unknown and mixed with biological signals. Existing background correction methods either do not scale to high dimensions or are not interpretable. We introduce background contrastive Non-negative Matrix Factorization, which extracts target-enriched latent topics by jointly factorizing a target dataset and a matched background using shared non-negative bases under a contrastive objective that suppresses background-expressed structure. This approach yields non-negative components that are directly interpretable at the feature level and explicitly isolates target-specific variation. The model is learned by a multiplicative update algorithm built on matrix multiplications, which makes it efficient on GPU hardware and scalable to large datasets via minibatch training, akin to deep learning approaches. Across simulations and diverse biological datasets, the model reveals signals obscured by conventional methods, including disease-associated programs in postmortem depressive brain single-cell RNA-seq, genotype-linked protein expression patterns in mice, treatment-specific transcriptional changes in leukemia, and TP53-dependent drug responses in cancer cell lines.
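The abstract describes the approach only at a high level, so the following is a minimal NumPy sketch of the general idea rather than the authors' algorithm. It assumes a stand-in objective, `||X_target - W H||_F^2 + alpha * ||X_background H^T||_F^2` with non-negative `W`, `H` and unit-norm topic rows, and uses heuristic multiplicative updates. The function name `contrastive_nmf_sketch`, the penalty form, `alpha`, and the toy data are all assumptions made for illustration. The paper's actual objective, update rules, and minibatch scheme are not reproduced here.

```python
import numpy as np

def contrastive_nmf_sketch(X_target, X_background, k, alpha=1.0,
                           n_iter=300, eps=1e-10, seed=0):
    """Illustrative only: NMF of the target with a background penalty.

    Heuristically targets
        ||X_target - W @ H||_F^2 + alpha * ||X_background @ H.T||_F^2
    with W, H >= 0 and unit-norm rows of H. This objective is an
    assumption of this sketch, not the paper's formulation.
    """
    rng = np.random.default_rng(seed)
    n, p = X_target.shape
    W = rng.uniform(0.1, 1.0, size=(n, k))  # per-sample topic loadings
    H = rng.uniform(0.1, 1.0, size=(k, p))  # topics over features
    H /= np.linalg.norm(H, axis=1, keepdims=True)

    for _ in range(n_iter):
        # Standard multiplicative update for the loadings.
        W *= (X_target @ H.T) / (W @ (H @ H.T) + eps)
        # Topic update: the penalty gradient is non-negative, so it goes in
        # the denominator and shrinks features the background expresses.
        penalty_grad = (H @ X_background.T) @ X_background
        H *= (W.T @ X_target) / (W.T @ W @ H + alpha * penalty_grad + eps)
        # Remove the scale ambiguity: unit-norm topics, W absorbs the scale.
        norms = np.linalg.norm(H, axis=1, keepdims=True) + eps
        H /= norms
        W *= norms.T
    return W, H

# Toy data (hypothetical): features 0-9 carry a shared program,
# features 10-19 carry a program present only in the target.
rng = np.random.default_rng(1)
shared = np.r_[np.ones(10), np.zeros(10)]
specific = np.r_[np.zeros(10), np.ones(10)]
X_bg = np.outer(rng.uniform(0.5, 1.5, 60), shared)
X_tg = (np.outer(rng.uniform(0.5, 1.5, 60), shared)
        + np.outer(rng.uniform(0.5, 1.5, 60), specific))

W, H = contrastive_nmf_sketch(X_tg, X_bg, k=1, alpha=5.0)
print("shared-feature weight:", H[0, :10].sum())
print("target-specific weight:", H[0, 10:].sum())
```

Every step is a dense matrix product, which is the property the abstract credits for GPU efficiency. The row normalization is a design choice of this sketch: without it, the penalty could be driven down simply by shrinking `H` and inflating `W`. These updates are a heuristic and are not claimed to decrease the stand-in objective monotonically.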
Related papers
- scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction [12.48933770510505]
We present scDFM, a generative framework based on conditional flow matching. scDFM aligns perturbed and control populations beyond cell-level correspondences.
arXiv Detail & Related papers (2026-02-06T17:00:21Z) - An AI-enabled tool for quantifying overlapping red blood cell sickling dynamics in microfluidic assays [5.577003343220155]
This framework integrates AI-assisted annotation, segmentation, classification, and instance counting to quantify red blood cell populations. It can more than double the experimental throughput via densely packed cell suspensions, capture drug-dependent sickling behavior, and reveal mechanobiological signatures of cellular morphological evolution.
arXiv Detail & Related papers (2026-01-25T05:32:53Z) - scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration [53.683726781791385]
We introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration. Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation.
arXiv Detail & Related papers (2025-10-28T21:28:39Z) - CellPainTR: Generalizable Representation Learning for Cross-Dataset Cell Painting Analysis [51.56484100374058]
We introduce CellPainTR, a Transformer-based architecture designed to learn foundational representations of cellular morphology. Our work represents a significant step towards creating truly foundational models for image-based profiling, enabling more reliable and scalable cross-study biological analysis.
arXiv Detail & Related papers (2025-09-02T03:30:07Z) - Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges [68.98973318553983]
We propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way. We also incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles.
arXiv Detail & Related papers (2025-06-26T09:05:38Z) - scMEDAL for the interpretable analysis of single-cell transcriptomics data with batch effect visualization using a deep mixed effects autoencoder [3.194381706244149]
We propose scMEDAL, a single-cell Mixed Effects Deep Autoencoder Learning framework. scMEDAL models batch-invariant and batch-specific effects using two complementary networks. scMEDAL produces interpretable, batch-specific embeddings that complement both scMEDAL-FE and established correction methods.
arXiv Detail & Related papers (2024-11-11T00:10:48Z) - Empirical Bayes Linked Matrix Decomposition [0.0]
We propose an empirical variational Bayesian approach to this problem.
We describe an associated iterative imputation approach that is novel for the single-matrix context.
We show that the method performs very well under different scenarios with respect to recovering underlying low-rank signal.
arXiv Detail & Related papers (2024-08-01T02:13:11Z) - Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View [49.03501451546763]
We identify the importance of implicit correspondences across biological contexts for exploiting domain-invariant pathological composition.
We propose self-adaptive dynamic distillation to secure instance-aware trade-offs across different model constituents.
arXiv Detail & Related papers (2024-07-14T04:41:16Z) - Weakly supervised covariance matrices alignment through Stiefel matrices estimation for MEG applications [64.20396555814513]
This paper introduces a novel domain adaptation technique for time series data, called Mixing model Stiefel Adaptation (MSA).
We exploit abundant unlabeled data in the target domain to ensure effective prediction by establishing pairwise correspondence with equivalent signal variances between domains.
MSA outperforms recent methods in brain-age regression with task variations using magnetoencephalography (MEG) signals from the Cam-CAN dataset.
arXiv Detail & Related papers (2024-01-24T19:04:49Z) - Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics [44.97217246897902]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets.
Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing.
We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z) - Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs [6.708052194104378]
We extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets.
The key idea is to use an augmented kernel which preserves the factorisability of the lower bound allowing for fast variational inference.
arXiv Detail & Related papers (2022-09-14T15:25:15Z) - A deep learning driven pseudospectral PCE based FFT homogenization algorithm for complex microstructures [68.8204255655161]
It is shown that the proposed method is able to predict central moments of interest while being magnitudes faster to evaluate than traditional approaches.
arXiv Detail & Related papers (2021-10-26T07:02:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.