Escaping Local Optima in the Waddington Landscape: A Multi-Stage TRPO-PPO Approach for Single-Cell Perturbation Analysis
- URL: http://arxiv.org/abs/2510.13018v1
- Date: Tue, 14 Oct 2025 22:20:56 GMT
- Title: Escaping Local Optima in the Waddington Landscape: A Multi-Stage TRPO-PPO Approach for Single-Cell Perturbation Analysis
- Authors: Francis Boabang, Samuel Asante Gyamerah
- Abstract summary: We introduce a multi-stage reinforcement learning algorithm for single-cell perturbation policy modeling. We first compute an explicit natural gradient update, solved with a conjugate gradient method under a KL trust-region constraint, to provide a safe first step for the policy before PPO refinement.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling cellular responses to genetic and chemical perturbations remains a central challenge in single-cell biology. Existing data-driven frameworks have advanced perturbation prediction through variational autoencoders, chemically conditioned autoencoders, and large-scale transformer pretraining. However, these models are prone to local optima in the nonconvex Waddington landscape of cell fate decisions, where poor initialization can trap trajectories in spurious lineages or implausible differentiation outcomes. Executable gene regulatory networks complement these approaches, while automated design frameworks incorporate biological priors through multi-agent optimization. Yet an approach that is completely data-driven, with initialization well designed to escape local optima and converge to a proper lineage, remains elusive. In this work, we introduce a multi-stage reinforcement learning algorithm tailored for single-cell perturbation modeling. We first compute an explicit natural gradient update using Fisher-vector products and a conjugate gradient solver, scaled by a KL trust-region constraint to provide a safe, curvature-aware first step for the policy. Starting from these preconditioned parameters, we then apply a second phase of proximal policy optimization (PPO) with clipped surrogates, exploiting minibatch efficiency to refine the policy. We demonstrate that this initialization substantially improves generalization in single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) perturbation analysis.
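To make the two-stage scheme concrete, below is a minimal PyTorch sketch: one TRPO-style natural-gradient step computed with Fisher-vector products and a conjugate gradient solver, scaled to a KL trust region, followed by PPO with clipped surrogates starting from the preconditioned parameters. The Gaussian policy, the toy rollout batch, and every hyperparameter (KL radius, clip range, learning rate) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Toy Gaussian policy over perturbation actions (architecture is an assumption).
policy = nn.Sequential(nn.Linear(32, 64), nn.Tanh(), nn.Linear(64, 8))
params = list(policy.parameters())

def flat_grad(y, create_graph=False):
    g = torch.autograd.grad(y, params, create_graph=create_graph, retain_graph=True)
    return torch.cat([t.reshape(-1) for t in g])

def log_prob(states, actions):
    mean = policy(states)                      # unit-variance Gaussian head
    return (-0.5 * (actions - mean) ** 2).sum(-1)

def fisher_vector_product(states, v, damping=1e-2):
    # F v via double backprop through the KL of the policy to a frozen copy;
    # for a fixed-variance Gaussian this is the Gauss-Newton/Fisher product.
    mean = policy(states)
    kl = 0.5 * ((mean - mean.detach()) ** 2).sum(-1).mean()
    g = flat_grad(kl, create_graph=True)
    return flat_grad((g * v).sum()) + damping * v

def conjugate_gradient(Avp, b, iters=10):
    x, r, p = torch.zeros_like(b), b.clone(), b.clone()
    rr = r @ r
    for _ in range(iters):
        Ap = Avp(p)
        alpha = rr / (p @ Ap)
        x, r = x + alpha * p, r - alpha * Ap
        rr_new = r @ r
        p, rr = r + (rr_new / rr) * p, rr_new
    return x

# Toy rollout batch standing in for real perturbation-response data.
S, A, adv = torch.randn(256, 32), torch.randn(256, 8), torch.randn(256)

# Stage 1: one natural-gradient step under a KL trust region (TRPO-style).
loss = -(log_prob(S, A) * adv).mean()
g = flat_grad(loss)
step = conjugate_gradient(lambda v: fisher_vector_product(S, v), -g)
delta = 0.01                                   # KL radius (assumption)
step = step * torch.sqrt(2 * delta / (step @ fisher_vector_product(S, step)))
with torch.no_grad():
    i = 0
    for p in params:
        p += step[i:i + p.numel()].view_as(p)
        i += p.numel()

# Stage 2: PPO refinement with clipped surrogates from the preconditioned init.
old_logp = log_prob(S, A).detach()
opt = torch.optim.Adam(params, lr=3e-4)
for _ in range(10):                            # minibatch epochs (assumption)
    idx = torch.randperm(256)[:64]
    ratio = torch.exp(log_prob(S[idx], A[idx]) - old_logp[idx])
    surr = torch.min(ratio * adv[idx],
                     torch.clamp(ratio, 0.8, 1.2) * adv[idx]).mean()
    opt.zero_grad()
    (-surr).backward()
    opt.step()
```

The key design point is the ordering: the expensive, curvature-aware step is taken once to precondition the parameters, after which the cheaper clipped-surrogate updates exploit minibatch efficiency.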
Related papers
- Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning [17.65459083031186]
We study the training dynamics of gradient descent in a softmax self-attention layer trained to perform linear regression. We show that a simple first-order gradient descent method can converge to the globally optimal self-attention parameters.
arXiv Detail & Related papers (2026-03-02T06:44:54Z) - Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning [59.11663802446183]
Flow and diffusion generative models can be adapted to optimize task-specific objectives while preserving prior information. We introduce Flow Density Control (FDC), a simple algorithm that reduces this complex problem to a specific sequence of simpler fine-tuning tasks. We derive convergence guarantees for the proposed scheme under realistic assumptions by leveraging recent understanding of mirror flows.
arXiv Detail & Related papers (2025-11-27T17:19:01Z) - Linearized Optimal Transport for Analysis of High-Dimensional Point-Cloud and Single-Cell Data [45.87606039212519]
Single-cell technologies generate high-dimensional point clouds of cells, so each patient is represented by an irregular point cloud rather than a simple vector. We adapt the Linearized Optimal Transport framework to embed irregular point clouds into a fixed-dimensional Euclidean space (a minimal sketch follows this list).
arXiv Detail & Related papers (2025-10-24T21:33:12Z) - Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z) - Refine Drugs, Don't Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery [0.0]
We introduce InVirtuoGen, a discrete flow generative model over fragmented SMILES for de novo and fragment-constrained generation. For property and lead optimization, we propose a hybrid scheme that combines a genetic algorithm with a Proximal Property Optimization fine-tuning strategy. Our approach sets a new state of the art on the Practical Molecular Optimization benchmark, measured by top-10 AUC across tasks.
arXiv Detail & Related papers (2025-09-30T15:34:53Z) - Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data [0.0]
Single-cell RNA-seq provides detailed molecular snapshots of individual cells, yet most studies still rely on principal component analysis (PCA) for dimensionality reduction. We improve upon PCA with a Random Matrix Theory (RMT)-based approach that guides the inference of sparse principal components (see the Marchenko-Pastur sketch after this list).
arXiv Detail & Related papers (2025-09-18T21:08:38Z) - Heterogeneous Self-Supervised Acoustic Pre-Training with Local Constraints [64.15709757611369]
We propose a new self-supervised pre-training approach to dealing with heterogeneous data. The proposed approach can significantly improve the adaptivity of the self-supervised pre-trained model for downstream supervised fine-tuning tasks.
arXiv Detail & Related papers (2025-08-27T15:48:50Z) - Unsupervised Parameter Efficient Source-free Post-pretraining [52.27955794126508]
We introduce UpStep, an unsupervised, parameter-efficient, source-free post-pretraining approach that adapts a base model from a source domain to a target domain. We use various general backbone architectures, both supervised and unsupervised, trained on ImageNet as our base models.
arXiv Detail & Related papers (2025-02-28T18:54:51Z) - Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data [39.146761527401424]
Single-cell RNA sequencing allows the quantification of gene expression at the individual cell level. Dimensionality reduction is a common preprocessing step critical for the visualization, clustering, and phenotypic characterization of samples. We present a generalized matrix factorization model assuming a general exponential dispersion family distribution. We show that our method scales seamlessly to millions of cells, enabling dimensionality reduction in large single-cell datasets (a minibatch-SGD sketch follows this list).
arXiv Detail & Related papers (2024-12-29T16:02:15Z) - Direct Preference Optimization for Primitive-Enabled Hierarchical Reinforcement Learning [75.9729413703531]
DIPPER is a novel HRL framework that formulates hierarchical policy learning as a bi-level optimization problem. We show that DIPPER achieves up to 40% improvement over state-of-the-art baselines in sparse reward scenarios.
arXiv Detail & Related papers (2024-11-01T04:58:40Z) - Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [53.03951222945921]
We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle. Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error. We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization (a toy smoothing sketch follows this list).
arXiv Detail & Related papers (2024-07-24T12:00:30Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in few-shot image classification scenarios.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Implicit Bias and Fast Convergence Rates for Self-attention [26.766649949420746]
We study the fundamental optimization principles of self-attention, the defining mechanism of transformers. We analyze the implicit bias of gradient-based methods in a self-attention layer with a decoder in a linear classification task.
arXiv Detail & Related papers (2024-02-08T15:15:09Z) - K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis [0.3683202928838613]
We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with $L_{2,1}$-norm regularization.
We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method (a kNN graph-Laplacian sketch follows this list).
We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse scRNA-seq datasets.
arXiv Detail & Related papers (2023-10-23T03:07:50Z)
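For the Linearized Optimal Transport entry above, here is a minimal sketch of the embedding idea for equal-size, uniformly weighted point clouds, where exact OT reduces to a linear assignment; the reference cloud and all sizes are toy assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def lot_embed(cloud, reference):
    # Exact OT between two equal-size uniform point clouds is an assignment.
    cost = cdist(reference, cloud, metric="sqeuclidean")
    _, col = linear_sum_assignment(cost)
    # The embedding is the image of each reference atom under the OT map.
    return cloud[col].ravel()

rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 20))                     # shared reference
patients = [rng.normal(loc=0.1 * i, size=(100, 20)) for i in range(5)]
X = np.stack([lot_embed(c, reference) for c in patients])
print(X.shape)  # (5, 2000): one fixed-length vector per patient's point cloud
```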
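For the RMT-guided sparse PCA entry, an illustrative sketch of the random-matrix ingredient: keep only sample-covariance eigenvalues above the Marchenko-Pastur upper edge $(1+\sqrt{p/n})^2$. The synthetic data and unit-variance assumption are placeholders, and the sparsity-inducing step is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 200                                # cells x genes (toy sizes)
X = rng.normal(size=(n, p))
X[:, :5] += 3.0 * rng.normal(size=(n, 1))      # plant one shared signal factor

Z = (X - X.mean(0)) / X.std(0)                 # standardize each gene
eigvals = np.linalg.eigvalsh(Z.T @ Z / n)      # sample covariance spectrum
mp_edge = (1 + np.sqrt(p / n)) ** 2            # Marchenko-Pastur upper edge
k = int((eigvals > mp_edge).sum())             # components carrying signal
print(f"{k} eigenvalues exceed the MP edge {mp_edge:.2f}")
```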
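For the generalized matrix factorization entry, a minimal minibatch-SGD sketch for one member of the exponential dispersion family (Poisson counts with a log link); the toy counts, rank, and learning rate are assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 1000, 300, 10                        # cells, genes, latent rank
Y = rng.poisson(2.0, size=(n, p)).astype(float)
U = 0.1 * rng.normal(size=(n, k))              # cell scores
V = 0.1 * rng.normal(size=(p, k))              # gene loadings

lr, batch = 1e-3, 128
for step in range(2001):
    rows = rng.choice(n, size=batch, replace=False)  # cell minibatch
    eta = U[rows] @ V.T                        # natural parameter (log mean)
    mu = np.exp(np.clip(eta, -10.0, 10.0))
    resid = mu - Y[rows]                       # grad of -loglik w.r.t. eta
    U[rows] -= lr * (resid @ V)                # SGD step on cell scores
    V -= lr * (resid.T @ U[rows])              # SGD step on gene loadings
    if step % 500 == 0:
        nll = float((mu - Y[rows] * np.log(mu)).mean())
        print(f"step {step}: minibatch -loglik (up to constants) {nll:.3f}")
```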
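For the smoothed-policy entry, a toy sketch of the perturbation idea: add controlled Gaussian noise to the direction handed to a linear oracle and average over draws, turning a hard argmax into a smooth policy. The one-hot oracle stands in for a real combinatorial oracle and is an assumption.

```python
import numpy as np

def linear_oracle(theta):
    # Toy combinatorial oracle: return the best vertex (one-hot argmax).
    out = np.zeros_like(theta)
    out[np.argmax(theta)] = 1.0
    return out

def smoothed_policy(theta, sigma=0.5, draws=200, seed=0):
    # Average the oracle over Gaussian perturbations of the direction theta.
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(draws, theta.size))
    return np.mean([linear_oracle(theta + z) for z in noise], axis=0)

theta = np.array([1.0, 0.9, 0.2])
print(linear_oracle(theta))      # hard argmax: [1, 0, 0]
print(smoothed_policy(theta))    # smooth distribution over near-ties
```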
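For the kNN-tPCA entry, an illustrative sketch of just the kNN graph Laplacian that such methods build on; the full persistent-Laplacian filtration and the $L_{2,1}$-regularized PCA are beyond this snippet, and the sizes are toy assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_laplacian(X, k=10):
    # Symmetric kNN adjacency, then the combinatorial Laplacian L = D - W.
    tree = cKDTree(X)
    _, idx = tree.query(X, k=k + 1)            # nearest neighbours incl. self
    n = X.shape[0]
    W = np.zeros((n, n))
    for i, nbrs in enumerate(idx):
        W[i, nbrs[1:]] = 1.0                   # drop the self-match
    W = np.maximum(W, W.T)                     # symmetrize
    return np.diag(W.sum(axis=1)) - W

X = np.random.default_rng(3).normal(size=(200, 50))   # toy expression matrix
evals = np.linalg.eigvalsh(knn_laplacian(X))
print(evals[:5])   # near-zero eigenvalues count connected components
```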