Repurposing Protein Language Models for Latent Flow-Based Fitness Optimization
- URL: http://arxiv.org/abs/2602.02425v1
- Date: Mon, 02 Feb 2026 18:25:33 GMT
- Authors: Amaru Caceres Arroyo, Lea Bogensperger, Ahmed Allam, Michael Krauthammer, Konrad Schindler, Dominik Narnhofer
- Abstract summary: CHASE is a framework that repurposes the evolutionary knowledge of pretrained protein language models. It achieves state-of-the-art performance on AAV and GFP protein design benchmarks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protein fitness optimization is challenged by a vast combinatorial landscape where high-fitness variants are extremely sparse. Many current methods either underperform or require computationally expensive gradient-based sampling. We present CHASE, a framework that repurposes the evolutionary knowledge of pretrained protein language models by compressing their embeddings into a compact latent space. By training a conditional flow-matching model with classifier-free guidance, we enable the direct generation of high-fitness variants without predictor-based guidance during the ODE sampling steps. CHASE achieves state-of-the-art performance on AAV and GFP protein design benchmarks. Finally, we show that bootstrapping with synthetic data can further enhance performance in data-constrained settings.
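The sampling procedure the abstract describes, generating directly with classifier-free guidance rather than predictor-based guidance, can be sketched as follows. This is a minimal illustration with a toy linear velocity network in a low-dimensional latent space; the network, dimensions, and guidance scale are assumptions for demonstration, not the paper's actual architecture.

```python
# Sketch: classifier-free-guided flow-matching sampling via Euler ODE steps.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8

# Hypothetical stand-ins for a trained velocity network v(z, t, c);
# the unconditional branch corresponds to dropping the condition (CFG).
W_cond = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1
W_uncond = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1

def velocity(z, t, conditional):
    # Toy linear dynamics standing in for the learned velocity field.
    W = W_cond if conditional else W_uncond
    return z @ W + t

def cfg_velocity(z, t, guidance_scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # velocity toward the conditional one.
    v_u = velocity(z, t, conditional=False)
    v_c = velocity(z, t, conditional=True)
    return v_u + guidance_scale * (v_c - v_u)

def sample(n_steps=50, guidance_scale=2.0):
    # Integrate the probability-flow ODE from noise at t=0 to t=1.
    z = rng.normal(size=LATENT_DIM)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        z = z + dt * cfg_velocity(z, t, guidance_scale)
    return z

z_final = sample()
print(z_final.shape)
```

Because the guidance term is baked into the velocity at each step, no separate fitness predictor is queried during sampling, which is the efficiency point the abstract makes.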
Related papers
- Visual Autoregressive Modelling for Monocular Depth Estimation [69.01449528371916]
We propose a monocular depth estimation method based on visual autoregressive (VAR) priors. Our method adapts a large-scale text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism. We report state-of-the-art performance on indoor benchmarks under constrained training conditions, and strong performance when applied to outdoor datasets.
arXiv Detail & Related papers (2025-12-27T17:08:03Z) - Elastic ViTs from Pretrained Models without Retraining [74.5386166956142]
Vision foundation models achieve remarkable performance but are only available in a limited set of pre-determined sizes. We introduce SnapViT: Single-shot network approximation for pruned Vision Transformers. Our approach efficiently combines gradient information with cross-network structure correlations, approximated via an evolutionary algorithm.
arXiv Detail & Related papers (2025-10-20T16:15:03Z) - Steering Generative Models with Experimental Data for Protein Fitness Optimization [25.404040461393876]
Protein fitness optimization involves finding a sequence that maximizes desired quantitative properties in a large design space of possible sequences. Recent advances in steering protein generative models with labeled data offer a promising approach. In this study, we explore fitness optimization using small amounts (hundreds) of labeled sequence-fitness pairs.
arXiv Detail & Related papers (2025-05-21T04:30:48Z) - Sample as You Infer: Predictive Coding With Langevin Dynamics [11.515490109360012]
We present a novel algorithm for parameter learning in generic deep generative models.
Our approach modifies the standard PC algorithm to bring performance on par with, and exceeding, that of standard variational auto-encoder training.
arXiv Detail & Related papers (2023-11-22T19:36:47Z) - Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z) - Robust Model-Based Optimization for Challenging Fitness Landscapes [96.63655543085258]
Protein design involves optimization on a fitness landscape.
Leading methods are challenged by sparsity of high-fitness samples in the training set.
We show that this problem of "separation" in the design space is a significant bottleneck in existing model-based optimization tools.
We propose a new approach that uses a novel VAE as its search model to overcome the problem.
arXiv Detail & Related papers (2023-05-23T03:47:32Z) - Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z) - CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z) - Guided Generative Protein Design using Regularized Transformers [5.425399390255931]
We introduce Regularized Latent Space Optimization (ReLSO), a deep transformer-based autoencoder which is trained to jointly generate sequences and predict fitness.
We explicitly model the underlying sequence-function landscape of large labeled datasets and optimize within latent space using gradient-based methods.
arXiv Detail & Related papers (2022-01-24T20:55:53Z)
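The gradient-based latent-space optimization that the ReLSO entry above describes can be sketched with a tiny numerical example. The quadratic fitness surrogate, latent dimension, and step size below are illustrative assumptions standing in for a trained jointly-generative-and-predictive model, not the paper's actual components.

```python
# Sketch: gradient ascent in a latent space against a fitness surrogate,
# in the spirit of ReLSO-style optimization.
import numpy as np

LATENT_DIM = 4
z_star = np.ones(LATENT_DIM)  # hypothetical latent fitness optimum

def fitness(z):
    # Toy concave surrogate: peak at z_star.
    return -np.sum((z - z_star) ** 2)

def grad_fitness(z):
    # Analytic gradient of the toy surrogate.
    return -2.0 * (z - z_star)

def optimize(z0, lr=0.1, steps=100):
    # Ascend the surrogate from an initial latent point.
    z = z0.copy()
    for _ in range(steps):
        z = z + lr * grad_fitness(z)
    return z

z_opt = optimize(np.zeros(LATENT_DIM))
print(np.round(z_opt, 3))  # converges toward z_star
```

In the full method, the optimized latent point would then be decoded back to a sequence; regularizing the latent space (as ReLSO does) is what keeps such gradient steps from landing in regions that decode poorly.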
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.