Distilled Protein Backbone Generation
- URL: http://arxiv.org/abs/2510.03095v3
- Date: Mon, 27 Oct 2025 19:32:07 GMT
- Title: Distilled Protein Backbone Generation
- Authors: Liyang Xie, Haoran Zhang, Zhendong Wang, Wesley Tansey, Mingyuan Zhou,
- Abstract summary: Diffusion- and flow-based generative models offer unprecedented capabilities for de novo protein design.<n>These models are limited by their generating speed, often requiring hundreds of iterative steps in the reverse-diffusion process.<n>We show how to appropriately adapt Score identity Distillation (SiD), a state-of-the-art score distillation strategy, to train few-step protein backbone generators.
- Score: 59.63474232035653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion- and flow-based generative models have recently demonstrated strong performance in protein backbone generation tasks, offering unprecedented capabilities for de novo protein design. However, while achieving notable performance in generation quality, these models are limited by their generating speed, often requiring hundreds of iterative steps in the reverse-diffusion process. This computational bottleneck limits their practical utility in large-scale protein discovery, where thousands to millions of candidate structures are needed. To address this challenge, we explore the techniques of score distillation, which has shown great success in reducing the number of sampling steps in the vision domain while maintaining high generation quality. However, a straightforward adaptation of these methods results in unacceptably low designability. Through extensive study, we have identified how to appropriately adapt Score identity Distillation (SiD), a state-of-the-art score distillation strategy, to train few-step protein backbone generators which significantly reduce sampling time, while maintaining comparable performance to their pretrained teacher model. In particular, multistep generation combined with inference time noise modulation is key to the success. We demonstrate that our distilled few-step generators achieve more than a 20-fold improvement in sampling speed, while achieving similar levels of designability, diversity, and novelty as the Proteina teacher model. This reduction in inference cost enables large-scale in silico protein design, thereby bringing diffusion-based models closer to real-world protein engineering applications. The PyTorch implementation is available at https://github.com/LY-Xie/SiD_Protein
Related papers
- SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers [50.18388227899971]
We present SaDiT, a novel framework that accelerates protein backbone generation by integrating SaProt Tokenization with a Diffusion Transformer (DiT) architecture.<n>Experiments demonstrate that SaDiT outperforms state-of-the-art models, including RFDiffusion and Proteina, in both computational speed and structural viability.
arXiv Detail & Related papers (2026-02-06T13:50:13Z) - Directed Evolution of Proteins via Bayesian Optimization in Embedding Space [0.0]
We present a novel method for machine-learning-assisted directed evolution of proteins.<n>It combines Bayesian optimization with informative representation of protein variants extracted from a pre-trained protein language model.<n>Our method outperforms the state-of-the-art machine-learning-assisted directed evolution methods with regression objective.
arXiv Detail & Related papers (2025-09-05T10:47:49Z) - ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning [49.2607661375311]
We present ProteinZero, a novel framework that enables computationally scalable, automated, and continuous self-improvement of the inverse folding model.<n>ProteinZero substantially outperforms existing methods across every key metric in protein design.<n> Notably, the entire RL run on CATH-4.3 can be done with a single 8 X GPU node in under 3 days, including reward.
arXiv Detail & Related papers (2025-06-09T06:08:59Z) - Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design [87.58981407469977]
We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms.<n>Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising.
arXiv Detail & Related papers (2025-02-20T17:48:45Z) - One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5times$ larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z) - Laughing Hyena Distillery: Extracting Compact Recurrences From
Convolutions [101.08706223326928]
Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers.
In this paper, we seek to enable $mathcal O(1)$ compute and memory cost per token in any pre-trained long convolution architecture.
arXiv Detail & Related papers (2023-10-28T18:40:03Z) - Protein Discovery with Discrete Walk-Jump Sampling [41.01079393600248]
We learn a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo, and projecting back to the true data manifold with one-step denoising.
Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model.
We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models.
arXiv Detail & Related papers (2023-06-08T17:03:46Z) - Improving few-shot learning-based protein engineering with evolutionary
sampling [0.0]
We propose a few-shot learning approach to novel protein design that aims to accelerate the expensive wet lab testing cycle.
Our approach is composed of two parts: a semi-supervised transfer learning approach to generate a discrete fitness landscape for a desired protein function and a novel evolutionary Monte Carlo Chain sampling algorithm.
We demonstrate the performance of our approach by experimentally screening predicted high fitness gene activators, resulting in a dramatically improved hit rate compared to existing methods.
arXiv Detail & Related papers (2023-05-23T23:07:53Z) - Plug & Play Directed Evolution of Proteins with Gradient-based Discrete
MCMC [1.0499611180329804]
A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations.
We introduce a sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models.
By composing these models, we aim to improve our ability to evaluate unseen mutations and constrain search to regions of sequence space likely to contain functional proteins.
arXiv Detail & Related papers (2022-12-20T00:26:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.