Improving Protein Sequence Design through Designability Preference Optimization
- URL: http://arxiv.org/abs/2506.00297v1
- Date: Fri, 30 May 2025 23:02:51 GMT
- Title: Improving Protein Sequence Design through Designability Preference Optimization
- Authors: Fanglei Xue, Andrew Kubaney, Zhichun Guo, Joseph K. Min, Ge Liu, Yi Yang, David Baker
- Abstract summary: We redefine the training objective by steering sequence generation toward high designability. We introduce Residue-level Designability Preference Optimization (ResiDPO). This enables direct improvement in designability while preserving regions that already perform well.
- Score: 22.037870784317885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protein sequence design methods have demonstrated strong performance in sequence generation for de novo protein design. However, as the training objective was sequence recovery, it does not guarantee designability--the likelihood that a designed sequence folds into the desired structure. To bridge this gap, we redefine the training objective by steering sequence generation toward high designability. To do this, we integrate Direct Preference Optimization (DPO), using AlphaFold pLDDT scores as the preference signal, which significantly improves the in silico design success rate. To further refine sequence generation at a finer, residue-level granularity, we introduce Residue-level Designability Preference Optimization (ResiDPO), which applies residue-level structural rewards and decouples optimization across residues. This enables direct improvement in designability while preserving regions that already perform well. Using a curated dataset with residue-level annotations, we fine-tune LigandMPNN with ResiDPO to obtain EnhancedMPNN, which achieves a nearly 3-fold increase in in silico design success rate (from 6.56% to 17.57%) on a challenging enzyme design benchmark.
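The abstract describes integrating Direct Preference Optimization with AlphaFold pLDDT scores as the preference signal. A minimal sketch of the standard DPO loss for one preference pair is given below; the paper's exact formulation is not reproduced in this summary, so the log-likelihood inputs, the choice of `beta`, and the pairing rule (higher-pLDDT sequence as "winner") are illustrative assumptions:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    logp_w / logp_l: policy log-likelihoods of the preferred ("winner",
    e.g. higher-pLDDT) and dispreferred ("loser") sequences.
    ref_logp_w / ref_logp_l: the same quantities under the frozen
    reference model. beta scales the implicit reward (assumed value).
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin: the loss shrinks as the policy
    # prefers the high-pLDDT sequence more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In this setting, preference pairs would come from ranking sequences designed for the same backbone by their AlphaFold pLDDT; ResiDPO's residue-level rewards refine this further, but that decomposition is not detailed here.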
Related papers
- Discrete Diffusion Trajectory Alignment via Stepwise Decomposition [70.9024656666945]
We propose a novel preference optimization method for masked discrete diffusion models. Instead of applying the reward on the final output and backpropagating the gradient to the entire discrete denoising process, we decompose the problem into a set of stepwise alignment objectives. Experiments across multiple domains including DNA sequence design, protein inverse folding, and language modeling consistently demonstrate the superiority of our approach.
arXiv Detail & Related papers (2025-07-07T09:52:56Z)
- Protein Inverse Folding From Structure Feedback [78.27854221882572]
We introduce a novel approach to fine-tune an inverse folding model using feedback from a protein folding model. Our results on the CATH 4.2 test set demonstrate that DPO fine-tuning leads to a significant improvement in average TM-Score.
arXiv Detail & Related papers (2025-06-03T16:02:12Z)
- Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design [87.58981407469977]
We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising.
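A toy version of the noising / reward-guided-denoising loop summarized above can be sketched as follows, with random point mutations standing in for the diffusion noising step and greedy reward selection standing in for model-guided denoising; both substitutions, along with the `reward` function, are simplifications, not the paper's method:

```python
import random

def refine(seq, reward, vocab="ACDEFGHIKLMNPQRSTVWY",
           steps=50, n_noise=8, seed=0):
    """Toy inference-time refinement of a discrete sequence.

    Each iteration "noises" the current best sequence with random single
    mutations, then "denoises" by keeping the highest-reward candidate.
    `reward` is any scoring function; in the paper's setting it would be
    a structure-based score, here it is left abstract.
    """
    rng = random.Random(seed)
    best = seq
    for _ in range(steps):
        candidates = [best]  # keep the incumbent so reward never decreases
        for _ in range(n_noise):
            pos = rng.randrange(len(best))
            candidates.append(best[:pos] + rng.choice(vocab) + best[pos + 1:])
        best = max(candidates, key=reward)
    return best
```

Because the incumbent sequence is always among the candidates, the reward is monotonically non-decreasing across iterations, which is the core property such refinement loops rely on.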
arXiv Detail & Related papers (2025-02-20T17:48:45Z)
- Progressive Fine-to-Coarse Reconstruction for Accurate Low-Bit Post-Training Quantization in Vision Transformers [13.316135182889296]
Post-Training Quantization (PTQ) has been widely adopted for compressing Vision Transformers (ViTs). When quantized into low-bit representations, there is often a significant performance drop compared to their full-precision counterparts. We propose a Progressive Fine-to-Coarse Reconstruction (PFCR) method for accurate PTQ, which significantly improves the performance of low-bit quantized vision transformers.
arXiv Detail & Related papers (2024-12-19T08:38:59Z)
- Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization [33.131551374836775]
Inverse folding models predict amino acid sequences that fold into desired reference structures.
Models such as ProteinMPNN, a message-passing encoder-decoder model, are trained to reliably produce new sequences from a reference structure.
But when applied to peptides, these models are prone to generating repetitive sequences that do not fold into the reference structure.
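The repetition failure mode described above can be quantified with a simple mean-pairwise-identity score across a batch of sampled sequences; the paper's actual diversity regularizer is not specified in this summary, so the metric below is only an illustrative sketch:

```python
def pairwise_identity(a, b):
    """Fraction of aligned positions that match (equal-length sequences)."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def diversity_penalty(samples):
    """Mean pairwise identity across a batch of sampled sequences.

    Values near 1.0 indicate the repetitive, mode-collapsed sampling
    described above; a diversity regularizer would add this quantity
    (suitably scaled) to the training objective to discourage it.
    """
    n = len(samples)
    total = sum(pairwise_identity(samples[i], samples[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)
```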
arXiv Detail & Related papers (2024-10-25T11:04:02Z)
- A novel design update framework for topology optimization with quantum annealing: Application to truss and continuum structures [0.0]
This paper presents a novel design update strategy for topology optimization, formulated as an iterative optimization process. The key contribution lies in incorporating a design updater concept with quantum annealing, applicable to both truss and continuum structures. Results indicate that the proposed framework successfully finds optimal topologies similar to benchmark results.
arXiv Detail & Related papers (2024-06-27T02:07:38Z)
- Decoupled Sequence and Structure Generation for Realistic Antibody Design [45.72237864940556]
A dominant paradigm is to train a model to jointly generate the antibody sequence and the structure as a candidate. We propose an antibody sequence-structure decoupling (ASSD) framework, which separates sequence generation and structure prediction. ASSD shows improved performance in various antibody design experiments, while the composition-based objective successfully mitigates token repetition of non-autoregressive models.
arXiv Detail & Related papers (2024-02-08T13:02:05Z)
- Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
- Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization [68.28697120944116]
We train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection.
We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data.
Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results.
arXiv Detail & Related papers (2022-09-13T18:37:27Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.