Protein Inverse Folding From Structure Feedback
- URL: http://arxiv.org/abs/2506.03028v1
- Date: Tue, 03 Jun 2025 16:02:12 GMT
- Title: Protein Inverse Folding From Structure Feedback
- Authors: Junde Xu, Zijun Gao, Xinyi Zhou, Jie Hu, Xingyi Cheng, Le Song, Guangyong Chen, Pheng-Ann Heng, Jiezhong Qiu
- Abstract summary: We introduce a novel approach to fine-tune an inverse folding model using feedback from a protein folding model. Our results on the CATH 4.2 test set demonstrate that DPO fine-tuning leads to a significant improvement in average TM-Score.
- Score: 78.27854221882572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The inverse folding problem, aiming to design amino acid sequences that fold into desired three-dimensional structures, is pivotal for various biotechnological applications. Here, we introduce a novel approach leveraging Direct Preference Optimization (DPO) to fine-tune an inverse folding model using feedback from a protein folding model. Given a target protein structure, we begin by sampling candidate sequences from the inverse-folding model, then predict the three-dimensional structure of each sequence with the folding model to generate pairwise structural-preference labels. These labels are used to fine-tune the inverse-folding model under the DPO objective. Our results on the CATH 4.2 test set demonstrate that DPO fine-tuning not only improves sequence recovery of baseline models but also leads to a significant improvement in average TM-Score from 0.77 to 0.81, indicating enhanced structure similarity. Furthermore, iterative application of our DPO-based method on challenging protein structures yields substantial gains, with an average TM-Score increase of 79.5% with regard to the baseline model. This work establishes a promising direction for enhancing protein sequence design ability from structure feedback by effectively utilizing preference optimization.
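The loop described in the abstract (sample sequences, fold them, label pairs by structural similarity, then optimize the DPO objective) can be illustrated with a minimal sketch. This is not the authors' implementation: the pairing rule, the `min_gap` threshold, the `beta` value, and both function names are assumptions made for illustration, and the TM-scores are taken as given rather than computed by a real folding model. The loss is the standard per-pair DPO loss, the negative log-sigmoid of the scaled log-ratio margin between the preferred and rejected sequence.

```python
import math
from itertools import combinations


def build_preference_pairs(sequences, tm_scores, min_gap=0.05):
    """Turn per-sequence TM-scores into (preferred, rejected) pairs.

    Hypothetical labeling rule: the sequence whose predicted structure has
    the higher TM-score to the target is preferred; pairs whose scores
    differ by less than `min_gap` are skipped as ambiguous.
    """
    pairs = []
    for i, j in combinations(range(len(sequences)), 2):
        gap = tm_scores[i] - tm_scores[j]
        if abs(gap) < min_gap:
            continue
        winner, loser = (i, j) if gap > 0 else (j, i)
        pairs.append((sequences[winner], sequences[loser]))
    return pairs


def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-likelihoods under the model being fine-tuned
    (policy) and under the frozen reference model, for the preferred (w)
    and rejected (l) sequence. `beta` is an assumed illustrative value.
    """
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin); computed stably.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy usage: three candidate sequences with made-up TM-scores.
candidates = ["MKTAYIAK", "MKTGYLAK", "MRTAYIAR"]
scores = [0.80, 0.60, 0.78]
pairs = build_preference_pairs(candidates, scores)
```

In a real pipeline the log-likelihoods would come from the inverse folding model conditioned on the target backbone, and the loss would be averaged over a batch of pairs before backpropagation.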
Related papers
- Improving Protein Sequence Design through Designability Preference Optimization [22.037870784317885]
We redefine the training objective by steering sequence generation toward high designability. We introduce Residue-level Designability Preference Optimization (ResiDPO). This enables direct improvement in designability while preserving regions that already perform well.
arXiv Detail & Related papers (2025-05-30T23:02:51Z)
- Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization [33.131551374836775]
Inverse folding models predict amino acid sequences that fold into desired reference structures.
Models such as ProteinMPNN, a message-passing encoder-decoder model, are trained to reliably produce new sequences from a reference structure.
But when applied to peptides, these models are prone to generating repetitive sequences that do not fold into the reference structure.
arXiv Detail & Related papers (2024-10-25T11:04:02Z)
- Reinforcement learning on structure-conditioned categorical diffusion for protein inverse folding [0.0]
Inverse folding is a one-to-many problem where several sequences can fold to the same structure.
We present RL-DIF, a categorical diffusion model for inverse folding that is pre-trained on sequence recovery and tuned via reinforcement learning.
Experiments show RL-DIF can achieve a foldable diversity of 29% on CATH 4.2, compared to 23% from models trained on the same dataset.
arXiv Detail & Related papers (2024-10-22T16:50:34Z)
- Inverse folding for antibody sequence design using deep learning [2.8998926117101367]
We propose a fine-tuned inverse folding model that is specifically optimised for antibody structures.
We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters.
arXiv Detail & Related papers (2023-10-30T13:12:41Z)
- Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
- Cross-Gate MLP with Protein Complex Invariant Embedding is A One-Shot Antibody Designer [58.97153056120193]
The specificity of an antibody is determined by its complementarity-determining regions (CDRs).
Previous studies have utilized complex techniques to generate CDRs, but they suffer from inadequate geometric modeling.
We propose a simple yet effective model that can co-design 1D sequences and 3D structures of CDRs in a one-shot manner.
arXiv Detail & Related papers (2023-04-21T13:24:26Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
- AlphaFold Distillation for Protein Design [25.190210443632825]
Inverse protein folding is crucial in bio-engineering and drug discovery.
Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences.
We propose using knowledge distillation on folding model confidence metrics to create a faster and end-to-end differentiable distilled model.
arXiv Detail & Related papers (2022-10-05T19:43:06Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.