Related papers: Protein Inverse Folding From Structure Feedback

URL: http://arxiv.org/abs/2506.03028v1
Date: Tue, 03 Jun 2025 16:02:12 GMT
Title: Protein Inverse Folding From Structure Feedback
Authors: Junde Xu, Zijun Gao, Xinyi Zhou, Jie Hu, Xingyi Cheng, Le Song, Guangyong Chen, Pheng-Ann Heng, Jiezhong Qiu
Abstract summary: We introduce a novel approach to fine-tune an inverse folding model using feedback from a protein folding model. Our results on the CATH 4.2 test set demonstrate that DPO fine-tuning leads to a significant improvement in average TM-Score.
Score: 78.27854221882572
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The inverse folding problem, aiming to design amino acid sequences that fold into desired three-dimensional structures, is pivotal for various biotechnological applications. Here, we introduce a novel approach leveraging Direct Preference Optimization (DPO) to fine-tune an inverse folding model using feedback from a protein folding model. Given a target protein structure, we begin by sampling candidate sequences from the inverse-folding model, then predict the three-dimensional structure of each sequence with the folding model to generate pairwise structural-preference labels. These labels are used to fine-tune the inverse-folding model under the DPO objective. Our results on the CATH 4.2 test set demonstrate that DPO fine-tuning not only improves sequence recovery of baseline models but also leads to a significant improvement in average TM-Score from 0.77 to 0.81, indicating enhanced structure similarity. Furthermore, iterative application of our DPO-based method on challenging protein structures yields substantial gains, with an average TM-Score increase of 79.5% with regard to the baseline model. This work establishes a promising direction for enhancing protein sequence design ability from structure feedback by effectively utilizing preference optimization.
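The pipeline described in the abstract (sample candidates, fold them, rank by structural similarity, fine-tune with DPO) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_gap` threshold for dropping near-ties, and the default `beta` are assumptions made for illustration, and the TM-scores are taken as given inputs rather than computed by a folding model. The loss is the standard DPO objective for a single preference pair, written in terms of sequence log-probabilities under the fine-tuned model and a frozen reference model.

```python
import math
from itertools import combinations


def build_preference_pairs(candidates, min_gap=0.05):
    """Turn scored candidates into (chosen, rejected) sequence pairs.

    candidates: list of (sequence, tm_score) tuples, where tm_score is the
    similarity between the predicted fold of the sequence and the target.
    min_gap is a hypothetical threshold: pairs whose scores differ by less
    than this are skipped as uninformative.
    """
    pairs = []
    for (seq_a, tm_a), (seq_b, tm_b) in combinations(candidates, 2):
        if abs(tm_a - tm_b) < min_gap:
            continue  # near-tie, no clear preference
        pairs.append((seq_a, seq_b) if tm_a > tm_b else (seq_b, seq_a))
    return pairs


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    margin = beta * [(policy - ref) on chosen - (policy - ref) on rejected].
    Computed as softplus(-margin) in a numerically stable form.
    """
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy usage with made-up sequences and scores.
scored = [("AAA", 0.90), ("CCC", 0.50), ("GGG", 0.88)]
pairs = build_preference_pairs(scored)
```

In a real setting the log-probabilities would come from the inverse folding model conditioned on the target structure, and the loss would be averaged over a batch of pairs before backpropagation.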

Related papers

Improving Protein Sequence Design through Designability Preference Optimization [22.037870784317885]
We redefine the training objective by steering sequence generation toward high designability. We introduce Residue-level Designability Preference Optimization (ResiDPO). This enables direct improvement in designability while preserving regions that already perform well.
arXiv Detail & Related papers (2025-05-30T23:02:51Z)
Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization [33.131551374836775]
Inverse folding models predict amino acid sequences that fold into desired reference structures. Models such as ProteinMPNN, a message-passing encoder-decoder model, are trained to reliably produce new sequences from a reference structure. But when applied to peptides, these models are prone to generating repetitive sequences that do not fold into the reference structure.
arXiv Detail & Related papers (2024-10-25T11:04:02Z)
Reinforcement learning on structure-conditioned categorical diffusion for protein inverse folding [0.0]
Inverse folding is a one-to-many problem where several sequences can fold to the same structure. We present RL-DIF, a categorical diffusion model for inverse folding that is pre-trained on sequence recovery and tuned via reinforcement learning. Experiments show RL-DIF can achieve a foldable diversity of 29% on CATH 4.2, compared to 23% from models trained on the same dataset.
arXiv Detail & Related papers (2024-10-22T16:50:34Z)
Inverse folding for antibody sequence design using deep learning [2.8998926117101367]
We propose a fine-tuned inverse folding model that is specifically optimised for antibody structures. We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters.
arXiv Detail & Related papers (2023-10-30T13:12:41Z)
Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
Cross-Gate MLP with Protein Complex Invariant Embedding is A One-Shot Antibody Designer [58.97153056120193]
The specificity of an antibody is determined by its complementarity-determining regions (CDRs). Previous studies have utilized complex techniques to generate CDRs, but they suffer from inadequate geometric modeling. We propose a simple yet effective model that can co-design 1D sequences and 3D structures of CDRs in a one-shot manner.
arXiv Detail & Related papers (2023-04-21T13:24:26Z)
Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs). We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness. Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
AlphaFold Distillation for Protein Design [25.190210443632825]
Inverse protein folding is crucial in bio-engineering and drug discovery. Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences. We propose using knowledge distillation on folding model confidence metrics to create a faster and end-to-end differentiable distilled model.
arXiv Detail & Related papers (2022-10-05T19:43:06Z)
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures. Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network. Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.