Multi-state Protein Design with DynamicMPNN
- URL: http://arxiv.org/abs/2507.21938v2
- Date: Wed, 15 Oct 2025 12:29:20 GMT
- Title: Multi-state Protein Design with DynamicMPNN
- Authors: Alex Abrudan, Sebastian Pujalte Ojeda, Chaitanya K. Joshi, Matthew Greenig, Felipe Engelberger, Alena Khmelinskaia, Jens Meiler, Michele Vendruscolo, Tuomas P. J. Knowles,
- Abstract summary: Existing multi-state design approaches rely on post-hoc aggregation of singlestate predictions.<n>We introduce DynamicMPNN, an inverse model explicitly trained to generate sequences compatible with multiple conformations.
- Score: 2.8456027933151993
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structural biology has long been dominated by the one sequence, one structure, one function paradigm, yet many critical biological processes - from enzyme catalysis to membrane transport - depend on proteins that adopt multiple conformational states. Existing multi-state design approaches rely on post-hoc aggregation of single-state predictions, achieving poor experimental success rates compared to single-state design. We introduce DynamicMPNN, an inverse folding model explicitly trained to generate sequences compatible with multiple conformations through joint learning across conformational ensembles. Trained on 46,033 conformational pairs covering 75% of CATH superfamilies and evaluated using Alphafold 3, DynamicMPNN outperforms ProteinMPNN by up to 25% on decoy-normalized RMSD and by 12% on sequence recovery across our challenging multi-state protein benchmark.
Related papers
- Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics [16.336540408998598]
HADES is a Bayesian optimization method utilizing Hamiltonian dynamics to efficiently sample from a structure-aware approximated posterior.<n>A position discretization procedure is introduced to propose discrete protein sequences from such a continuous state system.<n>Experiments demonstrate that our method outperforms state-of-the-art baselines in in-silico evaluations.
arXiv Detail & Related papers (2026-01-16T05:53:53Z) - PRISM: Enhancing Protein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations [42.870409939729974]
We present PRISM, a multimodal retrieval-augmented generation framework for inverse folding.<n>It retrieves fine-grained representations of potential motifs from known proteins and integrates them with a hybrid self-cross attention decoder.<n> PRISM establishes new state of the art in both perplexity and amino acid recovery, while also improving foldability metrics.
arXiv Detail & Related papers (2025-10-12T00:45:22Z) - Bidirectional Hierarchical Protein Multi-Modal Representation Learning [4.682021474006426]
Protein language models (pLMs) pretrained on large scale protein sequences have demonstrated significant success in sequence-based tasks.<n> graph neural networks (GNNs) designed to leverage 3D structural information have shown promising generalization in protein-related prediction tasks.<n>Our framework employs attention and gating mechanisms to enable effective interaction between pLMs-generated sequential representations and GNN-extracted structural features.
arXiv Detail & Related papers (2025-04-07T06:47:49Z) - SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.<n>Deep generative models have shown promise in generating protein conformations as a more efficient alternative.<n>We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z) - DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z) - Diffusion on language model encodings for protein sequence generation [0.5182791771937247]
We present DiMA, a latent diffusion framework that operates on protein language model representations.<n>Our framework consistently produces novel, high-quality and diverse protein sequences.<n>It supports conditional generation tasks including protein family-generation, motif scaffolding and infilling, and fold-specific sequence design.
arXiv Detail & Related papers (2024-03-06T14:15:20Z) - PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for
Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z) - Protein Sequence and Structure Co-Design with Equivariant Translation [19.816174223173494]
Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models.
We propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state.
Our model consists of a trigonometry-aware encoder that reasons geometrical constraints and interactions from context features.
All protein amino acids are updated in one shot in each translation step, which significantly accelerates the inference process.
arXiv Detail & Related papers (2022-10-17T06:00:12Z) - AlphaFold Distillation for Protein Design [25.190210443632825]
Inverse protein folding is crucial in bio-engineering and drug discovery.
Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences.
We propose using knowledge distillation on folding model confidence metrics to create a faster and end-to-end differentiable distilled model.
arXiv Detail & Related papers (2022-10-05T19:43:06Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based
Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.