Graph Denoising Diffusion for Inverse Protein Folding
- URL: http://arxiv.org/abs/2306.16819v2
- Date: Tue, 7 Nov 2023 08:28:11 GMT
- Title: Graph Denoising Diffusion for Inverse Protein Folding
- Authors: Kai Yi, Bingxin Zhou, Yiqing Shen, Pietro Liò, Yu Guang Wang
- Abstract summary: Inverse protein folding is challenging due to its inherent one-to-many mapping characteristic.
We propose a novel graph denoising diffusion model for inverse protein folding.
Our model achieves state-of-the-art performance over a set of popular baseline methods in sequence recovery.
- Score: 15.06549999760776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse protein folding is challenging due to its inherent one-to-many
mapping characteristic, where numerous possible amino acid sequences can fold
into a single, identical protein backbone. This task involves not only
identifying viable sequences but also representing the sheer diversity of
potential solutions. However, existing discriminative models, such as
transformer-based auto-regressive models, struggle to encapsulate the diverse
range of plausible solutions. In contrast, diffusion probabilistic models, as
an emerging genre of generative approaches, offer the potential to generate a
diverse set of sequence candidates for determined protein backbones. We propose
a novel graph denoising diffusion model for inverse protein folding, where a
given protein backbone guides the diffusion process on the corresponding amino
acid residue types. The model infers the joint distribution of amino acids
conditioned on the nodes' physicochemical properties and local environment.
Moreover, we utilize amino acid replacement matrices for the diffusion forward
process, encoding the biologically meaningful prior knowledge of amino acids
from their spatial and sequential neighbors as well as themselves, which
reduces the sampling space of the generative process. Our model achieves
state-of-the-art performance over a set of popular baseline methods in sequence
recovery and exhibits great potential in generating diverse protein sequences
for a determined protein backbone structure.
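The replacement-matrix forward process described above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the authors' code): it assumes a symmetric, BLOSUM-like replacement matrix that is row-normalized into a one-step transition kernel, so that biochemically similar residues are more likely to be exchanged as noise is added.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residue types
K = len(AMINO_ACIDS)

rng = np.random.default_rng(0)

# Hypothetical stand-in for a BLOSUM-derived replacement matrix: any
# symmetric positive matrix works for the sketch; row-normalizing it
# yields a categorical transition kernel Q for the forward process.
raw = rng.random((K, K)) + 5.0 * np.eye(K)  # self-transitions dominate
raw = (raw + raw.T) / 2.0                   # symmetrize
Q = raw / raw.sum(axis=1, keepdims=True)    # row-stochastic kernel

def forward_noise(x0, t, Q):
    """Sample x_t ~ Cat(Q^t[x0]): apply the one-step kernel t times."""
    Qt = np.linalg.matrix_power(Q, t)       # t-step transition matrix
    probs = Qt[x0]                          # (L, K) per-residue categorical
    cum = probs.cumsum(axis=1)
    u = rng.random((len(x0), 1))
    return (u < cum).argmax(axis=1)         # inverse-CDF sampling per residue

seq = "MKTAYIAKQR"
x0 = np.array([AMINO_ACIDS.index(a) for a in seq])
xt = forward_noise(x0, 50, Q)               # noised residue indices
```

Because the kernel's rows are biased toward plausible substitutions, the noised sequences stay closer to biologically realistic ones than uniform corruption would, which is what shrinks the effective sampling space of the reverse (generative) process.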
Related papers
- From thermodynamics to protein design: Diffusion models for biomolecule generation towards autonomous protein engineering [8.173909751137888]
We first give the definition and characteristics of diffusion models and then focus on two strategies: Denoising Diffusion Probabilistic Models and Score-based Generative Models.
We discuss their applications in protein design, peptide generation, drug discovery, and protein-ligand interaction.
arXiv Detail & Related papers (2025-01-05T22:36:43Z)
- Diffusion Model with Representation Alignment for Protein Inverse Folding [53.139837825588614]
Protein inverse folding is a fundamental problem in bioinformatics, aiming to recover the amino acid sequences from a given protein backbone structure.
We propose a novel method that leverages diffusion models with representation alignment (DMRA)
In experiments, we conduct extensive evaluations on the CATH4.2 dataset to demonstrate that DMRA outperforms leading methods.
arXiv Detail & Related papers (2024-12-12T15:47:59Z)
- Mask prior-guided denoising diffusion improves inverse protein folding [3.1373465343833704]
Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure.
We propose a framework that captures both structural and residue interactions for inverse protein folding.
MapDiff is a discrete diffusion probabilistic model that iteratively generates amino acid sequences with reduced noise.
arXiv Detail & Related papers (2024-12-10T09:10:28Z)
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
- Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design [56.957070405026194]
We propose an algorithm that enables direct backpropagation of rewards through entire trajectories generated by diffusion models.
DRAKES can generate sequences that are both natural-like and yield high rewards.
arXiv Detail & Related papers (2024-10-17T15:10:13Z)
- Peptide Sequencing Via Protein Language Models [0.0]
We introduce a protein language model for determining the complete sequence of a peptide based on measurement of a limited set of amino acids.
Our method simulates partial sequencing data by selectively masking amino acids that are experimentally difficult to identify.
We achieve per-amino-acid accuracy up to 90.5% when only four amino acids are known.
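The masking setup in this summary can be sketched as follows; this is a hypothetical illustration (not the paper's code), and the choice of which four residue types count as "experimentally identifiable" is an assumption made for the example.

```python
# Simulate partial sequencing data: keep residues from an assumed
# "identifiable" set and mask everything else, as the summary describes.
KNOWN = set("KRDE")  # hypothetical set of four identifiable residue types
MASK = "_"

def simulate_partial(seq, known=KNOWN, mask=MASK):
    """Replace every residue outside the known set with a mask token."""
    return "".join(a if a in known else mask for a in seq)

print(simulate_partial("MKTAYIAKQRDE"))  # -> "_K_____K_RDE"
```

A model trained on such pairs learns to recover the full sequence from the sparse, unmasked residues.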
arXiv Detail & Related papers (2024-08-01T20:12:49Z)
- Protein Conformation Generation via Force-Guided SE(3) Diffusion Models [48.48934625235448]
Deep generative modeling techniques have been employed to generate novel protein conformations.
We propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation.
arXiv Detail & Related papers (2024-03-21T02:44:08Z)
- Diffusion Language Models Are Versatile Protein Learners [75.98083311705182]
This paper introduces diffusion protein language model (DPLM), a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences.
We first pre-train scalable DPLMs from evolutionary-scale protein sequences within a generative self-supervised discrete diffusion probabilistic framework.
After pre-training, DPLM exhibits the ability to generate structurally plausible, novel, and diverse protein sequences for unconditional generation.
arXiv Detail & Related papers (2024-02-28T18:57:56Z)
- Predicting mutational effects on protein-protein binding via a side-chain diffusion probabilistic model [14.949807579474781]
We propose SidechainDiff, a representation learning-based approach that leverages unlabelled experimental protein structures.
SidechainDiff is the first diffusion-based generative model for side-chains, distinguishing it from prior efforts that have predominantly focused on generating protein backbone structures.
arXiv Detail & Related papers (2023-10-30T15:23:42Z)
- Predicting protein variants with equivariant graph neural networks [0.0]
We compare the abilities of equivariant graph neural networks (EGNNs) and sequence-based approaches to identify promising amino-acid mutations.
Our proposed structural approach achieves a competitive performance to sequence-based approaches while being trained on significantly fewer molecules.
arXiv Detail & Related papers (2023-06-21T12:44:52Z)
- A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.