Mask prior-guided denoising diffusion improves inverse protein folding
- URL: http://arxiv.org/abs/2412.07815v1
- Date: Tue, 10 Dec 2024 09:10:28 GMT
- Title: Mask prior-guided denoising diffusion improves inverse protein folding
- Authors: Peizhen Bai, Filip Miljković, Xianyuan Liu, Leonardo De Maria, Rebecca Croasdale-Wood, Owen Rackham, Haiping Lu
- Abstract summary: Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure.
We propose a framework that captures both structural and residue interactions for inverse protein folding.
MapDiff is a discrete diffusion probabilistic model that iteratively generates amino acid sequences with reduced noise.
- Abstract: Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure, with recent deep-learning advances showing significant potential and competitive performance. However, challenges remain in predicting highly uncertain regions, such as those with loops and disorders. To tackle such low-confidence residue prediction, we propose a Mask prior-guided denoising Diffusion (MapDiff) framework that accurately captures both structural and residue interactions for inverse protein folding. MapDiff is a discrete diffusion probabilistic model that iteratively generates amino acid sequences with reduced noise, conditioned on a given protein backbone. To incorporate structural and residue interactions, we develop a graph-based denoising network with a mask prior pre-training strategy. Moreover, in the generative process, we combine the denoising diffusion implicit model with Monte-Carlo dropout to improve uncertainty estimation. Evaluation on four challenging sequence design benchmarks shows that MapDiff significantly outperforms state-of-the-art methods. Furthermore, the in-silico sequences generated by MapDiff closely resemble the physico-chemical and structural characteristics of native proteins across different protein families and architectures.
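The generative loop the abstract describes — iterative discrete denoising over the 20-letter amino acid alphabet, with Monte-Carlo dropout averaging to estimate per-residue uncertainty — can be sketched as a toy example. This is not the authors' implementation: the random linear map below stands in for MapDiff's trained graph-based denoising network, and all names and parameters are illustrative assumptions.

```python
import numpy as np

AA = list("ACDEFGHIKLMNPQRSTVWY")   # 20 standard amino acids
L, K = 12, len(AA)                  # toy sequence length, alphabet size
rng = np.random.default_rng(0)

# Stand-in for the trained graph denoising network: a fixed random
# linear map from current class probabilities to refined logits.
W = rng.normal(size=(K, K))

def denoiser(probs, dropout=0.2):
    # Dropout stays active at inference time, so repeated calls give
    # different logits (the Monte-Carlo dropout trick).
    mask = (rng.random(W.shape) > dropout) / (1.0 - dropout)
    return probs @ (W * mask)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout(probs, n=16):
    # Average several stochastic forward passes; the entropy of the
    # averaged distribution is the per-residue uncertainty estimate.
    p = np.stack([softmax(denoiser(probs)) for _ in range(n)]).mean(0)
    entropy = -(p * np.log(p + 1e-12)).sum(-1)
    return p, entropy

# Iterative generation: start from uniform noise and denoise T steps.
probs = np.full((L, K), 1.0 / K)
for _ in range(5):
    probs, uncertainty = mc_dropout(probs)

seq = "".join(AA[i] for i in probs.argmax(-1))
print(seq)                      # sampled toy sequence
print(uncertainty.round(2))     # high entropy = low-confidence residue
```

In the actual method, the high-entropy (low-confidence) residues are exactly the loop and disordered regions the mask prior pre-training targets; here they simply show up as positions with a flatter averaged distribution.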
Related papers
- Diffusion Model with Representation Alignment for Protein Inverse Folding
Protein inverse folding is a fundamental problem in bioinformatics, aiming to recover the amino acid sequences from a given protein backbone structure.
We propose a novel method that leverages diffusion models with representation alignment (DMRA).
In experiments, we conduct extensive evaluations on the CATH4.2 dataset to demonstrate that DMRA outperforms leading methods.
arXiv Detail & Related papers (2024-12-12T15:47:59Z) - SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - Protein Conformation Generation via Force-Guided SE(3) Diffusion Models
Deep generative modeling techniques have been employed to generate novel protein conformations.
We propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation.
arXiv Detail & Related papers (2024-03-21T02:44:08Z) - DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces
DiAMoNDBack is an autoregressive denoising diffusion probability model for non-Deterministic Backmapping.
We train DiAMoNDBack on 65k+ structures from the Protein Data Bank (PDB) and validate it on a held-out PDB test set.
We make DiAMoNDBack publicly available as a free and open source Python package.
arXiv Detail & Related papers (2023-07-23T23:05:08Z) - Graph Denoising Diffusion for Inverse Protein Folding
Inverse protein folding is challenging due to its inherent one-to-many mapping characteristic.
We propose a novel graph denoising diffusion model for inverse protein folding.
Our model achieves state-of-the-art performance over a set of popular baseline methods in sequence recovery.
arXiv Detail & Related papers (2023-06-29T09:55:30Z) - Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion Trajectory Prediction
Self-supervised pre-training methods on proteins have recently gained attention, with most approaches focusing on either protein sequences or structures.
We propose the DiffPreT approach to pre-train a protein encoder by sequence-structure joint diffusion modeling.
We enhance DiffPreT by a method called Siamese Diffusion Trajectory Prediction (SiamDiff) to capture the correlation between different conformers of a protein.
arXiv Detail & Related papers (2023-01-28T02:48:20Z) - Protein structure generation via folding diffusion
We present a new diffusion-based generative model that designs protein backbone structures.
We generate new structures by denoising from a random, unfolded state towards a stable folded structure.
As a useful resource, we release the first open-source and trained models for protein structure diffusion.
arXiv Detail & Related papers (2022-09-30T17:35:53Z) - State-specific protein-ligand complex structure prediction with a multi-scale deep generative model
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z) - Transfer Learning for Protein Structure Classification at Low Resolution
We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (≤3 Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.