Related papers: DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of C{\alpha} Protein Traces

DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of C{\alpha} Protein Traces

URL: http://arxiv.org/abs/2307.12451v1
Date: Sun, 23 Jul 2023 23:05:08 GMT
Title: DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of C{\alpha} Protein Traces
Authors: Michael S. Jones and Kirill Shmilovich and Andrew L. Ferguson
Abstract summary: DiAMoNDBack is an autoregressive denoising diffusion probability model for non-Deterministic Backmapping. We train DiAMoNDBack over 65k+ structures from Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set. We make DiAMoNDBack publicly available as a free and open source Python package.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long-time scales such as aggregation and folding. The reduced resolution realizes computational accelerations but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only C{\alpha} coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the C{\alpha} trace and previously backmapped backbone and side chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side chain all-atom configurations consistent with the coarse-grained C{\alpha} trace. We train DiAMoNDBack over 65k+ structures from Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically-disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side chain clashes, and diversity of the generated side chain configurational states. We make DiAMoNDBack model publicly available as a free and open source Python package.

Related papers

ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning [49.2607661375311]
We present ProteinZero, a novel framework that enables computationally scalable, automated, and continuous self-improvement of the inverse folding model.<n>ProteinZero substantially outperforms existing methods across every key metric in protein design.<n> Notably, the entire RL run on CATH-4.3 can be done with a single 8 X GPU node in under 3 days, including reward.
arXiv Detail & Related papers (2025-06-09T06:08:59Z)
Energy-Based Coarse-Graining in Molecular Dynamics: A Flow-Based Framework Without Data [0.0]
We introduce a data-free generative framework for coarse-graining that directly targets the all-atom Boltzmann distribution. A potentially learnable, bijective map from the full latent space to the all-atom configuration space enables automatic and accurate reconstruction of molecular structures.
arXiv Detail & Related papers (2025-04-29T17:05:27Z)
Learning conformational ensembles of proteins based on backbone geometry [1.1874952582465603]
We propose a flow matching model for sampling protein conformations based solely on backbone geometry. The resulting model is orders of magnitudes faster than current state-of-the-art approaches at comparable accuracy and can be trained from scratch in a few GPU days.
arXiv Detail & Related papers (2025-02-19T17:16:27Z)
Diffusion Model with Representation Alignment for Protein Inverse Folding [53.139837825588614]
Protein inverse folding is a fundamental problem in bioinformatics, aiming to recover the amino acid sequences from a given protein backbone structure. We propose a novel method that leverages diffusion models with representation alignment (DMRA) In experiments, we conduct extensive evaluations on the CATH4.2 dataset to demonstrate that DMRA outperforms leading methods.
arXiv Detail & Related papers (2024-12-12T15:47:59Z)
SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models. It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features. Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design [56.957070405026194]
We propose an algorithm that enables direct backpropagation of rewards through entire trajectories generated by diffusion models. DRAKES can generate sequences that are both natural-like and yield high rewards.
arXiv Detail & Related papers (2024-10-17T15:10:13Z)
The Latent Road to Atoms: Backmapping Coarse-grained Protein Structures with Latent Diffusion [19.85659309869674]
Latent Diffusion Backmapping (LDB) is a novel approach leveraging denoising diffusion within latent space to address these challenges. We evaluate LDB's state-of-the-art performance on three distinct protein datasets. Our results position LDB as a powerful and scalable approach for backmapping, effectively bridging the gap between CG simulations and atomic-level analyses in computational biology.
arXiv Detail & Related papers (2024-10-17T06:38:07Z)
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference. Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable. We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
Protein Conformation Generation via Force-Guided SE(3) Diffusion Models [48.48934625235448]
Deep generative modeling techniques have been employed to generate novel protein conformations. We propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation.
arXiv Detail & Related papers (2024-03-21T02:44:08Z)
Navigating protein landscapes with a machine-learned transferable coarse-grained model [29.252004942896875]
coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. We develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences. We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins.
arXiv Detail & Related papers (2023-10-27T17:10:23Z)
Ophiuchus: Scalable Modeling of Protein Structures through Hierarchical Coarse-graining SO(3)-Equivariant Autoencoders [1.8835495377767553]
Three-dimensional native states of natural proteins display recurring and hierarchical patterns. Traditional graph-based modeling of protein structures is often limited to operate within a single fine-grained resolution. We introduce Ophiuchus, an SO(3)-equivariant coarse-graining model that efficiently operates on all-atom protein structures.
arXiv Detail & Related papers (2023-10-04T01:01:11Z)
Chemically Transferable Generative Backmapping of Coarse-Grained Proteins [0.0]
Coarse-graining (CG) accelerates simulations of protein dynamics by simulating sets of atoms as singular beads. Backmapping is the opposite operation of bringing lost atomistic details back from the CG representation. This work builds a fast, transferable, and reliable generative backmapping tool for CG protein representations.
arXiv Detail & Related papers (2023-03-02T20:51:57Z)
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures. Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein. Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules. Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
Sequence-guided protein structure determination using graph convolutional and recurrent networks [0.0]
Single particle, cryogenic electron microscopy (cryo-EM) experiments now routinely produce high-resolution data for large proteins. Existing protocols for this type of task often rely on significant human intervention and can take hours to many days to produce an output. Here, we present a fully automated, template-free model building approach that is based entirely on neural networks.
arXiv Detail & Related papers (2020-07-14T06:24:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.