The Latent Road to Atoms: Backmapping Coarse-grained Protein Structures with Latent Diffusion
- URL: http://arxiv.org/abs/2410.13264v1
- Date: Thu, 17 Oct 2024 06:38:07 GMT
- Title: The Latent Road to Atoms: Backmapping Coarse-grained Protein Structures with Latent Diffusion
- Authors: Xu Han, Yuancheng Sun, Kai Chen, Kang Liu, Qiwei Ye
- Abstract summary: Latent Diffusion Backmapping (LDB) is a novel approach that leverages denoising diffusion in latent space to address the challenges of backmapping coarse-grained protein structures.
We evaluate LDB's state-of-the-art performance on three distinct protein datasets.
Our results position LDB as a powerful and scalable approach for backmapping, effectively bridging the gap between CG simulations and atomic-level analyses in computational biology.
- Score: 19.85659309869674
- Abstract: Coarse-grained (CG) molecular dynamics simulations offer computational efficiency for exploring protein conformational ensembles and thermodynamic properties. Though coarse representations enable large-scale simulations across extended temporal and spatial ranges, the sacrifice of atomic-level details limits their utility in tasks such as ligand docking and protein-protein interaction prediction. Backmapping, the process of reconstructing all-atom structures from coarse-grained representations, is crucial for recovering these fine details. While recent machine learning methods have made strides in protein structure generation, challenges persist in reconstructing diverse atomistic conformations that maintain geometric accuracy and chemical validity. In this paper, we present Latent Diffusion Backmapping (LDB), a novel approach leveraging denoising diffusion within latent space to address these challenges. By combining discrete latent encoding with diffusion, LDB bypasses the need for equivariant and internal coordinate manipulation, significantly simplifying the training and sampling processes as well as facilitating better and wider exploration in configuration space. We evaluate LDB's state-of-the-art performance on three distinct protein datasets, demonstrating its ability to efficiently reconstruct structures with high structural accuracy and chemical validity. Moreover, LDB shows exceptional versatility in capturing diverse protein ensembles, highlighting its capability to explore intricate conformational spaces. Our results position LDB as a powerful and scalable approach for backmapping, effectively bridging the gap between CG simulations and atomic-level analyses in computational biology.
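To make the latent-diffusion pipeline described in the abstract concrete, here is a minimal PyTorch-style sketch: an encoder maps per-residue all-atom coordinates to discrete (vector-quantized) latents, a denoiser runs diffusion in that latent space conditioned on the CG bead positions, and a decoder maps sampled latents back to atoms. All module names, shapes, hyperparameters, and the sampling update below are illustrative assumptions, not the LDB architecture itself.

```python
# Minimal sketch of latent-space denoising diffusion for backmapping, assuming a
# VQ-style encoder/decoder and a simple epsilon-prediction network.
import torch
import torch.nn as nn

class LatentBackmapper(nn.Module):
    def __init__(self, n_atoms_per_res=14, d_latent=64, n_codes=512):
        super().__init__()
        # Encoder: all-atom coordinates (per residue) -> continuous latent
        self.encoder = nn.Linear(n_atoms_per_res * 3, d_latent)
        # Discrete codebook (vector quantization); nearest-code lookup at encode time
        self.codebook = nn.Embedding(n_codes, d_latent)
        # Denoiser: predicts noise from (noisy latent, CG bead coords, timestep)
        self.denoiser = nn.Sequential(
            nn.Linear(d_latent + 3 + 1, 128), nn.SiLU(), nn.Linear(128, d_latent))
        # Decoder: latent -> all-atom coordinates (per residue)
        self.decoder = nn.Linear(d_latent, n_atoms_per_res * 3)

    def encode(self, atoms):                       # atoms: (L, n_atoms_per_res, 3)
        z = self.encoder(atoms.flatten(1))         # (L, d_latent)
        dist = torch.cdist(z, self.codebook.weight)
        return self.codebook(dist.argmin(dim=-1))  # quantized latents, (L, d_latent)

    def denoise_step(self, z_t, cg, t):            # cg: (L, 3) CG bead coordinates
        t_emb = torch.full((z_t.size(0), 1), float(t))
        return self.denoiser(torch.cat([z_t, cg, t_emb], dim=-1))  # predicted noise

    @torch.no_grad()
    def sample(self, cg, n_steps=50):
        z = torch.randn(cg.size(0), self.codebook.embedding_dim)
        for t in reversed(range(n_steps)):
            eps = self.denoise_step(z, cg, t / n_steps)
            z = z - eps / n_steps                  # schematic update, not a real DDPM schedule
        return self.decoder(z).view(cg.size(0), -1, 3)
```

A real implementation would use a proper noise schedule and structure-aware encoder, denoiser, and decoder networks; the point of the sketch is only that training and sampling operate on per-residue latent vectors rather than directly on 3D coordinates or internal coordinates.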
Related papers
- Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.
Deep generative models have shown promise in generating protein conformations as a more efficient alternative.
We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z)
- DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z)
- Deep Signature: Characterization of Large-Scale Molecular Dynamics [29.67824486345836]
Deep Signature is a novel computationally tractable framework that characterizes complex dynamics and interatomic interactions.
Our approach incorporates soft spectral clustering that locally aggregates cooperative dynamics to reduce the size of the system, as well as signature transform to provide a global characterization of the non-smooth interactive dynamics.
arXiv Detail & Related papers (2024-10-03T16:37:48Z)
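The Deep Signature summary above mentions a signature transform over locally aggregated dynamics. As a rough illustration, a depth-2 path signature of a multivariate time series can be computed as below. This is a generic truncated signature for piecewise-linear paths; it is not the paper's implementation and omits the soft spectral clustering step.

```python
import numpy as np

def depth2_signature(path):
    """Depth-2 path signature of a d-dimensional time series of shape (T, d)."""
    dx = np.diff(path, axis=0)                 # increments, (T-1, d)
    s1 = dx.sum(axis=0)                        # level 1: total increment, (d,)
    # Level 2: iterated integrals for a piecewise-linear path (Chen's identity)
    csum = np.cumsum(dx, axis=0) - dx          # sum of increments strictly before each step
    s2 = csum.T @ dx + 0.5 * dx.T @ dx         # (d, d)
    return np.concatenate([s1, s2.ravel()])
```

In an MD setting one would apply this to a (frames, features) array, e.g. the coordinates of a clustered group of atoms over a trajectory, to obtain a fixed-length descriptor of its dynamics.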
- Protein Conformation Generation via Force-Guided SE(3) Diffusion Models [48.48934625235448]
Deep generative modeling techniques have been employed to generate novel protein conformations.
We propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation.
arXiv Detail & Related papers (2024-03-21T02:44:08Z)
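ConfDiff's force guidance can be sketched as a biased reverse-diffusion update in which the gradient of a physical energy nudges the denoised mean. The version below is deliberately simplified to plain Euclidean coordinates with a DDPM-style update; the actual method uses an SE(3) diffusion process, and `energy_fn` is a hypothetical differentiable energy.

```python
import torch

def force_guided_step(x_t, eps_pred, energy_fn, beta_t, alpha_t, alpha_bar_t, scale=1.0):
    """One schematic force-guided reverse-diffusion step (Euclidean, not SE(3)).

    x_t:       current noisy coordinates, (N, 3)
    eps_pred:  noise predicted by the diffusion network at this step
    energy_fn: hypothetical differentiable energy; its negative gradient is the force
    """
    # Standard DDPM mean computed from the predicted noise
    mean = (x_t - beta_t / (1.0 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_t ** 0.5
    # Force guidance: nudge the mean along -dE/dx, weighted by the step noise level
    x_req = x_t.detach().requires_grad_(True)
    force = -torch.autograd.grad(energy_fn(x_req).sum(), x_req)[0]
    mean = mean + scale * beta_t * force
    # Add the usual Gaussian noise (for a non-final step)
    return mean + beta_t ** 0.5 * torch.randn_like(x_t)
```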
- DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces [0.0]
DiAMoNDBack is an autoregressive denoising diffusion probabilistic model for non-deterministic backmapping.
We train DiAMoNDBack on 65k+ structures from the Protein Data Bank (PDB) and validate it on a held-out PDB test set.
We make DiAMoNDBack publicly available as a free and open source Python package.
arXiv Detail & Related papers (2023-07-23T23:05:08Z)
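DiAMoNDBack's residue-by-residue generation can be caricatured as the loop below: each residue's heavy atoms are initialized near its Cα bead and denoised conditioned on the trace and the atoms placed so far. `residue_model` and the linear denoising update are placeholders; the actual conditioning, atom ordering, and diffusion schedule differ.

```python
import torch

def autoregressive_backmap(ca_trace, residue_model, n_heavy=14, n_steps=50):
    """Schematic residue-by-residue backmapping from a C-alpha trace.

    `residue_model(noisy_atoms, ca_trace, placed, i, t)` is a stand-in for a
    per-residue denoising network, not DiAMoNDBack's actual model.
    """
    placed = []                                        # atoms reconstructed so far
    for i in range(ca_trace.size(0)):
        x = torch.randn(n_heavy, 3) + ca_trace[i]      # initialise near the CA bead
        for t in reversed(range(n_steps)):
            eps = residue_model(x, ca_trace, placed, i, t)
            x = x - eps / n_steps                      # schematic denoising update
        placed.append(x)
    return torch.stack(placed)                         # (n_residues, n_heavy, 3)
```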
- Chemically Transferable Generative Backmapping of Coarse-Grained Proteins [0.0]
Coarse-graining (CG) accelerates simulations of protein dynamics by simulating sets of atoms as singular beads.
Backmapping is the reverse operation, restoring the lost atomistic details from the CG representation.
This work builds a fast, transferable, and reliable generative backmapping tool for CG protein representations.
arXiv Detail & Related papers (2023-03-02T20:51:57Z)
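As background for the two backmapping papers above, the forward coarse-graining step that discards atomistic detail can be as simple as a one-bead-per-residue center-of-mass mapping, sketched below. This is a generic mapping for illustration, not the specific CG scheme used in any of these works.

```python
import numpy as np

def coarse_grain(atom_coords, atom_masses, residue_index):
    """Map an all-atom structure to one bead per residue (centre of mass).

    atom_coords:   (n_atoms, 3) Cartesian coordinates
    atom_masses:   (n_atoms,) atomic masses
    residue_index: (n_atoms,) integer residue id for each atom
    """
    n_res = residue_index.max() + 1
    beads = np.zeros((n_res, 3))
    for r in range(n_res):
        sel = residue_index == r
        w = atom_masses[sel] / atom_masses[sel].sum()
        beads[r] = (w[:, None] * atom_coords[sel]).sum(axis=0)
    return beads   # backmapping must reconstruct the atomic detail discarded here
```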
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
The model is tasked with characterizing the distinct structural fluctuations of the protein when bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
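The EBM-Fold idea of fully differentiable, network-guided structure optimization can be sketched as gradient descent of candidate coordinates on a learned scalar energy. The snippet below is a generic sketch with a placeholder `energy_net`; EBM-Fold's actual procedure uses a denoising, score-based update rather than plain Adam descent.

```python
import torch

def refine_structure(coords, energy_net, n_iters=200, lr=1e-2):
    """Schematic fully-differentiable structure refinement.

    coords:     (n_atoms, 3) initial decoy coordinates
    energy_net: placeholder for a trained network mapping coordinates to a
                scalar energy (lower ~ more native-like)
    """
    x = coords.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        energy = energy_net(x)        # scalar, data-driven energy of the decoy
        energy.backward()             # gradients flow back to the coordinates
        opt.step()
    return x.detach()
```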