Backdiff: a diffusion model for generalized transferable protein
backmapping
- URL: http://arxiv.org/abs/2310.01768v2
- Date: Wed, 29 Nov 2023 03:43:56 GMT
- Title: Backdiff: a diffusion model for generalized transferable protein
backmapping
- Authors: Yikai Liu, Ming Chen, Guang Lin
- Abstract summary: BackDiff is a new generative model designed to achieve generalization and reliability in the protein backmapping problem.
Our method facilitates end-to-end training and allows efficient sampling across different proteins and diverse CG models without the need for retraining.
- Score: 9.815461018844523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Coarse-grained (CG) models play a crucial role in the study of protein
structures, protein thermodynamic properties, and protein conformation
dynamics. Due to the information loss in the coarse-graining process,
backmapping from CG to all-atom configurations is essential in many protein
design and drug discovery applications when detailed atomic representations are
needed for in-depth studies. Despite recent progress in data-driven backmapping
approaches, devising a backmapping method that can be universally applied
across various CG models and proteins remains unresolved. In this work, we
propose BackDiff, a new generative model designed to achieve generalization and
reliability in the protein backmapping problem. BackDiff leverages the
conditional score-based diffusion model with geometric representations. Since
different CG models can contain different coarse-grained sites which include
selected atoms (CG atoms) and simple CG auxiliary functions of atomistic
coordinates (CG auxiliary variables), we design a self-supervised training
framework to adapt to different CG atoms, and constrain the diffusion sampling
paths with arbitrary CG auxiliary variables as conditions. Our method
facilitates end-to-end training and allows efficient sampling across different
proteins and diverse CG models without the need for retraining. Comprehensive
experiments over multiple popular CG models demonstrate BackDiff's superior
performance to existing state-of-the-art approaches, and generalization and
flexibility that these approaches cannot achieve. A pretrained BackDiff model
can offer a convenient yet reliable plug-and-play solution for protein
researchers, enabling them to investigate further from their own CG models.
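To make the abstract's distinction between CG atoms and CG auxiliary variables concrete, here is a minimal sketch in plain Python. It is not BackDiff's implementation: the function name `coarse_grain`, the four-atom toy residue, the masses, and the choice of a side-chain center of mass as the auxiliary variable are all assumptions made for illustration. It also shows the information loss the abstract mentions, since two different all-atom configurations map to the same CG representation.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def coarse_grain(
    coords: Sequence[Vec3],
    masses: Sequence[float],
    cg_atom_idx: Sequence[int],
    aux_idx: Sequence[int],
) -> Tuple[List[Vec3], Vec3]:
    """Toy coarse-graining map (hypothetical, for illustration only).

    cg_atom_idx: atoms kept verbatim (the "CG atoms").
    aux_idx: atoms whose mass-weighted mean position forms one
             "CG auxiliary variable" (here, a center of mass).
    """
    cg_atoms = [coords[i] for i in cg_atom_idx]
    total_mass = sum(masses[i] for i in aux_idx)
    com = tuple(
        sum(masses[i] * coords[i][k] for i in aux_idx) / total_mass
        for k in range(3)
    )
    return cg_atoms, com


# Toy residue: four atoms with made-up coordinates and masses.
masses = [14.0, 12.0, 12.0, 12.0]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 3.0, 0.0)]

# Keep atom 1 as a CG atom; summarize atoms 2 and 3 by their center of mass.
cg_atoms, com = coarse_grain(coords, masses, cg_atom_idx=[1], aux_idx=[2, 3])

# A different all-atom configuration with the same CG representation.
coords_alt = [(0.5, 0.5, 0.0), (1.0, 0.0, 0.0), (1.0, 1.5, 0.0), (1.0, 2.5, 0.0)]
cg_atoms_alt, com_alt = coarse_grain(coords_alt, masses, [1], [2, 3])

print(cg_atoms, com)
print(cg_atoms_alt, com_alt)
```

Because the map is many-to-one, a backmapping method has to sample from the set of all-atom structures consistent with the CG sites rather than invert the map deterministically, which is the setting in which a conditional generative model applies.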
Related papers
- HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis [59.25751939710903]
We propose a novel deformable Gaussian splatting framework that addresses embodied view synthesis from long monocular RGB videos.
Our method leverages invertible Gaussian Splatting deformation networks to reconstruct large-scale, dynamic environments accurately.
Results highlight a practical and scalable solution for EVS in real-world scenarios.
arXiv Detail & Related papers (2025-06-24T03:54:40Z) - An Iterative Framework for Generative Backmapping of Coarse Grained Proteins [0.6990493129893112]
We introduce a novel iterative framework by using conditional Variational Autoencoders and graph-based neural networks.
We outline the theory of iterative generative backmapping and demonstrate via numerical experiments the advantages of multistep schemes.
This multistep approach not only improves the accuracy of reconstructions but also makes the training process more computationally efficient for proteins with ultra-CG representations.
arXiv Detail & Related papers (2025-05-23T16:40:25Z) - Multi-Scale Representation Learning for Protein Fitness Prediction [31.735234482320283]
Previous methods have primarily relied on self-supervised models trained on vast, unlabeled protein sequence or structure datasets.
We introduce the Sequence-Structure-Surface Fitness (S3F) model - a novel multimodal representation learning framework that integrates protein features across several scales.
Our approach combines sequence representations from a protein language model with Geometric Vector Perceptron networks encoding protein backbone and detailed surface topology.
arXiv Detail & Related papers (2024-12-02T04:28:10Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image
Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - Deep Equilibrium Diffusion Restoration with Parallel Sampling [120.15039525209106]
Diffusion model-based image restoration (IR) aims to use diffusion models to recover high-quality (HQ) images from degraded images, achieving promising performance.
Most existing methods need long serial sampling chains to restore HQ images step-by-step, resulting in expensive sampling time and high computation costs.
In this work, we aim to rethink the diffusion model-based IR models through a different perspective, i.e., a deep equilibrium (DEQ) fixed point system, called DeqIR.
arXiv Detail & Related papers (2023-11-20T08:27:56Z) - Navigating protein landscapes with a machine-learned transferable
coarse-grained model [29.252004942896875]
coarse-grained (CG) model with similar prediction performance has been a long-standing challenge.
We develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences.
We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins.
arXiv Detail & Related papers (2023-10-27T17:10:23Z) - DiAMoNDBack: Diffusion-denoising Autoregressive Model for
Non-Deterministic Backmapping of Cα Protein Traces [0.0]
DiAMoNDBack is an autoregressive denoising diffusion probabilistic model for non-Deterministic Backmapping.
We train DiAMoNDBack over 65k+ structures from Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set.
We make DiAMoNDBack publicly available as a free and open source Python package.
arXiv Detail & Related papers (2023-07-23T23:05:08Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z) - Generative Pretrained Autoregressive Transformer Graph Neural Network
applied to the Analysis and Discovery of Novel Proteins [0.0]
We report a flexible language-model based deep learning strategy, applied here to solve complex forward and inverse problems in protein modeling.
The model is applied to predict secondary structure content (per-residue level and overall content), protein solubility, and sequencing tasks.
We find that adding additional tasks yields emergent synergies that the model exploits in improving overall performance.
arXiv Detail & Related papers (2023-05-07T12:30:24Z) - Chemically Transferable Generative Backmapping of Coarse-Grained
Proteins [0.0]
Coarse-graining (CG) accelerates simulations of protein dynamics by simulating sets of atoms as singular beads.
Backmapping is the opposite operation of bringing lost atomistic details back from the CG representation.
This work builds a fast, transferable, and reliable generative backmapping tool for CG protein representations.
arXiv Detail & Related papers (2023-03-02T20:51:57Z) - Latent Space Diffusion Models of Cryo-EM Structures [6.968705314671148]
We train a diffusion model as an expressive, learnable prior in the cryoDRGN framework.
By learning an accurate model of the data distribution, our method unlocks tools in generative modeling, sampling, and distribution analysis.
arXiv Detail & Related papers (2022-11-25T15:17:10Z) - GeoDiff: a Geometric Diffusion Model for Molecular Conformation
Generation [102.85440102147267]
We propose a novel generative model named GeoDiff for molecular conformation prediction.
We show that GeoDiff is superior or comparable to existing state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-06T09:47:01Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)