Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures
- URL: http://arxiv.org/abs/2408.12413v3
- Date: Wed, 18 Sep 2024 10:53:11 GMT
- Title: Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures
- Authors: Ce Liu, Jun Wang, Zhiqiang Cai, Yingxu Wang, Huizhen Kuang, Kaihui Cheng, Liwei Zhang, Qingkun Su, Yining Tang, Fenglei Cao, Limei Han, Siyu Zhu, Yuan Qi,
- Abstract summary: We introduce a large-scale dataset, Dynamic PDB, encompassing approximately 12.6K proteins.
We provide a comprehensive suite of physical properties, including atomic velocities and forces, potential and kinetic energies, and the temperature of the simulation environment.
For benchmarking purposes, we evaluate state-of-the-art methods on the proposed dataset for the task of trajectory prediction.
- Score: 15.819618708991598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant progress in static protein structure collection and prediction, the dynamic behavior of proteins, one of their most vital characteristics, has been largely overlooked in prior research. This oversight can be attributed to the limited availability, diversity, and heterogeneity of dynamic protein datasets. To address this gap, we propose to enhance existing prestigious static 3D protein structural databases, such as the Protein Data Bank (PDB), by integrating dynamic data and additional physical properties. Specifically, we introduce a large-scale dataset, Dynamic PDB, encompassing approximately 12.6K proteins, each subjected to all-atom molecular dynamics (MD) simulations lasting 1 microsecond to capture conformational changes. Furthermore, we provide a comprehensive suite of physical properties, including atomic velocities and forces, potential and kinetic energies of proteins, and the temperature of the simulation environment, recorded at 1 picosecond intervals throughout the simulations. For benchmarking purposes, we evaluate state-of-the-art methods on the proposed dataset for the task of trajectory prediction. To demonstrate the value of integrating richer physical properties in the study of protein dynamics and related model design, we base our approach on the SE(3) diffusion model and incorporate these physical properties into the trajectory prediction process. Preliminary results indicate that this straightforward extension of the SE(3) model yields improved accuracy, as measured by MAE and RMSD, when the proposed physical properties are taken into consideration. https://fudan-generative-vision.github.io/dynamicPDB/ .
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.
Deep generative models have shown promise in generating protein conformations as a more efficient alternative.
We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z) - EquiJump: Protein Dynamics Simulation via SO(3)-Equivariant Stochastic Interpolants [13.493198442811865]
We introduce EquiJump, a transferable SO(3)-equivariant model that bridges all-atom protein dynamics simulation time steps directly.
Our approach achieves diverse sampling methods and is benchmarked against existing models on trajectory data of fast folding proteins.
arXiv Detail & Related papers (2024-10-12T23:22:49Z) - A DNN Biophysics Model with Topological and Electrostatic Features [0.0]
The model uses multi-scale and uniform topological and electrostatic features generated with protein structural information and force field.
The machine learning simulation on over 4000 protein structures shows the efficiency and fidelity of these features.
This model shows its potential as a general tool in assisting biophysical properties and function prediction for the broad biomolecules.
arXiv Detail & Related papers (2024-09-05T16:11:40Z) - AlphaFolding: 4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance [18.90451943620277]
This study introduces an innovative 4D diffusion model incorporating molecular dynamics (MD) simulation data to learn dynamic protein structures.
Our model exhibits high accuracy in predicting dynamic 3D structures of proteins containing up to 256 amino acids over 32 time steps.
arXiv Detail & Related papers (2024-08-22T14:12:50Z) - Protein Conformation Generation via Force-Guided SE(3) Diffusion Models [48.48934625235448]
Deep generative modeling techniques have been employed to generate novel protein conformations.
We propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation.
arXiv Detail & Related papers (2024-03-21T02:44:08Z) - Efficiently Predicting Protein Stability Changes Upon Single-point
Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z) - Dynamic Molecular Graph-based Implementation for Biophysical Properties
Prediction [9.112532782451233]
We propose a novel approach based on the transformer model utilizing GNNs for characterizing dynamic features of protein-ligand interactions.
Our message passing transformer pre-trains on a set of molecular dynamic data based off of physics-based simulations to learn coordinate construction and make binding probability and affinity predictions.
arXiv Detail & Related papers (2022-12-20T04:21:19Z) - Protein Structure and Sequence Generation with Equivariant Denoising
Diffusion Probabilistic Models [3.5450828190071646]
An important task in bioengineering is designing proteins with specific 3D structures and chemical properties which enable targeted functions.
We introduce a generative model of both protein structure and sequence that can operate at significantly larger scales than previous molecular generative modeling approaches.
arXiv Detail & Related papers (2022-05-26T16:10:09Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based
Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.