Simultaneous Modeling of Protein Conformation and Dynamics via Autoregression
- URL: http://arxiv.org/abs/2505.17478v1
- Date: Fri, 23 May 2025 05:00:15 GMT
- Title: Simultaneous Modeling of Protein Conformation and Dynamics via Autoregression
- Authors: Yuning Shen, Lihao Wang, Huizhuo Yuan, Yan Wang, Bangji Yang, Quanquan Gu,
- Abstract summary: ConfRover is an autoregressive model that simultaneously learns protein conformation and dynamics from MD trajectories.<n>It supports both time-dependent and time-independent sampling.<n>Experiments on ATLAS, a large-scale protein MD dataset, demonstrate the effectiveness of our model.
- Score: 45.49904590474368
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding protein dynamics is critical for elucidating their biological functions. The increasing availability of molecular dynamics (MD) data enables the training of deep generative models to efficiently explore the conformational space of proteins. However, existing approaches either fail to explicitly capture the temporal dependencies between conformations or do not support direct generation of time-independent samples. To address these limitations, we introduce ConfRover, an autoregressive model that simultaneously learns protein conformation and dynamics from MD trajectories, supporting both time-dependent and time-independent sampling. At the core of our model is a modular architecture comprising: (i) an encoding layer, adapted from protein folding models, that embeds protein-specific information and conformation at each time frame into a latent space; (ii) a temporal module, a sequence model that captures conformational dynamics across frames; and (iii) an SE(3) diffusion model as the structure decoder, generating conformations in continuous space. Experiments on ATLAS, a large-scale protein MD dataset of diverse structures, demonstrate the effectiveness of our model in learning conformational dynamics and supporting a wide range of downstream tasks. ConfRover is the first model to sample both protein conformations and trajectories within a single framework, offering a novel and flexible approach for learning from protein MD data.
Related papers
- MoDyGAN: Combining Molecular Dynamics With GANs to Investigate Protein Conformational Space [0.0]
MoDyGAN is a pipeline that exploits molecular dynamics simulations and generative adversarial networks (GANs) to explore protein conformational spaces.<n>MoDyGAN contains a generator that maps Gaussian distributions into MD-derived protein trajectories, and a refinement module that combines ensemble learning with a dual-discriminator.<n>Central to our approach is an innovative representation technique that reversibly transforms 3D protein structures into 2D matrices.<n>Our results suggest that representing proteins as image-like data unlocks new possibilities for applying advanced deep learning techniques to biomolecular simulation.
arXiv Detail & Related papers (2025-07-18T14:18:28Z) - SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.<n>Deep generative models have shown promise in generating protein conformations as a more efficient alternative.<n>We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z) - DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z) - EquiJump: Protein Dynamics Simulation via SO(3)-Equivariant Stochastic Interpolants [13.493198442811865]
We introduce EquiJump, a transferable SO(3)-equivariant model that bridges all-atom protein dynamics simulation time steps directly.<n>Our approach achieves diverse sampling methods and is benchmarked against existing models on trajectory data of fast folding proteins.
arXiv Detail & Related papers (2024-10-12T23:22:49Z) - AlphaFolding: 4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance [18.90451943620277]
This study introduces an innovative 4D diffusion model incorporating molecular dynamics (MD) simulation data to learn dynamic protein structures.<n>Our model exhibits high accuracy in predicting dynamic 3D structures of proteins containing up to 256 amino acids over 32 time steps.
arXiv Detail & Related papers (2024-08-22T14:12:50Z) - Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures [15.819618708991598]
We introduce a large-scale dataset, Dynamic PDB, encompassing approximately 12.6K proteins.
We provide a comprehensive suite of physical properties, including atomic velocities and forces, potential and kinetic energies, and the temperature of the simulation environment.
For benchmarking purposes, we evaluate state-of-the-art methods on the proposed dataset for the task of trajectory prediction.
arXiv Detail & Related papers (2024-08-22T14:06:01Z) - Progressive Multi-Modality Learning for Inverse Protein Folding [47.095862120116976]
We propose a novel protein design paradigm called MMDesign, which leverages multi-modality transfer learning.
MMDesign is the first framework that combines a pretrained structural module with a pretrained contextual module, using an auto-encoder (AE) based language model to incorporate prior protein semantic knowledge.
Experimental results, only training with the small dataset, demonstrate that MMDesign consistently outperforms baselines on various public benchmarks.
arXiv Detail & Related papers (2023-12-11T10:59:23Z) - Ophiuchus: Scalable Modeling of Protein Structures through Hierarchical
Coarse-graining SO(3)-Equivariant Autoencoders [1.8835495377767553]
Three-dimensional native states of natural proteins display recurring and hierarchical patterns.
Traditional graph-based modeling of protein structures is often limited to operate within a single fine-grained resolution.
We introduce Ophiuchus, an SO(3)-equivariant coarse-graining model that efficiently operates on all-atom protein structures.
arXiv Detail & Related papers (2023-10-04T01:01:11Z) - DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained
Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.