Str2Str: A Score-based Framework for Zero-shot Protein Conformation
Sampling
- URL: http://arxiv.org/abs/2306.03117v3
- Date: Mon, 11 Mar 2024 19:54:30 GMT
- Title: Str2Str: A Score-based Framework for Zero-shot Protein Conformation
Sampling
- Authors: Jiarui Lu, Bozitao Zhong, Zuobai Zhang, Jian Tang
- Abstract summary: The dynamic nature of proteins is crucial for determining their biological functions and properties.
Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training.
We propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling.
- Score: 23.74897713386661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dynamic nature of proteins is crucial for determining their biological
functions and properties, for which Monte Carlo (MC) and molecular dynamics
(MD) simulations stand as predominant tools to study such phenomena. By
utilizing empirically derived force fields, MC or MD simulations explore the
conformational space through numerically evolving the system via Markov chain
or Newtonian mechanics. However, the high energy barriers of the force fields
can hamper exploration by both methods, because barrier crossings are rare
events, resulting in an inadequately sampled ensemble unless the simulation is
run exhaustively. Existing
learning-based approaches perform direct sampling yet heavily rely on
target-specific simulation data for training, which suffers from high data
acquisition cost and poor generalizability. Inspired by simulated annealing, we
propose Str2Str, a novel structure-to-structure translation framework capable
of zero-shot conformation sampling with the roto-translation equivariance property.
Our method leverages an amortized denoising score matching objective trained on
general crystal structures and has no reliance on simulation data during both
training and inference. Experimental results across several benchmarking
protein systems demonstrate that Str2Str outperforms previous state-of-the-art
generative structure prediction models and can be orders of magnitude faster
compared to long MD simulations. Our open-source implementation is available at
https://github.com/lujiarui/Str2Str
Related papers
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- Fusing Neural and Physical: Augment Protein Conformation Sampling with Tractable Simulations [27.984190594059868]
Generative models have been leveraged as surrogate samplers to obtain conformation ensembles orders of magnitude faster.
In this work, we explore the few-shot setting of such a pre-trained generative sampler, which incorporates MD simulations in a tractable manner.
arXiv Detail & Related papers (2024-02-16T03:48:55Z)
- A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics [74.93549765488103]
In drug discovery, molecular dynamics simulation provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites.
We propose NeuralMD, the first machine learning surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding.
We show the efficiency and effectiveness of NeuralMD, with a 2000$\times$ speedup over standard numerical MD simulation and outperforming all other ML approaches by up to 80% under the stability metric.
arXiv Detail & Related papers (2024-01-26T09:35:17Z)
- SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation [75.14793516745374]
We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training it to simulate structural transformations on synthetic data.
Our experiments show that our method imparts the desired inductive bias, resulting in better few-shot learning for FST-like tasks.
arXiv Detail & Related papers (2023-10-01T21:19:12Z)
- Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators [29.22292758901411]
We adapt the soft prompt-based learning method to molecular dynamics tasks.
Our framework excels in accuracy for in-domain data and demonstrates strong generalization capabilities for unseen and out-of-distribution samples.
arXiv Detail & Related papers (2023-08-29T08:29:08Z)
- Multi-fidelity Hierarchical Neural Processes [79.0284780825048]
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs.
We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling.
We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z)
- DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation [2.685173014586162]
We present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations.
A succinct YAML format for defining the simulation model structure promotes transparency.
We illustrate the capabilities of DagSim through use cases where metadata variables control shapes in an image and patterns in bio-sequences.
arXiv Detail & Related papers (2022-05-06T17:43:27Z)
- Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep active learning framework for simulations.
For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP based models.
The results demonstrate STNP outperforms the baselines in the learning setting and LIG achieves the state-of-the-art for active learning.
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
- Efficient Characterization of Dynamic Response Variation Using Multi-Fidelity Data Fusion through Composite Neural Network [9.446974144044733]
We take advantage of the multi-level response prediction opportunity in structural dynamic analysis.
We formulate a composite neural network fusion approach that can fully utilize the multi-level, heterogeneous datasets obtained.
arXiv Detail & Related papers (2020-05-07T02:44:03Z)
- Using Machine Learning Approach for Computational Substructure in Real-Time Hybrid Simulation [1.0323063834827415]
Hybrid simulation (HS) is a widely used structural testing method that combines a computational substructure with a numerical model for well-understood components.
One challenge for fast HS or real-time HS is associated with the analytical substructures of relatively complex structures.
In this study, a metamodeling technique is proposed to represent the structural dynamic behavior of the analytical substructure.
arXiv Detail & Related papers (2020-04-04T22:22:40Z)