Related papers: Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling

URL: http://arxiv.org/abs/2306.03117v3
Date: Mon, 11 Mar 2024 19:54:30 GMT
Title: Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling
Authors: Jiarui Lu, Bozitao Zhong, Zuobai Zhang, Jian Tang
Abstract summary: The dynamic nature of proteins is crucial for determining their biological functions and properties. Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training. We propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling.
Score: 23.74897713386661
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The dynamic nature of proteins is crucial for determining their biological functions and properties, for which Monte Carlo (MC) and molecular dynamics (MD) simulations stand as predominant tools to study such phenomena. By utilizing empirically derived force fields, MC or MD simulations explore the conformational space through numerically evolving the system via Markov chain or Newtonian mechanics. However, the high-energy barrier of the force fields can hamper the exploration of both methods by the rare event, resulting in inadequately sampled ensemble without exhaustive running. Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training, which suffers from high data acquisition cost and poor generalizability. Inspired by simulated annealing, we propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling with roto-translation equivariant property. Our method leverages an amortized denoising score matching objective trained on general crystal structures and has no reliance on simulation data during both training and inference. Experimental results across several benchmarking protein systems demonstrate that Str2Str outperforms previous state-of-the-art generative structure prediction models and can be orders of magnitude faster compared to long MD simulations. Our open-source implementation is available at https://github.com/lujiarui/Str2Str
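The abstract names a denoising score matching objective without spelling it out. As a rough illustration only, the sketch below shows the generic single-noise-level form of that objective on toy points in R^3 using NumPy. It is not the Str2Str implementation: the amortization over noise levels, the roto-translation equivariant network, and the protein structure representation are all omitted, and the names `dsm_loss`, `exact` and `zero` are hypothetical.

```python
import numpy as np

def dsm_loss(score_fn, x, sigma, rng):
    """Generic denoising score matching loss at one noise level (illustrative only)."""
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    # Score of the Gaussian perturbation kernel N(x_noisy; x, sigma^2 I)
    # with respect to x_noisy: -(x_noisy - x) / sigma^2 = -eps / sigma.
    target = -eps / sigma
    diff = score_fn(x_noisy, sigma) - target
    # The sigma^2 weighting keeps the loss scale comparable across noise levels.
    return float(np.mean(sigma**2 * np.sum(diff**2, axis=-1)))

rng = np.random.default_rng(0)
x = np.zeros((1000, 3))  # toy "data": a point mass at the origin

# For a point mass at the origin the perturbed density is N(0, sigma^2 I),
# whose score is known in closed form.
exact = lambda x_noisy, sigma: -x_noisy / sigma**2
zero = lambda x_noisy, sigma: np.zeros_like(x_noisy)

loss_exact = dsm_loss(exact, x, 0.5, rng)
loss_zero = dsm_loss(zero, x, 0.5, rng)
print(loss_exact, loss_zero)
```

In this toy setting the closed-form score drives the loss to zero, while a score function that always returns zero does not. A learned model would be trained to minimize the same quantity, averaged over many noise levels.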

Related papers

Towards Robust Surrogate Models: Benchmarking Machine Learning Approaches to Expediting Phase Field Simulations of Brittle Fracture [0.0]
We introduce a dataset based on PFM simulations designed to benchmark and advance ML methods for fracture modeling. This dataset includes three energy decomposition methods, two boundary conditions, and 1,000 random initial crack configurations for a total of 6,000 simulations. Our results highlight both the promise and limitations of popular current models, and demonstrate the utility of this dataset as a testbed for advancing machine learning in fracture mechanics research.
arXiv Detail & Related papers (2025-07-09T19:14:56Z)
G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration [48.948187359727996]
G-Sim is a hybrid framework that automates simulator construction with rigorous empirical calibration. It produces reliable, causally-informed simulators, mitigating data-inefficiency and enabling robust system-level interventions.
arXiv Detail & Related papers (2025-06-10T22:14:34Z)
GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects [55.02281855589641]
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter. In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
arXiv Detail & Related papers (2024-12-23T18:58:17Z)
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference. Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable. We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models. This paper investigates the robustness of existing CLTR models in complex and diverse situations. We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
Fusing Neural and Physical: Augment Protein Conformation Sampling with Tractable Simulations [27.984190594059868]
Generative models have been leveraged as surrogate samplers to obtain conformation ensembles orders of magnitude faster. In this work, we explore the few-shot setting of such a pre-trained generative sampler, which incorporates MD simulations in a tractable manner.
arXiv Detail & Related papers (2024-02-16T03:48:55Z)
A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics [73.35846234413611]
In drug discovery, molecular dynamics (MD) simulation provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites. We propose NeuralMD, the first machine learning (ML) surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding dynamics. We demonstrate the efficiency and effectiveness of NeuralMD, achieving over 1K$\times$ speedup compared to standard numerical MD simulations.
arXiv Detail & Related papers (2024-01-26T09:35:17Z)
SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation [75.14793516745374]
We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training it to simulate structural transformations on synthetic data. Our experiments show that our method imparts the desired inductive bias, resulting in better few-shot learning for FST-like tasks.
arXiv Detail & Related papers (2023-10-01T21:19:12Z)
Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators [29.22292758901411]
We adapt the soft prompt-based learning method to molecular dynamics tasks. Our framework excels in accuracy for in-domain data and demonstrates strong generalization capabilities for unseen and out-of-distribution samples.
arXiv Detail & Related papers (2023-08-29T08:29:08Z)
Multi-fidelity Hierarchical Neural Processes [79.0284780825048]
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs. We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling. We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z)
DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation [2.685173014586162]
We present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations. A succinct YAML format for defining the simulation model structure promotes transparency. We illustrate the capabilities of DagSim through use cases where metadata variables control shapes in an image and patterns in bio-sequences.
arXiv Detail & Related papers (2022-05-06T17:43:27Z)
Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep active learning framework for accelerating stochastic simulation. For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP based models. The results demonstrate STNP outperforms the baselines in the learning setting and LIG achieves the state-of-the-art for active learning.
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
Efficient Characterization of Dynamic Response Variation Using Multi-Fidelity Data Fusion through Composite Neural Network [9.446974144044733]
We take advantage of the multi-level response prediction opportunity in structural dynamic analysis. We formulate a composite neural network fusion approach that can fully utilize the multi-level, heterogeneous datasets obtained.
arXiv Detail & Related papers (2020-05-07T02:44:03Z)
Using Machine Learning Approach for Computational Substructure in Real-Time Hybrid Simulation [1.0323063834827415]
Hybrid simulation (HS) is a widely used structural testing method that combines a computational substructure with a numerical model for well-understood components. One challenge for fast HS or real-time HS is associated with the analytical substructures of relatively complex structures. In this study, a metamodeling technique is proposed to represent the structural dynamic behavior of the analytical substructure.
arXiv Detail & Related papers (2020-04-04T22:22:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.