Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of
Protein Simulators
- URL: http://arxiv.org/abs/2308.15116v3
- Date: Tue, 10 Oct 2023 03:41:05 GMT
- Title: Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of
Protein Simulators
- Authors: Jingbang Chen, Yian Wang, Xingwei Qu, Shuangjia Zheng, Yaodong Yang,
Hao Dong, Jie Fu
- Abstract summary: We adapt the soft prompt-based learning method to molecular dynamics tasks.
Our framework excels in accuracy for in-domain data and demonstrates strong generalization capabilities for unseen and out-of-distribution samples.
- Score: 29.22292758901411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecular dynamics simulations have emerged as a fundamental instrument for
studying biomolecules. At the same time, it is often desirable to simulate a
collection of particles under a range of conditions in which the molecules can
fluctuate. In this paper, we explore and adapt the soft prompt-based learning
method to molecular dynamics tasks. Our model generalizes remarkably well to
unseen and out-of-distribution scenarios with limited training data. While our
work focuses on temperature as a test case, the versatility of our approach
allows for efficient simulation under any continuous dynamic condition, such as
pressure or volume. Our framework has two stages: 1) pre-training with a
data-mixing (mixup) technique that augments molecular structure data and
temperature prompts, combined with a curriculum learning method that smoothly
increases the mixing ratio; 2) a meta-learning-based fine-tuning framework that
improves the sample efficiency of fine-tuning and provides better
initialization points for soft prompt tuning. Comprehensive experiments reveal
that our framework excels in accuracy for in-domain data and demonstrates
strong generalization capabilities for unseen and out-of-distribution samples.
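Below is a minimal, hypothetical sketch of the two stages described in the abstract: mixup-style mixing of (structure, temperature-prompt) pairs under a curriculum-scheduled mixing ratio, followed by a first-order MAML-style update that learns an initialization for the soft prompt. The function and argument names (`mixup_batch`, `curriculum_ratio`, `model(coords, temps, prompt=...)`, `soft_prompt`) and all hyperparameters are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch (PyTorch) of the two-stage framework; names, signatures, and
# hyperparameters are illustrative assumptions, not the paper's implementation.
import numpy as np
import torch

def mixup_batch(coords, temps, targets, alpha=0.2):
    """Convexly mix pairs of (structure, temperature prompt, target) samples.
    Assumes fixed-size coordinate tensors so element-wise mixing is well defined."""
    lam = float(np.random.beta(alpha, alpha))
    idx = torch.randperm(coords.size(0))
    def mix(x):
        return lam * x + (1.0 - lam) * x[idx]
    return mix(coords), mix(temps), mix(targets)

def curriculum_ratio(step, total_steps):
    """Smoothly increase the fraction of mixed samples over pre-training."""
    return min(1.0, step / max(1, total_steps))

def pretrain_step(model, batch, step, total_steps, optimizer, loss_fn):
    """Stage 1: pre-training with curriculum-scheduled mixup augmentation."""
    coords, temps, targets = batch  # structures, temperature prompts, next-frame targets
    if np.random.rand() < curriculum_ratio(step, total_steps):
        coords, temps, targets = mixup_batch(coords, temps, targets)
    loss = loss_fn(model(coords, temps), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def meta_update_soft_prompt(model, soft_prompt, tasks, loss_fn,
                            inner_lr=1e-2, outer_lr=1e-3):
    """Stage 2: first-order MAML-style update of the soft-prompt initialization,
    so fine-tuning to a new condition needs only a few gradient steps."""
    meta_grad = torch.zeros_like(soft_prompt)
    for support, query in tasks:  # one task per simulation condition (e.g. temperature)
        # Inner step: adapt a copy of the prompt on the task's support set.
        adapted = soft_prompt.detach().clone().requires_grad_(True)
        s_coords, s_temps, s_targets = support
        inner_loss = loss_fn(model(s_coords, s_temps, prompt=adapted), s_targets)
        (g_inner,) = torch.autograd.grad(inner_loss, adapted)
        adapted = (adapted - inner_lr * g_inner).detach().requires_grad_(True)
        # Outer step: gradient of the query loss at the adapted prompt, used as a
        # first-order approximation of the meta-gradient w.r.t. the initialization.
        q_coords, q_temps, q_targets = query
        outer_loss = loss_fn(model(q_coords, q_temps, prompt=adapted), q_targets)
        (g_outer,) = torch.autograd.grad(outer_loss, adapted)
        meta_grad += g_outer
    with torch.no_grad():
        soft_prompt -= outer_lr * meta_grad / max(1, len(tasks))
```

In this sketch the temperature prompt is mixed exactly like the coordinates and mixed regression targets stand in for a mixed loss; both are reasonable readings of "augments molecular structure data and temperature prompts", not the paper's stated formulation.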
Related papers
- Generative Modeling of Molecular Dynamics Trajectories [12.255021091552441]
We introduce generative modeling of molecular trajectories as a paradigm for learning flexible multi-task surrogate models of MD from data.
We show such generative models can be adapted to diverse tasks such as forward simulation, transition path sampling, and trajectory upsampling.
arXiv Detail & Related papers (2024-09-26T13:02:28Z)
- Fusing Neural and Physical: Augment Protein Conformation Sampling with Tractable Simulations [27.984190594059868]
Generative models have been leveraged as surrogate samplers to obtain conformation ensembles orders of magnitude faster.
In this work, we explore a few-shot setting for such pre-trained generative samplers that incorporates MD simulations in a tractable manner.
arXiv Detail & Related papers (2024-02-16T03:48:55Z)
- A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics [74.93549765488103]
In drug discovery, molecular dynamics simulation provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites.
We propose NeuralMD, the first machine learning surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding.
We show the efficiency and effectiveness of NeuralMD, with a 2000$\times$ speedup over standard numerical MD simulation and outperforming all other ML approaches by up to 80% under the stability metric.
arXiv Detail & Related papers (2024-01-26T09:35:17Z)
- Top-down machine learning of coarse-grained protein force-fields [2.1485350418225244]
Our methodology involves simulating proteins with molecular dynamics and utilizing the resulting trajectories to train a neural network potential.
Remarkably, this method requires only the native conformation of proteins, eliminating the need for labeled data.
By applying Markov State Models, native-like conformations of the simulated proteins can be predicted from the coarse-grained simulations.
arXiv Detail & Related papers (2023-06-20T08:31:24Z)
- Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling [23.74897713386661]
The dynamic nature of proteins is crucial for determining their biological functions and properties.
Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training.
We propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling.
arXiv Detail & Related papers (2023-06-05T15:19:06Z)
- Calibration and generalizability of probabilistic models on low-data chemical datasets with DIONYSUS [0.0]
We perform an extensive study of the calibration and generalizability of probabilistic machine learning models on small chemical datasets.
We analyse the quality of their predictions and uncertainties in a variety of tasks (binary, regression) and datasets.
We offer practical insights into model and feature choice for modelling small chemical datasets, a common scenario in new chemical experiments.
arXiv Detail & Related papers (2022-12-03T08:19:06Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep active learning framework for stochastic simulations.
For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP based models.
The results demonstrate that STNP outperforms the baselines in the learning setting and that LIG achieves state-of-the-art performance for active learning.
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction at the level of the data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)