Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning
- URL: http://arxiv.org/abs/2406.13112v2
- Date: Sat, 14 Sep 2024 03:54:46 GMT
- Title: Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning
- Authors: Peter Eastman, Benjamin P. Pritchard, John D. Chodera, Thomas E. Markland
- Abstract summary: The SPICE dataset is a collection of quantum chemistry calculations for training machine learning potentials.
We train a set of potential energy functions called Nutmeg on it.
- Score: 1.747623282473278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe version 2 of the SPICE dataset, a collection of quantum chemistry calculations for training machine learning potentials. It expands on the original dataset by adding much more sampling of chemical space and more data on non-covalent interactions. We train a set of potential energy functions called Nutmeg on it. They are based on the TensorNet architecture. They use a novel mechanism to improve performance on charged and polar molecules, injecting precomputed partial charges into the model to provide a reference for the large scale charge distribution. Evaluation of the new models shows they do an excellent job of reproducing energy differences between conformations, even on highly charged molecules or ones that are significantly larger than the molecules in the training set. They also produce stable molecular dynamics trajectories, and are fast enough to be useful for routine simulation of small molecules.
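The abstract's charge-injection mechanism can be loosely sketched as follows: precomputed per-atom partial charges are supplied to the model as an extra input feature, giving the network a reference for the large-scale charge distribution. This is a minimal illustration only; the function name `inject_partial_charges` and the feature-concatenation scheme are assumptions for demonstration, not the actual Nutmeg/TensorNet implementation.

```python
import numpy as np

def inject_partial_charges(atom_features, partial_charges):
    """Append precomputed per-atom partial charges as an extra input feature.

    atom_features: (n_atoms, n_features) array of per-atom embeddings
    partial_charges: length-n_atoms sequence of precomputed charges
    (e.g. from a fast empirical charge model)
    """
    charges = np.asarray(partial_charges, dtype=float).reshape(-1, 1)
    feats = np.asarray(atom_features, dtype=float)
    # Concatenate along the feature axis: each atom gains one charge column.
    return np.concatenate([feats, charges], axis=1)

# Toy example: 3 atoms with 4-dimensional embeddings plus one charge feature.
feats = np.zeros((3, 4))
augmented = inject_partial_charges(feats, [0.5, -1.0, 0.5])
print(augmented.shape)  # (3, 5)
```

In a real model the augmented features would feed the first message-passing layer, so the network can condition its learned short-range terms on the global charge assignment.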
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Zero Shot Molecular Generation via Similarity Kernels [0.6597195879147557]
We present Similarity-based Molecular Generation (SiMGen), a new method for zero shot molecular generation.
SiMGen combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules.
We also release an interactive web tool that allows users to generate structures with SiMGen online.
arXiv Detail & Related papers (2024-02-13T17:53:44Z) - Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z) - Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z) - Learning Joint 2D & 3D Diffusion Models for Complete Molecule Generation [32.66694406638287]
We propose a new joint 2D and 3D diffusion model (JODO) that generates molecules with atom types, formal charges, bond information, and 3D coordinates.
Our model can also be extended for inverse molecular design targeting single or multiple quantum properties.
arXiv Detail & Related papers (2023-05-21T04:49:53Z) - Hybrid Quantum Generative Adversarial Networks for Molecular Simulation and Drug Discovery [13.544339314714902]
Current classical computational power is inadequate to simulate anything larger than small molecules.
Tens of billions of dollars are spent every year in these research experiments.
Deep generative models for graph-structured data provide a fresh perspective on the problem of chemical synthesis.
arXiv Detail & Related papers (2022-12-15T13:36:35Z) - SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials [1.7044177326714558]
We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins.
It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids.
It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions.
We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space.
arXiv Detail & Related papers (2022-09-21T23:02:59Z) - Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD can explore chemical space beyond the training distribution, generating molecules that score higher than those found with existing methods, and even higher than the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z) - End-to-End Differentiable Molecular Mechanics Force Field Construction [0.5269923665485903]
We propose an alternative approach that uses graph neural networks to perceive chemical environments.
The entire process is modular and end-to-end differentiable with respect to model parameters.
We show that this approach is not only sufficient to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields.
arXiv Detail & Related papers (2020-10-02T20:59:46Z) - ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout framework training.
arXiv Detail & Related papers (2020-07-07T04:22:39Z) - Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.