SPICE, A Dataset of Drug-like Molecules and Peptides for Training
Machine Learning Potentials
- URL: http://arxiv.org/abs/2209.10702v1
- Date: Wed, 21 Sep 2022 23:02:59 GMT
- Title: SPICE, A Dataset of Drug-like Molecules and Peptides for Training
Machine Learning Potentials
- Authors: Peter Eastman, Pavan Kumar Behara, David L. Dotson, Raimondas
Galvelis, John E. Herr, Josh T. Horton, Yuezhi Mao, John D. Chodera, Benjamin
P. Pritchard, Yuanqing Wang, Gianni De Fabritiis, Thomas E. Markland
- Abstract summary: We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins.
It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids.
It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions.
We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space.
- Score: 1.7044177326714558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning potentials are an important tool for molecular simulation,
but their development is held back by a shortage of high-quality datasets to
train them on. We describe the SPICE dataset, a new quantum chemistry dataset
for training potentials relevant to simulating drug-like small molecules
interacting with proteins. It contains over 1.1 million conformations for a
diverse set of small molecules, dimers, dipeptides, and solvated amino acids.
It includes 15 elements, charged and uncharged molecules, and a wide range of
covalent and non-covalent interactions. It provides both forces and energies
calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with
other useful quantities such as multipole moments and bond orders. We train a
set of machine learning potentials on it and demonstrate that they can achieve
chemical accuracy across a broad region of chemical space. It can serve as a
valuable resource for the creation of transferable, ready-to-use potential
functions for use in molecular simulations.
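The abstract's "chemical accuracy" claim can be made concrete with a small check. The sketch below is illustrative only: it assumes the common convention that chemical accuracy means an error of about 1 kcal/mol (the abstract does not state a threshold), and the energies are made-up placeholder values, not SPICE data. The function names are hypothetical.

```python
# Minimal sketch: compare predicted and reference energies against an
# ASSUMED chemical-accuracy threshold of 1 kcal/mol (a common convention,
# not a value taken from the SPICE paper).

HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard unit conversion factor
CHEMICAL_ACCURACY_KCAL_PER_MOL = 1.0  # assumed threshold


def mean_absolute_error_kcal(predicted_hartree, reference_hartree):
    """Return the mean absolute energy error in kcal/mol."""
    if len(predicted_hartree) != len(reference_hartree):
        raise ValueError("energy lists must have the same length")
    if not predicted_hartree:
        raise ValueError("energy lists must not be empty")
    total = sum(abs(p - r) for p, r in zip(predicted_hartree, reference_hartree))
    return total / len(predicted_hartree) * HARTREE_TO_KCAL_PER_MOL


def within_chemical_accuracy(mae_kcal, threshold=CHEMICAL_ACCURACY_KCAL_PER_MOL):
    """True if the mean absolute error is at or below the assumed threshold."""
    return mae_kcal <= threshold


# Placeholder energies in hartree (hypothetical, for illustration only).
reference = [-40.5000, -40.4990, -40.5012]
predicted = [-40.5004, -40.4985, -40.5010]

mae = mean_absolute_error_kcal(predicted, reference)
print(f"MAE = {mae:.3f} kcal/mol, within threshold: {within_chemical_accuracy(mae)}")
```

A real evaluation would use held-out conformations from the dataset and would typically report force errors as well, since the dataset provides both energies and forces.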
Related papers
- Data-Driven Parametrization of Molecular Mechanics Force Fields for Expansive Chemical Space Coverage [16.745564099126575]
We develop ByteFF, an Amber-compatible force field for drug-like molecules.
Our model predicts all bonded and non-bonded MM force field parameters for drug-like molecules simultaneously across a broad chemical space.
arXiv Detail & Related papers (2024-08-23T03:37:06Z)
- $\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials [35.949502493236146]
This work presents a new dataset and benchmark called $\nabla^2$DFT that is based on nablaDFT.
It contains twice as many molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models.
$\nabla^2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules.
arXiv Detail & Related papers (2024-06-20T14:14:59Z)
- Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning [1.747623282473278]
The SPICE dataset is a collection of quantum chemistry calculations for training machine learning potentials.
We train a set of potential energy functions called Nutmeg on it.
arXiv Detail & Related papers (2024-06-18T23:54:21Z)
- Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond [33.54862439531144]
Development of reliable molecular mechanics (MM) force fields is indispensable for biomolecular simulation and computer-aided drug design.
We introduce a generalized and machine-learned MM force field, espaloma-0.3, and an end-to-end differentiable framework using graph neural networks.
The force field reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids.
arXiv Detail & Related papers (2023-07-13T23:00:22Z)
- QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules [69.25826391912368]
We generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories.
We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
arXiv Detail & Related papers (2023-06-15T23:39:07Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose MOOD, a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- Computing molecular excited states on a D-Wave quantum annealer [52.5289706853773]
We demonstrate the use of a D-Wave quantum annealer for the calculation of excited electronic states of molecular systems.
These simulations play an important role in a number of areas, such as photovoltaics, semiconductor technology and nanoscience.
arXiv Detail & Related papers (2021-07-01T01:02:17Z)
- BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning (BIGDML) approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 geometries.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)
- End-to-End Differentiable Molecular Mechanics Force Field Construction [0.5269923665485903]
We propose an alternative approach that uses graph neural networks to perceive chemical environments.
The entire process is modular and end-to-end differentiable with respect to model parameters.
We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields.
arXiv Detail & Related papers (2020-10-02T20:59:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.