$\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials
- URL: http://arxiv.org/abs/2406.14347v1
- Date: Thu, 20 Jun 2024 14:14:59 GMT
- Title: $\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials
- Authors: Kuzma Khrabrov, Anton Ber, Artem Tsypin, Konstantin Ushenin, Egor Rumiantsev, Alexander Telepov, Dmitry Protasov, Ilya Shenbin, Anton Alekseev, Mikhail Shirokikh, Sergey Nikolenko, Elena Tutubalina, Artur Kadurin,
- Abstract summary: This work presents a new dataset and benchmark called $nabla2$DFT that is based on the nablaDFT.
It contains twice as much molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models.
$nabla2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules.
- Score: 35.949502493236146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Methods of computational quantum chemistry provide accurate approximations of molecular properties crucial for computer-aided drug discovery and other areas of chemical science. However, high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents a new dataset and benchmark called $\nabla^2$DFT that is based on the nablaDFT. It contains twice as much molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. All calculations were performed at the DFT level ($\omega$B97X-D/def2-SVP) for each conformation. Moreover, $\nabla^2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs in molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extendable framework for training NNPs and implement 10 models within it.
Related papers
- Data-Driven Parametrization of Molecular Mechanics Force Fields for Expansive Chemical Space Coverage [16.745564099126575]
We develop ByteFF, an Amber-compatible force field for drug-like molecules.
Our model predicts all bonded and non-bonded MM force field parameters for drug-like molecules simultaneously across a broad chemical space.
arXiv Detail & Related papers (2024-08-23T03:37:06Z) - QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules [69.25826391912368]
We generate a new Quantum Hamiltonian dataset, named as QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories.
We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
arXiv Detail & Related papers (2023-06-15T23:39:07Z) - Atomic and Subgraph-aware Bilateral Aggregation for Molecular
Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA)
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z) - Molecular Geometry-aware Transformer for accurate 3D Atomic System
modeling [51.83761266429285]
We propose a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them.
Moleformer achieves state-of-the-art on the initial state to relaxed energy prediction of OC20 and is very competitive in QM9 on predicting quantum chemical properties.
arXiv Detail & Related papers (2023-02-02T03:49:57Z) - SPICE, A Dataset of Drug-like Molecules and Peptides for Training
Machine Learning Potentials [1.7044177326714558]
We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins.
It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids.
It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions.
We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space.
arXiv Detail & Related papers (2022-09-21T23:02:59Z) - Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective to 1) keep the embedding space well-organized and 2) improve the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z) - On the equivalence of molecular graph convolution and molecular wave
function with poor basis set [7.106986689736826]
We describe the quantum deep field (QDF), a machine learning model based on an underlying quantum physics.
For molecular energy prediction tasks, we demonstrated the viability of an extrapolation,'' in which we trained a QDF model with small molecules, tested it with large molecules, and achieved high performance.
arXiv Detail & Related papers (2020-11-16T13:20:35Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
generative models and reinforcement learning approaches made initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework to use input molecule as an initial guess and sample molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z) - End-to-End Differentiable Molecular Mechanics Force Field Construction [0.5269923665485903]
We propose an alternative approach that uses graph neural networks to perceive chemical environments.
The entire process is modular and end-to-end differentiable with respect to model parameters.
We show that this approach is not only sufficiently to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields.
arXiv Detail & Related papers (2020-10-02T20:59:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.