EDBench: Large-Scale Electron Density Data for Molecular Modeling
- URL: http://arxiv.org/abs/2505.09262v1
- Date: Wed, 14 May 2025 10:23:22 GMT
- Title: EDBench: Large-Scale Electron Density Data for Molecular Modeling
- Authors: Hongxin Xiang, Ke Li, Mingquan Liu, Zhixiang Cheng, Bin Yao, Wenjie Du, Jun Xia, Li Zeng, Xin Jin, Xiangxiang Zeng
- Abstract summary: Electron density (ED) $\rho(r)$ determines all ground state properties of interactive multi-particle systems. EDBench is a large-scale, high-quality dataset of ED designed to advance learning-based research at the electronic scale.
- Score: 19.93035885065626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing molecular machine learning force fields (MLFFs) generally focus on the learning of atoms, molecules, and simple quantum chemical properties (such as energy and force), but ignore the importance of electron density (ED) $\rho(r)$ in accurately understanding molecular force fields (MFFs). ED describes the probability of finding electrons at specific locations around atoms or molecules, which uniquely determines all ground state properties (such as energy, molecular structure, etc.) of interactive multi-particle systems according to the Hohenberg-Kohn theorem. However, the calculation of ED relies on time-consuming first-principles density functional theory (DFT), which leads to a lack of large-scale ED data and limits its application in MLFFs. In this paper, we introduce EDBench, a large-scale, high-quality dataset of ED designed to advance learning-based research at the electronic scale. Built upon PCQM4Mv2, EDBench provides accurate ED data, covering 3.3 million molecules. To comprehensively evaluate the ability of models to understand and utilize electronic information, we design a suite of ED-centric benchmark tasks spanning prediction, retrieval, and generation. Our evaluation on several state-of-the-art methods demonstrates that learning from EDBench is not only feasible but also achieves high accuracy. Moreover, we show that learning-based methods can efficiently calculate ED with comparable precision while significantly reducing the computational cost relative to traditional DFT calculations. All data and benchmarks from EDBench will be freely available, laying a robust foundation for ED-driven drug discovery and materials science.
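The abstract describes $\rho(r)$ as the probability of finding electrons at a given location, which implies that the density integrates over all space to the number of electrons. The sketch below illustrates that property for the one case with a simple closed form, the hydrogen atom ground state, where $\rho(r) = e^{-2r}/\pi$ in atomic units. It is not taken from EDBench or its tooling; the function names, the cutoff radius, and the grid size are all illustrative assumptions.

```python
import math


def hydrogen_1s_density(r: float) -> float:
    """Hydrogen ground-state electron density at radius r (atomic units)."""
    return math.exp(-2.0 * r) / math.pi


def integrate_radial_density(density, r_max: float = 30.0, n_points: int = 30001) -> float:
    """Trapezoidal estimate of N = integral of 4*pi*r^2 * rho(r) dr on [0, r_max].

    r_max and n_points are illustrative choices, not values from the paper.
    """
    step = r_max / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        r = i * step
        # Endpoints get half weight in the trapezoidal rule.
        weight = 0.5 if i in (0, n_points - 1) else 1.0
        total += weight * 4.0 * math.pi * r * r * density(r)
    return total * step


electron_count = integrate_radial_density(hydrogen_1s_density)
print(f"integrated electron count: {electron_count:.6f}")
```

The integrated value should come out very close to 1, the electron count of hydrogen. For a molecule, the same check would be run on a 3D grid of density values, with the sum weighted by the volume of each grid cell.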
Related papers
- Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials [34.82692226532414]
Machine learning interatomic potentials (MLIPs) are a promising tool to accelerate atomistic simulations and molecular property prediction. The quality of MLIPs depends on the quantity of available training data as well as the quantum chemistry (QC) level of theory used to generate that data. We present an ensemble knowledge distillation (EKD) method to improve MLIP accuracy when trained to energy-only datasets.
arXiv Detail & Related papers (2025-03-18T14:32:51Z)
- Materials Learning Algorithms (MALA): Scalable Machine Learning for Electronic Structure Calculations in Large-Scale Atomistic Simulations [2.04071520659173]
We present the Materials Learning Algorithms (MALA) package, a scalable machine learning framework suitable for large-scale atomistic simulations. MALA models efficiently predict key electronic observables, including local density of states, electronic density, density of states, and total energy. We demonstrate MALA's capabilities with examples including boron clusters, aluminum across its solid-liquid phase boundary, and predicting the electronic structure of a stacking fault in a large beryllium slab.
arXiv Detail & Related papers (2024-11-29T11:10:29Z)
- Predicting ionic conductivity in solids from the machine-learned potential energy landscape [68.25662704255433]
We propose an approach for the quick and reliable screening of ionic conductors through the analysis of a universal interatomic potential. Eight out of the ten highest-ranked materials are confirmed to be superionic at room temperature in first-principles calculations. Our method achieves a speed-up factor of approximately 50 compared to molecular dynamics driven by a machine-learning potential, and is at least 3,000 times faster compared to first-principles molecular dynamics.
arXiv Detail & Related papers (2024-11-11T09:01:36Z)
- E3STO: Orbital Inspired SE(3)-Equivariant Molecular Representation for Electron Density Prediction [0.0]
We introduce a novel SE(3)-equivariant architecture, drawing inspiration from Slater-Type Orbitals (STO).
Our approach offers an alternative functional form for learned orbital-like molecular representation.
We showcase the effectiveness of our method by achieving SOTA prediction accuracy of molecular electron density with 30-70% improvement over other work on Molecular Dynamics data.
arXiv Detail & Related papers (2024-10-08T15:20:33Z)
- Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy [9.81014501502049]
We develop a unified machine learning method for electronic structures of organic molecules using the gold-standard CCSD(T) calculations as training data.
Tested on hydrocarbon molecules, our model outperforms DFT with the widely-used hybrid and double hybrid functionals in computational costs and prediction accuracy of various quantum chemical properties.
arXiv Detail & Related papers (2024-05-09T19:51:27Z)
- QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules [69.25826391912368]
We generate a new Quantum Hamiltonian dataset, named as QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories.
We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
arXiv Detail & Related papers (2023-06-15T23:39:07Z)
- Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling [51.83761266429285]
We propose a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them.
Moleformer achieves state-of-the-art on the initial state to relaxed energy prediction of OC20 and is very competitive in QM9 on predicting quantum chemical properties.
arXiv Detail & Related papers (2023-02-02T03:49:57Z)
- Electronic-structure properties from atom-centered predictions of the electron density [0.0]
The electron density of a molecule or material has recently received major attention as a target quantity of machine-learning models.
We propose a gradient-based approach to minimize the loss function of the regression problem in an optimized and highly sparse feature space.
We show that starting from the predicted density a single Kohn-Sham diagonalization step can be performed to access total energy components that carry an error of just 0.1 meV/atom.
arXiv Detail & Related papers (2022-06-28T15:35:55Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- Computing molecular excited states on a D-Wave quantum annealer [52.5289706853773]
We demonstrate the use of a D-Wave quantum annealer for the calculation of excited electronic states of molecular systems.
These simulations play an important role in a number of areas, such as photovoltaics, semiconductor technology and nanoscience.
arXiv Detail & Related papers (2021-07-01T01:02:17Z)
- BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 geometries.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)