Related papers: CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules

URL: http://arxiv.org/abs/2406.14149v2
Date: Tue, 23 Jul 2024 10:34:19 GMT
Title: CheMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules
Authors: Vivin Vinod, Peter Zaspel
Abstract summary: We provide the quantum Chemistry MultiFidelity (CheMFi) dataset consisting of five fidelities calculated with the TD-DFT formalism. The fidelities differ in their basis set choice: STO-3G, 3-21G, 6-31G, def2-SVP, and def2-TZVP.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Progress in both Machine Learning (ML) and Quantum Chemistry (QC) methods has resulted in high-accuracy ML models for QC properties. Datasets such as MD17 and WS22 have been used to benchmark these models at some level of QC method, or fidelity, which refers to the accuracy of the chosen QC method. Multifidelity ML (MFML) methods, where models are trained on data from more than one fidelity, have been shown to be more effective than single-fidelity methods. Much research is progressing in this direction for diverse applications ranging from energy band gaps to excitation energies. One hurdle for effective research here is the lack of a diverse multifidelity dataset for benchmarking. We provide the quantum Chemistry MultiFidelity (CheMFi) dataset consisting of five fidelities calculated with the TD-DFT formalism. The fidelities differ in their basis set choice: STO-3G, 3-21G, 6-31G, def2-SVP, and def2-TZVP. CheMFi offers the community a variety of QC properties, such as vertical excitation properties and molecular dipole moments, and further includes QC computation times, allowing for a time-benefit benchmark of multifidelity models for ML-QC.

Related papers

Ensemble Knowledge Distillation for Machine Learning Interatomic Potentials [34.82692226532414]
Machine learning interatomic potentials (MLIPs) are a promising tool to accelerate atomistic simulations and molecular property prediction. The quality of MLIPs depends on the quantity of available training data as well as the quantum chemistry (QC) level of theory used to generate that data. We present an ensemble knowledge distillation (EKD) method to improve MLIP accuracy when trained on energy-only datasets.
arXiv Detail & Related papers (2025-03-18T14:32:51Z)
GWQ: Gradient-Aware Weight Quantization for Large Language Models [61.17678373122165]
Gradient-aware weight quantization (GWQ) is the first approach for low-bit weight quantization that leverages gradients to localize outliers. GWQ preferentially retains the weights corresponding to the top 1% of outliers at FP16 precision, while the remaining non-outlier weights are stored in a low-bit format. In the zero-shot task, GWQ-quantized models achieve higher accuracy than models quantized with other methods.
arXiv Detail & Related papers (2024-10-30T11:16:04Z)
Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies [0.0]
This study investigates the impact of modifying $\gamma$ on model efficiency and accuracy for the prediction of vertical excitation energies using the QeMFi benchmark dataset. A novel error metric, error contours of MFML, is proposed to provide a comprehensive view of model error contributions from each fidelity. The results indicate that high model accuracy can be achieved with just 2 training samples at the target fidelity when a larger number of samples from lower fidelities are used.
arXiv Detail & Related papers (2024-10-15T08:35:00Z)
Benchmarking Data Efficiency in $Δ$-ML and Multifidelity Models for Quantum Chemistry [0.0]
This work compares the data costs associated with $\Delta$-ML, multifidelity machine learning (MFML), and optimized MFML (o-MFML). The results indicate that multifidelity methods surpass the standard $\Delta$-ML approaches when a large number of predictions is required.
arXiv Detail & Related papers (2024-10-15T08:34:32Z)
Quantum Kernel Methods under Scrutiny: A Benchmarking Study [0.0]
Two common approaches for computing the underlying Gram matrix have emerged: fidelity quantum kernels (FQKs) and projected quantum kernels (PQKs). We present a comprehensive large-scale study examining quantum kernel methods (QKMs) based on FQKs and PQKs across a manifold of design choices. Our goal is not to identify the best-performing model for a specific task but to uncover the mechanisms that lead to effective QKMs.
arXiv Detail & Related papers (2024-09-06T16:56:06Z)
Assessing Non-Nested Configurations of Multifidelity Machine Learning for Quantum-Chemical Properties [0.0]
Multifidelity machine learning (MFML) for quantum chemical (QC) properties has seen strong development in recent years. This work assesses the use of non-nested training data for two of these multifidelity methods, namely MFML and optimized MFML.
arXiv Detail & Related papers (2024-07-24T08:34:08Z)
Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy [9.81014501502049]
We develop a unified machine learning method for electronic structures of organic molecules using the gold-standard CCSD(T) calculations as training data. Tested on hydrocarbon molecules, our model outperforms DFT with the widely used hybrid and double-hybrid functionals in both computational cost and prediction accuracy for various quantum chemical properties.
arXiv Detail & Related papers (2024-05-09T19:51:27Z)
Federated Quantum Long Short-term Memory (FedQLSTM) [58.50321380769256]
Quantum federated learning (QFL) can facilitate collaborative learning across multiple clients using quantum machine learning (QML) models. No prior work has focused on developing a QFL framework that utilizes temporal data to approximate functions. A novel QFL framework that is the first to integrate quantum long short-term memory (QLSTM) models with temporal data is proposed.
arXiv Detail & Related papers (2023-12-21T21:40:47Z)
QKSAN: A Quantum Kernel Self-Attention Network [53.96779043113156]
A Quantum Kernel Self-Attention Mechanism (QKSAM) is introduced to combine the data representation merit of Quantum Kernel Methods (QKM) with the efficient information extraction capability of SAM. A Quantum Kernel Self-Attention Network (QKSAN) framework is proposed based on QKSAM, which ingeniously incorporates the Deferred Measurement Principle (DMP) and conditional measurement techniques. Four QKSAN sub-models are deployed on PennyLane and IBM Qiskit platforms to perform binary classification on MNIST and Fashion MNIST.
arXiv Detail & Related papers (2023-08-25T15:08:19Z)
QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules [69.25826391912368]
We generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories. We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
arXiv Detail & Related papers (2023-06-15T23:39:07Z)
An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context. In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning. Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
Multi-fidelity Hierarchical Neural Processes [79.0284780825048]
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs. We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling. We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z)
Study of Feature Importance for Quantum Machine Learning Models [0.0]
Predictor importance is a crucial part of data preprocessing pipelines in classical and quantum machine learning (QML). This work presents the first study of its kind in which feature importance for QML models has been explored and contrasted against their classical machine learning (CML) equivalents. We developed a hybrid quantum-classical architecture where QML models are trained and feature importance values are calculated from classical algorithms on a real-world dataset.
arXiv Detail & Related papers (2022-02-18T15:21:47Z)
When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing [75.75419308975746]
This work proposes a vertical federated learning architecture based on variational quantum circuits to demonstrate the competitive performance of a quantum-enhanced pre-trained BERT model for text classification. Our experiments on intent classification show that our proposed BERT-QTC model attains competitive results on the Snips and ATIS spoken language datasets.
arXiv Detail & Related papers (2022-02-17T09:55:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.