Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction
- URL: http://arxiv.org/abs/2403.09560v2
- Date: Wed, 5 Jun 2024 07:46:36 GMT
- Title: Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction
- Authors: He Zhang, Chang Liu, Zun Wang, Xinran Wei, Siyuan Liu, Nanning Zheng, Bin Shao, Tie-Yan Liu,
- Abstract summary: We show that Hamiltonian prediction possesses a self-consistency principle, based on which we propose self-consistency training.
It enables the model to be trained on a large amount of unlabeled data, hence addresses the data scarcity challenge.
It is more efficient than running DFT to generate labels for supervised training, since it amortizes DFT calculation over a set of queries.
- Score: 74.84850523400873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting the mean-field Hamiltonian matrix in density functional theory is a fundamental formulation to leverage machine learning for solving molecular science problems. Yet, its applicability is limited by insufficient labeled data for training. In this work, we highlight that Hamiltonian prediction possesses a self-consistency principle, based on which we propose self-consistency training, an exact training method that does not require labeled data. It distinguishes the task from predicting other molecular properties by the following benefits: (1) it enables the model to be trained on a large amount of unlabeled data, hence addresses the data scarcity challenge and enhances generalization; (2) it is more efficient than running DFT to generate labels for supervised training, since it amortizes DFT calculation over a set of queries. We empirically demonstrate the better generalization in data-scarce and out-of-distribution scenarios, and the better efficiency over DFT labeling. These benefits push forward the applicability of Hamiltonian prediction to an ever-larger scale.
Related papers
- FedDW: Distilling Weights through Consistency Optimization in Heterogeneous Federated Learning [14.477559543490242]
Federated Learning (FL) is an innovative distributed machine learning paradigm that enables neural network training across devices without centralizing data.
Previous research shows that in IID environments, the parameter structure of the model is expected to adhere to certain specific consistency principles.
This paper identifies the consistency between the two and leverages it to regulate training, underpinning our proposed FedDW framework.
Experimental results show FedDW outperforms 10 state-of-the-art FL methods, improving accuracy by an average of 3% in highly heterogeneous settings.
arXiv Detail & Related papers (2024-12-05T12:32:40Z) - What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - Physics-Informed Weakly Supervised Learning for Interatomic Potentials [17.165117198519248]
We introduce a physics-informed, weakly supervised approach for training machine-learned interatomic potentials.
We demonstrate reduced energy and force errors -- often lower by a factor of two -- for various baseline models and benchmark data sets.
arXiv Detail & Related papers (2024-07-23T12:49:04Z) - Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI)
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called textbfSurrogate condItional Data Extraction (SIDE) that leverages a trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z) - Machine Learning Force Fields with Data Cost Aware Training [94.78998399180519]
Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation.
Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels.
We propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.
arXiv Detail & Related papers (2023-06-05T04:34:54Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Deep Active Learning for Biased Datasets via Fisher Kernel
Self-Supervision [5.352699766206807]
Active learning (AL) aims to minimize labeling efforts for data-demanding deep neural networks (DNNs)
We propose a low-complexity method for feature density matching using self-supervised Fisher kernel (FK)
Our method outperforms state-of-the-art methods on MNIST, SVHN, and ImageNet classification while requiring only 1/10th of processing.
arXiv Detail & Related papers (2020-03-01T03:56:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.