Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning
- URL: http://arxiv.org/abs/2410.10118v1
- Date: Mon, 14 Oct 2024 03:11:33 GMT
- Title: Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning
- Authors: Yuxuan Ren, Dihan Zheng, Chang Liu, Peiran Jin, Yu Shi, Lin Huang, Jiyan He, Shengjie Luo, Tao Qin, Tie-Yan Liu,
- Abstract summary: We exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches.
We demonstrate that the more accurate energy data can improve the accuracy of structure prediction.
We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction.
- Score: 79.75718786477638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, machine learning has demonstrated impressive capability in handling molecular science tasks. To support various molecular properties at scale, machine learning models are trained in the multi-task learning paradigm. Nevertheless, data of different molecular properties are often not aligned: some quantities, e.g. equilibrium structure, demand more cost to compute than others, e.g. energy, so their data are often generated by cheaper computational methods at the cost of lower accuracy, which cannot be directly overcome through multi-task learning. Moreover, it is not straightforward to leverage abundant data of other tasks to benefit a particular task. To handle such data heterogeneity challenges, we exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches that allow different tasks to exchange information directly so as to improve one another. Particularly, we demonstrate that the more accurate energy data can improve the accuracy of structure prediction. We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction, demonstrating a broad capability for integrating heterogeneous data.
Related papers
- Scalable Multi-Task Transfer Learning for Molecular Property Prediction [10.512534299496725]
The proposed method enables scalable multi-task transfer learning for molecular property prediction by automatically obtaining the optimal transfer ratios.
Empirically, the proposed method improved the prediction performance of 40 molecular properties and accelerated training convergence.
arXiv Detail & Related papers (2024-10-01T06:28:14Z) - Multi-Task Multi-Fidelity Learning of Properties for Energetic Materials [34.8008617873679]
We find that multi-task neural networks can learn from multi-modal data and outperform single-task models trained for specific properties.
As expected, the improvement is more significant for data-scarce properties.
This approach is widely applicable to fields outside energetic materials.
arXiv Detail & Related papers (2024-08-21T12:54:26Z) - Multitask methods for predicting molecular properties from heterogeneous data [0.27309692684728615]
We demonstrate that multitask Gaussian process regression overcomes the limitation by leveraging both expensive and cheap data sources.
We report that multitask surrogates can predict at CC-level accuracy with a reduction to data generation cost by over an order of magnitude.
multitask regression can be a tool for reducing data generation costs even further by opportunistically exploiting existing data sources.
arXiv Detail & Related papers (2024-01-31T15:04:03Z) - On the Interplay of Subset Selection and Informed Graph Neural Networks [3.091456764812509]
This work focuses on predicting the molecules atomization energy in the QM9 dataset.
We show how maximizing molecular diversity in the training set selection process increases the robustness of linear and nonlinear regression techniques.
We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer.
arXiv Detail & Related papers (2023-06-15T09:09:27Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Tyger: Task-Type-Generic Active Learning for Molecular Property
Prediction [121.97742787439546]
How to accurately predict the properties of molecules is an essential problem in AI-driven drug discovery.
To reduce annotation cost, deep Active Learning methods are developed to select only the most representative and informative data for annotating.
We propose a Task-type-generic active learning framework (termed Tyger) that is able to handle different types of learning tasks in a unified manner.
arXiv Detail & Related papers (2022-05-23T12:56:12Z) - Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z) - Federated Learning of Molecular Properties in a Heterogeneous Setting [79.00211946597845]
We introduce federated heterogeneous molecular learning to address these challenges.
Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients.
FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
arXiv Detail & Related papers (2021-09-15T12:49:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.