Assessing Graph-based Deep Learning Models for Predicting Flash Point
- URL: http://arxiv.org/abs/2002.11315v1
- Date: Wed, 26 Feb 2020 06:10:12 GMT
- Title: Assessing Graph-based Deep Learning Models for Predicting Flash Point
- Authors: Xiaoyu Sun, Nathaniel J. Krakauer, Alexander Politowicz, Wei-Ting
Chen, Qiying Li, Zuoyi Li, Xianjia Shao, Alfred Sunaryo, Mingren Shen, James
Wang, Dane Morgan
- Abstract summary: Graph-based deep learning (GBDL) models were implemented in predicting flash point for the first time.
Average R2 and Mean Absolute Error (MAE) scores of MPNN are, respectively, 2.3% lower and 2.0 K higher than previous comparable studies.
- Score: 52.931492216239995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Flash points of organic molecules play an important role in preventing
flammability hazards and large databases of measured values exist, although
millions of compounds remain unmeasured. To rapidly extend existing data to new
compounds many researchers have used quantitative structure-property
relationship (QSPR) analysis to effectively predict flash points. In recent
years graph-based deep learning (GBDL) has emerged as a powerful alternative
method to traditional QSPR. In this paper, GBDL models were implemented in
predicting flash point for the first time. We assessed the performance of two
GBDL models, message-passing neural network (MPNN) and graph convolutional
neural network (GCNN), by comparing against 12 previous QSPR studies that used
more traditional methods. Our results show that MPNN outperforms GCNN and
yields slightly worse but still comparable performance to
previous QSPR studies. The average R2 and Mean Absolute Error (MAE) scores of
MPNN are, respectively, 2.3% lower and 2.0 K higher than previous comparable
studies. To further explore GBDL models, we collected the largest flash point
dataset to date, which contains 10575 unique molecules. The optimized MPNN
gives a test data R2 of 0.803 and MAE of 17.8 K on the complete dataset. We
also extracted 5 datasets from our integrated dataset based on molecular types
(acids, organometallics, organogermaniums, organosilicons, and organotins) and
explored the quality of the model on these classes.
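As a rough illustration of the message-passing idea behind MPNN-style property predictors like the one assessed above, the sketch below runs a few message-passing steps over a toy molecular graph and sum-pools the node states into a single scalar (e.g. a flash point estimate). All weights, feature sizes, and the adjacency matrix are random, untrained stand-ins, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

def message_passing_step(h, adj, W_msg, W_upd):
    """h: (n_atoms, d) node features; adj: (n, n) bond adjacency matrix."""
    messages = adj @ (h @ W_msg)          # aggregate transformed neighbor states
    return np.tanh(h @ W_upd + messages)  # update each node's state

def readout(h, w_out):
    """Sum-pool node states, then a linear head -> one scalar property."""
    return float(np.sum(h, axis=0) @ w_out)

rng = np.random.default_rng(0)
n_atoms, d = 4, 8                         # a toy 4-heavy-atom molecule
h = rng.normal(size=(n_atoms, d))         # random initial atom features
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], float)     # a simple chain of bonds
W_msg = rng.normal(size=(d, d))
W_upd = rng.normal(size=(d, d))
for _ in range(3):                        # T = 3 message-passing steps
    h = message_passing_step(h, adj, W_msg, W_upd)
pred = readout(h, rng.normal(size=d))     # untrained scalar prediction
```

In a trained model the weights would be fit by gradient descent against measured flash points; libraries such as PyTorch Geometric provide edge-conditioned variants of this update.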
Related papers
- Deep Unlearn: Benchmarking Machine Unlearning [7.450700594277741]
Machine unlearning (MU) aims to remove the influence of particular data points from the learnable parameters of a trained machine learning model.
This paper investigates 18 state-of-the-art MU methods across various benchmark datasets and models.
arXiv Detail & Related papers (2024-10-02T06:41:58Z)
- Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons [2.77390041716769]
Kolmogorov-Arnold Networks (KANs) use highly flexible learnable activation functions directly on network edges.
KANs significantly increase the number of learnable parameters, raising concerns about their effectiveness in data-scarce environments.
We show that individualized activation functions achieve significantly higher predictive accuracy with only a modest increase in parameters.
arXiv Detail & Related papers (2024-09-16T16:56:08Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
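The BMU-assignment step this entry describes can be sketched as follows: after a SOM has been trained on unlabeled data, each labeled point is mapped to its best matching unit, and a new point then inherits the label attached to its own BMU. The codebook below is a random stand-in for a trained map, and the label-propagation rule is a simplified assumption, not the paper's full method.

```python
import numpy as np

def bmu_index(codebook, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

rng = np.random.default_rng(2)
codebook = rng.normal(size=(9, 2))   # a 3x3 map, flattened to 9 units
unit_labels = np.full(9, -1)         # -1 = unit carries no label yet

# Assign a minimal number of labeled points to their BMUs.
labeled_x = rng.normal(size=(3, 2))
labeled_y = np.array([0, 1, 2])
for x, y in zip(labeled_x, labeled_y):
    unit_labels[bmu_index(codebook, x)] = y

# A query point inherits whatever label is attached to its BMU.
query = labeled_x[0]
pred = unit_labels[bmu_index(codebook, query)]
```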
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- Neural scaling laws for phenotypic drug discovery [3.076170146656896]
We investigate if scale can have a similar impact for models designed to aid small molecule drug discovery.
We find that DNNs explicitly supervised to solve tasks in the Pheno-CA do not continuously improve as their data and model size are scaled up.
We introduce a novel precursor task, the Inverse Biological Process (IBP), which is designed to resemble the causal objective functions that have proven successful for NLP.
arXiv Detail & Related papers (2023-09-28T18:10:43Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
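The contrastive-learning component this entry mentions is commonly an InfoNCE-style objective: embeddings of two views of the same molecule (e.g. its molecular graph and its knowledge-graph substructure) are pulled together while mismatched pairs are pushed apart. The sketch below uses random embeddings as stand-ins for the two GNN encoders; it illustrates the loss, not the paper's GODE model.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE loss where matched rows of z1 and z2 (L2-normalized) are positives."""
    sim = z1 @ z2.T / tau                       # pairwise cosine similarities
    sim -= sim.max(axis=1, keepdims=True)       # shift rows for numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))   # matched pairs sit on the diagonal

rng = np.random.default_rng(3)
z = rng.normal(size=(8, 16))
z /= np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize embeddings

aligned_loss = info_nce(z, z)                   # identical views: low loss
random_loss = info_nce(z, np.roll(z, 1, axis=0))  # mismatched pairs: higher loss
```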
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules [1.8947048356389908]
This work focuses on building GCNN models on HPC systems to predict material properties of millions of molecules.
We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch.
We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap.
arXiv Detail & Related papers (2022-07-22T20:54:22Z)
- A Systematic Comparison Study on Hyperparameter Optimisation of Graph Neural Networks for Molecular Property Prediction [8.02401104726362]
Graph neural networks (GNNs) have been proposed for a wide range of graph-related learning tasks.
In recent years, an increasing number of GNN systems have been applied to predict molecular properties.
arXiv Detail & Related papers (2021-02-08T15:40:50Z)
- Ensemble Transfer Learning for the Prediction of Anti-Cancer Drug Response [49.86828302591469]
In this paper, we apply transfer learning to the prediction of anti-cancer drug response.
We apply the classic transfer learning framework that trains a prediction model on the source dataset and refines it on the target dataset.
The ensemble transfer learning pipeline is implemented using LightGBM and two deep neural network (DNN) models with different architectures.
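The classic recipe described here, train on the source dataset and then refine on the target dataset, can be sketched with a warm-started linear model. A ridge-style regressor fit by gradient descent stands in for the paper's LightGBM and DNN models; the data, shift between domains, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

def fit_ridge(X, y, w0=None, lam=1e-2, lr=1e-2, steps=500):
    """Gradient-descent ridge regression, optionally warm-started at w0."""
    n, d = X.shape
    w = np.zeros(d) if w0 is None else w0.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n + lam * w
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0, 0.5])
X_src = rng.normal(size=(200, 3)); y_src = X_src @ w_true          # large source set
X_tgt = rng.normal(size=(20, 3));  y_tgt = X_tgt @ (w_true + 0.1)  # small, shifted target

w_src = fit_ridge(X_src, y_src)             # 1) train on the source dataset
w_ft = fit_ridge(X_tgt, y_tgt, w0=w_src,    # 2) refine on the target dataset,
                 steps=100)                 #    warm-started from the source fit
```

Warm-starting lets the scarce target data adjust an already-reasonable model rather than learn from scratch, which is the core benefit the transfer-learning framework relies on.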
arXiv Detail & Related papers (2020-05-13T20:29:48Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq features to be highly redundant yet informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.