Polymer Informatics with Multi-Task Learning
- URL: http://arxiv.org/abs/2010.15166v1
- Date: Wed, 28 Oct 2020 18:28:12 GMT
- Title: Polymer Informatics with Multi-Task Learning
- Authors: Christopher Künneth, Arunkumar Chitteth Rajan, Huan Tran, Lihua Chen, Chiho Kim, Rampi Ramprasad
- Abstract summary: We show the potency of multi-task learning approaches that exploit inherent correlations effectively.
Data pertaining to 36 different properties of over $13,000$ polymers are coalesced and supplied to deep-learning multi-task architectures.
The multi-task approach is accurate, efficient, scalable, and amenable to transfer learning as more data on the same or different properties become available.
- Score: 0.06524460254566902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern data-driven tools are transforming application-specific polymer
development cycles. Surrogate models that can be trained to predict the
properties of new polymers are becoming commonplace. Nevertheless, these models
do not utilize the full breadth of the knowledge available in datasets, which
are oftentimes sparse; inherent correlations between different property
datasets are disregarded. Here, we demonstrate the potency of multi-task
learning approaches that exploit such inherent correlations effectively,
particularly when some property dataset sizes are small. Data pertaining to 36
different properties of over $13,000$ polymers (corresponding to over $23,000$
data points) are coalesced and supplied to deep-learning multi-task
architectures. Compared to conventional single-task learning models (that are
trained on individual property datasets independently), the multi-task approach
is accurate, efficient, scalable, and amenable to transfer learning as more
data on the same or different properties become available. Moreover, these
models are interpretable. Chemical rules that explain how certain features
control trends in specific property values emerge from the present work,
paving the way for the rational design of application-specific polymers meeting
desired property or performance objectives.
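As an illustration of how sparse property datasets can be trained jointly, the following is a minimal sketch of a masked multi-task loss. It is not the authors' implementation: the abstract does not specify the architecture or loss, so the function name `masked_multitask_mse`, the use of NaN to mark unmeasured properties, and the toy numbers are all assumptions made for this example. The idea is that one model predicts every property for every polymer, and only the measured entries contribute to the error.

```python
import numpy as np


def masked_multitask_mse(pred, target):
    """Mean squared error over observed entries only.

    pred, target: arrays of shape (n_polymers, n_properties).
    Unmeasured property values in `target` are marked with NaN
    (an assumption of this sketch, not something stated in the paper).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    observed = ~np.isnan(target)  # True where a property was measured
    if not observed.any():
        raise ValueError("target contains no observed property values")
    # Replace the error at unmeasured entries with zero so they are ignored.
    diff = np.where(observed, pred - target, 0.0)
    return float((diff ** 2).sum() / observed.sum())


# Toy example: 2 polymers, 3 properties, half of the values unmeasured.
target = np.array([[1.0, np.nan, 3.0],
                   [np.nan, 2.0, np.nan]])
pred = np.array([[1.5, 9.0, 3.0],
                 [7.0, 1.0, 0.0]])
loss = masked_multitask_mse(pred, target)
print(loss)
```

In this sketch the predictions at unmeasured positions (9.0, 7.0, and 0.0) have no effect on the loss, which is what lets a single model learn from many property datasets of different sizes at once.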
Related papers
- Towards Foundational Models for Molecular Learning on Large-Scale
Multi-Task Datasets [42.401713168958445]
We present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge.
These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning.
In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library.
arXiv Detail & Related papers (2023-10-06T14:51:17Z)
- infoVerse: A Universal Framework for Dataset Characterization with
Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- When does deep learning fail and how to tackle it? A critical analysis
on polymer sequence-property surrogate models [1.0152838128195467]
Deep learning models are gaining popularity and potency in predicting polymer properties.
These models can be built using pre-existing data and are useful for the rapid prediction of polymer properties.
However, the performance of a deep learning model is intricately connected to its topology and the volume of training data.
arXiv Detail & Related papers (2022-10-12T23:04:10Z)
- Combining datasets to increase the number of samples and improve model
fitting [7.4771091238795595]
We propose a novel framework called Combine datasets based on Imputation (ComImp).
In addition, we propose PCA-ComImp, a variant of ComImp that uses Principal Component Analysis (PCA) to reduce dimension before combining datasets.
Our results indicate that the proposed methods are somewhat similar to transfer learning in that the merge can significantly improve the accuracy of a prediction model on smaller datasets.
arXiv Detail & Related papers (2022-10-11T06:06:37Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Multi-objective Deep Data Generation with Correlated Property Control [23.99970130388449]
We propose a novel deep generative framework that recovers semantics and the correlation of properties through disentangled latent vectors.
Our generative model preserves properties of interest while handling correlation and conflicts of properties under a multi-objective optimization framework.
arXiv Detail & Related papers (2022-10-01T00:35:45Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Improving VAE based molecular representations for compound property
prediction [0.0]
We propose a simple method to improve chemical property prediction performance of machine learning models.
We show the relation between the performance of property prediction models and the distance between property prediction dataset and the larger unlabeled dataset.
arXiv Detail & Related papers (2022-01-13T12:57:11Z)
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
- Model-agnostic multi-objective approach for the evolutionary discovery
of mathematical models [55.41644538483948]
In modern data science, it is more interesting to understand the properties of the model and which parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)