Polymer Informatics with Multi-Task Learning
- URL: http://arxiv.org/abs/2010.15166v1
- Date: Wed, 28 Oct 2020 18:28:12 GMT
- Title: Polymer Informatics with Multi-Task Learning
- Authors: Christopher Künneth, Arunkumar Chitteth Rajan, Huan Tran, Lihua Chen, Chiho Kim, Rampi Ramprasad
- Abstract summary: We show the potency of multi-task learning approaches that exploit inherent correlations effectively.
Data pertaining to 36 different properties of over 13,000 polymers are coalesced and supplied to deep-learning multi-task architectures.
The multi-task approach is accurate, efficient, scalable, and amenable to transfer learning as more data on the same or different properties become available.
- Score: 0.06524460254566902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern data-driven tools are transforming application-specific polymer
development cycles. Surrogate models that can be trained to predict the
properties of new polymers are becoming commonplace. Nevertheless, these models
do not utilize the full breadth of the knowledge available in datasets, which
are oftentimes sparse; inherent correlations between different property
datasets are disregarded. Here, we demonstrate the potency of multi-task
learning approaches that exploit such inherent correlations effectively,
particularly when some property dataset sizes are small. Data pertaining to 36
different properties of over 13,000 polymers (corresponding to over 23,000
data points) are coalesced and supplied to deep-learning multi-task
architectures. Compared to conventional single-task learning models (that are
trained on individual property datasets independently), the multi-task approach
is accurate, efficient, scalable, and amenable to transfer learning as more
data on the same or different properties become available. Moreover, these
models are interpretable. Chemical rules that explain how certain features
control trends in specific property values emerge from the present work,
paving the way for the rational design of application-specific polymers meeting
desired property or performance objectives.
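The abstract's numbers (over 23,000 data points spread across over 13,000 polymers and 36 properties) imply a very sparse polymer-by-property matrix. A minimal sketch, assuming a masked loss, shows how a multi-task regressor can train on such data: each (polymer, property) pair contributes to the loss only if it was actually measured. The abstract does not specify the authors' loss function; `fill_fraction`, `masked_mse`, and the dictionary layout below are hypothetical illustrations, not the paper's implementation.

```python
from typing import Dict, List

# Hypothetical layout: polymer ID -> {property index: measured value}.
# Only properties actually measured for a polymer appear in its dict.
SparseLabels = Dict[str, Dict[int, float]]


def fill_fraction(n_points: int, n_polymers: int, n_properties: int) -> float:
    """Fraction of the polymer x property matrix that carries a label."""
    return n_points / (n_polymers * n_properties)


def masked_mse(predictions: Dict[str, List[float]], labels: SparseLabels) -> float:
    """Mean squared error over observed (polymer, property) pairs only.

    Predictions for unmeasured properties are ignored, so each task is
    trained only on the data that exist for it.
    """
    total, count = 0.0, 0
    for polymer, observed in labels.items():
        preds = predictions[polymer]  # one prediction per property
        for prop, target in observed.items():
            total += (preds[prop] - target) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observed labels to average over")
    return total / count


if __name__ == "__main__":
    # Approximate figures from the abstract (the abstract gives lower bounds).
    print(f"fill fraction ~ {fill_fraction(23_000, 13_000, 36):.3f}")

    # Toy example with 3 properties; polymer "B" has only property 2 measured,
    # and polymer "A" has no label for property 2, so 9.9 is never penalized.
    labels = {"A": {0: 1.0, 1: 2.0}, "B": {2: 3.0}}
    predictions = {"A": [1.5, 2.0, 9.9], "B": [0.0, 0.0, 2.0]}
    print(masked_mse(predictions, labels))
```

The fill fraction comes out near 0.049, i.e. fewer than 5% of entries are labeled, which is why a single-task model per property sees very little data while a shared multi-task model can pool all of it.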
Related papers
- Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning [79.75718786477638]
We exploit a special feature of molecular tasks, namely that physical laws connect them, and design consistency training approaches.
We demonstrate that the more accurate energy data can improve the accuracy of structure prediction.
We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction.
(2024-10-14T03:11:33Z)
- Extrapolative ML Models for Copolymers [1.901715290314837]
Machine learning models have been progressively used for predicting materials properties.
These models are inherently interpolative, and their efficacy for searching for candidates outside a material's known property range remains unresolved.
Here, we determine the relationship between the extrapolation ability of an ML model, the size and range of its training dataset, and its learning approach.
(2024-09-15T11:02:01Z)
- Anno-incomplete Multi-dataset Detection [67.69438032767613]
We propose a novel problem, "Anno-incomplete Multi-dataset Detection".
We develop an end-to-end multi-task learning architecture which can accurately detect all the object categories with multiple partially annotated datasets.
(2024-08-29T03:58:21Z)
- Multi-Task Multi-Fidelity Learning of Properties for Energetic Materials [34.8008617873679]
We find that multi-task neural networks can learn from multi-modal data and outperform single-task models trained for specific properties.
As expected, the improvement is more significant for data-scarce properties.
This approach is widely applicable to fields outside energetic materials.
(2024-08-21T12:54:26Z)
- Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets [42.401713168958445]
We present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge.
These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning.
In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library.
(2023-10-06T14:51:17Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
(2023-05-30T18:12:48Z)
- When does deep learning fail and how to tackle it? A critical analysis on polymer sequence-property surrogate models [1.0152838128195467]
Deep learning models are gaining popularity and potency in predicting polymer properties.
These models can be built using pre-existing data and are useful for the rapid prediction of polymer properties.
However, the performance of a deep learning model is intricately connected to its topology and the volume of training data.
(2022-10-12T23:04:10Z)
- Multi-objective Deep Data Generation with Correlated Property Control [23.99970130388449]
We propose a novel deep generative framework that recovers semantics and the correlation of properties through disentangled latent vectors.
Our generative model preserves properties of interest while handling correlation and conflicts of properties under a multi-objective optimization framework.
(2022-10-01T00:35:45Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
(2022-05-30T13:34:46Z)
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more important to understand the properties of a model, including which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
(2021-07-07T11:17:09Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
(2021-01-16T23:45:02Z)