Related papers: Polymer Informatics with Multi-Task Learning

Polymer Informatics with Multi-Task Learning

URL: http://arxiv.org/abs/2010.15166v1
Date: Wed, 28 Oct 2020 18:28:12 GMT
Title: Polymer Informatics with Multi-Task Learning
Authors: Christopher K\"unneth, Arunkumar Chitteth Rajan, Huan Tran, Lihua Chen, Chiho Kim, Rampi Ramprasad
Abstract summary: We show the potency of multi-task learning approaches that exploit inherent correlations effectively. Data pertaining to 36 different properties of over $13, 000$ polymers are coalesced and supplied to deep-learning multi-task architectures. The multi-task approach is accurate, efficient, scalable, and amenable to transfer learning as more data on the same or different properties become available.
Score: 0.06524460254566902
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern data-driven tools are transforming application-specific polymer development cycles. Surrogate models that can be trained to predict the properties of new polymers are becoming commonplace. Nevertheless, these models do not utilize the full breadth of the knowledge available in datasets, which are oftentimes sparse; inherent correlations between different property datasets are disregarded. Here, we demonstrate the potency of multi-task learning approaches that exploit such inherent correlations effectively, particularly when some property dataset sizes are small. Data pertaining to 36 different properties of over $13, 000$ polymers (corresponding to over $23,000$ data points) are coalesced and supplied to deep-learning multi-task architectures. Compared to conventional single-task learning models (that are trained on individual property datasets independently), the multi-task approach is accurate, efficient, scalable, and amenable to transfer learning as more data on the same or different properties become available. Moreover, these models are interpretable. Chemical rules, that explain how certain features control trends in specific property values, emerge from the present work, paving the way for the rational design of application specific polymers meeting desired property or performance objectives.

Related papers

Data Fusion of Deep Learned Molecular Embeddings for Property Prediction [44.99833362998488]
We use data fusion techniques to combine the learned molecular embeddings of various single-task models and trained a multi-task model on this combined embedding. We show that the fused, multi-task models outperform standard multi-task models for sparse datasets and can provide enhanced prediction on data-limited properties compared to single-task models.
arXiv Detail & Related papers (2025-04-09T21:40:15Z)
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks. Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales. We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning [79.75718786477638]
We exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches. We demonstrate that the more accurate energy data can improve the accuracy of structure prediction. We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction.
arXiv Detail & Related papers (2024-10-14T03:11:33Z)
Extrapolative ML Models for Copolymers [1.901715290314837]
Machine learning models have been progressively used for predicting materials properties. These models are inherently interpolative, and their efficacy for searching candidates outside a material's known range of property is unresolved. Here, we determine the relationship between the extrapolation ability of an ML model, the size and range of its training dataset, and its learning approach.
arXiv Detail & Related papers (2024-09-15T11:02:01Z)
Anno-incomplete Multi-dataset Detection [67.69438032767613]
We propose a novel problem as "-incomplete Multi-dataset Detection" We develop an end-to-end multi-task learning architecture which can accurately detect all the object categories with multiple partially annotated datasets.
arXiv Detail & Related papers (2024-08-29T03:58:21Z)
Multi-Task Multi-Fidelity Learning of Properties for Energetic Materials [34.8008617873679]
We find that multi-task neural networks can learn from multi-modal data and outperform single-task models trained for specific properties. As expected, the improvement is more significant for data-scarce properties. This approach is widely applicable to fields outside energetic materials.
arXiv Detail & Related papers (2024-08-21T12:54:26Z)
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets [42.401713168958445]
We present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library.
arXiv Detail & Related papers (2023-10-06T14:51:17Z)
infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization. infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information. In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
When does deep learning fail and how to tackle it? A critical analysis on polymer sequence-property surrogate models [1.0152838128195467]
Deep learning models are gaining popularity and potency in predicting polymer properties. These models can be built using pre-existing data and are useful for the rapid prediction of polymer properties. However, the performance of a deep learning model is intricately connected to its topology and the volume of training data.
arXiv Detail & Related papers (2022-10-12T23:04:10Z)
Multi-objective Deep Data Generation with Correlated Property Control [23.99970130388449]
We propose a novel deep generative framework that recovers semantics and the correlation of properties through disentangled latent vectors. Our generative model preserves properties of interest while handling correlation and conflicts of properties under a multi-objective optimization framework.
arXiv Detail & Related papers (2022-10-01T00:35:45Z)
CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance. In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is more interesting to understand the properties of the model, which parts could be replaced to obtain better results. We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes. Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.