TransPolymer: a Transformer-based language model for polymer property predictions
- URL: http://arxiv.org/abs/2209.01307v4
- Date: Wed, 26 Apr 2023 01:52:00 GMT
- Authors: Changwen Xu, Yuyang Wang, Amir Barati Farimani
- Abstract summary: TransPolymer is a Transformer-based language model for polymer property prediction.
Our proposed polymer tokenizer with chemical awareness enables learning representations from polymer sequences.
- Score: 9.04563945965023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and efficient prediction of polymer properties is of great
significance in polymer design. Conventionally, expensive and time-consuming
experiments or simulations are required to evaluate polymer functions.
Recently, Transformer models, equipped with self-attention mechanisms, have
exhibited superior performance in natural language processing. However, such
methods have not been investigated in polymer science. Herein, we report
TransPolymer, a Transformer-based language model for polymer property
prediction. Our proposed polymer tokenizer with chemical awareness enables
learning representations from polymer sequences. Rigorous experiments on ten
polymer property prediction benchmarks demonstrate the superior performance of
TransPolymer. Moreover, we show that TransPolymer benefits from pretraining on
a large unlabeled dataset via Masked Language Modeling. Experimental results
further manifest the important role of self-attention in modeling polymer
sequences. We highlight this model as a promising computational tool for
promoting rational polymer design and understanding structure-property
relationships from a data science view.
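The abstract names two ingredients: a chemically aware polymer tokenizer and pretraining via Masked Language Modeling (MLM). The sketch below illustrates both under stated assumptions and is not the TransPolymer implementation. It assumes polymers are written as SMILES strings with `*` marking the repeat unit's connection points, uses a hypothetical regex tokenizer that keeps bracket atoms (e.g. `[Si]`) and two-letter halogens (`Br`, `Cl`) as single tokens, and applies BERT-style masking (15% of positions, split 80/10/10 into mask / random token / unchanged), a convention borrowed from BERT whose exact settings in the paper may differ. The names `tokenize`, `mask_tokens`, `TOKEN_PATTERN`, and the `<mask>` token are illustrative.

```python
import random
import re
from typing import List, Optional, Tuple

# Hypothetical regex for polymer SMILES (an assumption, not the paper's tokenizer).
# Alternatives are tried left to right, so multi-character tokens such as
# bracket atoms ("[Si]"), "Br" and "Cl" are matched before single letters.
TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]"            # bracket atoms, e.g. [Si], [nH], [O-]
    r"|Br|Cl"                 # two-letter organic-subset halogens
    r"|%\d{2}"                # two-digit ring-closure labels
    r"|\*"                    # polymerization (connection) point
    r"|[BCNOSPFI]|[bcnosp]"   # organic-subset atoms; lowercase = aromatic
    r"|\d"                    # single-digit ring closures
    r"|[=#$/\\\-+()@.:~])"    # bonds, branches, charges and other symbols
)

MASK = "<mask>"  # hypothetical mask token


def tokenize(psmiles: str) -> List[str]:
    """Split a polymer SMILES string into chemically meaningful tokens."""
    tokens = TOKEN_PATTERN.findall(psmiles)
    # Reject inputs containing characters the pattern does not cover,
    # instead of silently dropping them.
    if "".join(tokens) != psmiles:
        raise ValueError(f"untokenizable characters in {psmiles!r}")
    return tokens


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_rate: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style MLM corruption (assumed scheme).

    Each position becomes a prediction target with probability mask_rate;
    a target is replaced by MASK 80% of the time, by a random vocab token
    10% of the time, and left unchanged 10% of the time. Labels are None
    at positions the MLM loss ignores.
    """
    rng = random.Random(seed)
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_rate:
            labels.append(tok)  # the model must recover the original token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


if __name__ == "__main__":
    toks = tokenize("*CC(*)c1ccccc1")  # polystyrene repeat unit
    print(toks)
    corrupted, targets = mask_tokens(toks, vocab=sorted(set(toks)), seed=0)
    print(corrupted)
    print(targets)
```

For example, `tokenize("*CC(*)c1ccccc1")` splits the polystyrene repeat unit into 14 tokens, with both `*` connection points and the aromatic ring atoms kept as separate units. A regex tokenizer is chosen here only because it is short and transparent; a learned subword vocabulary would be an equally plausible alternative.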
Related papers
- Predicting Polymer Properties Based on Multimodal Multitask Pretraining [24.975491375575224]
MMPolymer is a novel multitask pretraining framework incorporating both polymer 1D sequential information and 3D structural information.
MMPolymer achieves state-of-the-art performance in various polymer property prediction tasks.
(arXiv, 2024-06-07T08:19:59Z)
- Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [50.756644656847165]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers'.
We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language.
(arXiv, 2024-03-20T02:15:55Z)
- Transferring a molecular foundation model for polymer property predictions [3.067983186439152]
Self-supervised pretraining of transformer models requires large-scale datasets.
We show that transformers pretrained on small molecules and fine-tuned on polymer properties achieve accuracy comparable to that of transformers trained on augmented polymer datasets.
(arXiv, 2023-10-25T19:55:00Z)
- MolXPT: Wrapping Molecules with Text for Generative Pre-training [141.0924452870112]
MolXPT is a unified language model of text and molecules pre-trained on SMILES wrapped by text.
MolXPT outperforms strong baselines for molecular property prediction on MoleculeNet.
(arXiv, 2023-05-18T03:58:19Z)
- Representing Polymers as Periodic Graphs with Learned Descriptors for Accurate Polymer Property Predictions [16.468017785818198]
We develop a periodic polymer graph representation that consistently outperforms hand-designed representations.
We also demonstrate how combining polymer graph representations with message-passing neural network architectures can automatically extract meaningful polymer features.
(arXiv, 2022-05-27T04:14:12Z)
- A graph representation of molecular ensembles for polymer property prediction [3.032184156362992]
In contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules.
We introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction.
(arXiv, 2022-05-17T20:31:43Z)
- Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule.
We modify the classical positional encoder with an initial encoding of the molecule geometry, as well as a learned gated self-attention mechanism.
(arXiv, 2021-10-26T14:14:40Z)
- Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning [56.67751936864119]
The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels.
These models can be trained on a database from MD simulations and/or experimental measurements in a data-fusion-fidelity approach.
The results show that the ML models can accurately predict fuel properties over a wide range of pressure and temperature conditions.
(arXiv, 2021-10-18T14:43:50Z)
- Copolymer Informatics with Multi-Task Deep Neural Networks [0.0]
We address the property prediction challenge for copolymers, extending the polymer informatics framework beyond homopolymers.
A large data set containing over 18,000 data points of glass transition, melting, and degradation temperature of homopolymers and copolymers of up to two monomers is used.
The developed models are accurate, fast, flexible, and scalable to more copolymer properties when suitable data become available.
(arXiv, 2021-03-25T23:28:20Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
(arXiv, 2021-02-20T03:17:58Z)
- Polymers for Extreme Conditions Designed Using Syntax-Directed Variational Autoencoders [53.34780987686359]
Machine learning tools are now commonly employed to virtually screen material candidates with desired properties.
This approach is inefficient, and severely constrained by the candidates that human imagination can conceive.
We utilize syntax-directed variational autoencoders (VAE) in tandem with Gaussian process regression (GPR) models to discover polymers expected to be robust under three extreme conditions.
(arXiv, 2020-11-04T21:36:59Z)