TransPolymer: a Transformer-based language model for polymer property
predictions
- URL: http://arxiv.org/abs/2209.01307v4
- Date: Wed, 26 Apr 2023 01:52:00 GMT
- Title: TransPolymer: a Transformer-based language model for polymer property
predictions
- Authors: Changwen Xu, Yuyang Wang, Amir Barati Farimani
- Abstract summary: TransPolymer is a Transformer-based language model for polymer property prediction.
Our proposed polymer tokenizer with chemical awareness enables learning representations from polymer sequences.
- Score: 9.04563945965023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and efficient prediction of polymer properties is of great
significance in polymer design. Conventionally, expensive and time-consuming
experiments or simulations are required to evaluate polymer functions.
Recently, Transformer models, equipped with self-attention mechanisms, have
exhibited superior performance in natural language processing. However, such
methods have not been investigated in polymer sciences. Herein, we report
TransPolymer, a Transformer-based language model for polymer property
prediction. Our proposed polymer tokenizer with chemical awareness enables
learning representations from polymer sequences. Rigorous experiments on ten
polymer property prediction benchmarks demonstrate the superior performance of
TransPolymer. Moreover, we show that TransPolymer benefits from pretraining on
a large unlabeled dataset via Masked Language Modeling. Experimental results
further manifest the important role of self-attention in modeling polymer
sequences. We highlight this model as a promising computational tool for
promoting rational polymer design and understanding structure-property
relationships from a data science view.
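The abstract names two mechanisms: tokenizing polymer sequences and pretraining with Masked Language Modeling. The sketch below illustrates the general idea only. It assumes a generic regex-based SMILES tokenizer (with `*` marking polymer attachment points) and a fixed 15% masking rate, which is a common convention; neither is taken from the paper, and the names `SMILES_TOKEN`, `tokenize`, and `mask_tokens` are hypothetical. It also simplifies masking by always substituting a `<mask>` token.

```python
import random
import re

# Assumed regex that splits a SMILES string into atom, bond, branch, and
# ring-closure tokens. This is a generic pattern, not the paper's tokenizer.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\*|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|%\d{2}|\d)"
)

MASK = "<mask>"  # hypothetical mask token


def tokenize(smiles: str) -> list[str]:
    """Split a (polymer) SMILES string into tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Fail loudly if the regex skipped any character of the input.
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize: {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Replace each token with MASK with probability mask_prob.

    Returns (masked_tokens, labels). labels holds the original token at
    masked positions and None elsewhere, so a model would be trained to
    recover only the hidden tokens.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    # Polystyrene repeat unit written with '*' attachment points.
    toks = tokenize("*CC(*)c1ccccc1")
    masked, labels = mask_tokens(toks, seed=0)
    print(toks)
    print(masked)
```

In a real pretraining setup the masked sequence would be fed to the Transformer encoder, and the loss would be computed only at positions where `labels` is not `None`.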
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
(2024-11-03T01:56:15Z)
- Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM).
TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions.
Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
(2024-10-17T14:30:27Z)
- Molecular topological deep learning for polymer property prediction [18.602659324026934]
We develop molecular topological deep learning (Mol-TDL) for polymer property analysis.
Mol-TDL incorporates both high-order interactions and multiscale properties into topological deep learning architecture.
(2024-10-07T05:44:02Z)
- MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction [24.975491375575224]
MMPolymer is a novel multitask pretraining framework incorporating polymer 1D sequential and 3D structural information.
MMPolymer achieves state-of-the-art performance in downstream property prediction tasks.
(2024-06-07T08:19:59Z)
- Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers'.
We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language.
(2024-03-20T02:15:55Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
(2024-02-27T03:33:23Z)
- Transferring a molecular foundation model for polymer property predictions [3.067983186439152]
Self-supervised pretraining of transformer models requires large-scale datasets.
We show that transformers pretrained on small molecules and fine-tuned on polymer properties achieve accuracy comparable to those trained on augmented polymer datasets.
(2023-10-25T19:55:00Z)
- Representing Polymers as Periodic Graphs with Learned Descriptors for Accurate Polymer Property Predictions [16.468017785818198]
We develop a periodic polymer graph representation that consistently outperforms hand-designed representations.
We also demonstrate how combining polymer graph representations with message-passing neural network architectures can automatically extract meaningful polymer features.
(2022-05-27T04:14:12Z)
- Copolymer Informatics with Multi-Task Deep Neural Networks [0.0]
We address the property prediction challenge for copolymers, extending the polymer informatics framework beyond homopolymers.
A large data set containing over 18,000 data points of glass transition, melting, and degradation temperature of homopolymers and copolymers of up to two monomers is used.
The developed models are accurate, fast, flexible, and scalable to more copolymer properties when suitable data become available.
(2021-03-25T23:28:20Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
(2021-02-20T03:17:58Z)
- Polymers for Extreme Conditions Designed Using Syntax-Directed Variational Autoencoders [53.34780987686359]
Machine learning tools are now commonly employed to virtually screen material candidates with desired properties.
This approach is inefficient, and severely constrained by the candidates that human imagination can conceive.
We utilize syntax-directed variational autoencoders (VAE) in tandem with Gaussian process regression (GPR) models to discover polymers expected to be robust under three extreme conditions.
(2020-11-04T21:36:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.