Related papers: TransPolymer: a Transformer-based language model for polymer property predictions

URL: http://arxiv.org/abs/2209.01307v4
Date: Wed, 26 Apr 2023 01:52:00 GMT
Title: TransPolymer: a Transformer-based language model for polymer property predictions
Authors: Changwen Xu, Yuyang Wang, Amir Barati Farimani
Abstract summary: TransPolymer is a Transformer-based language model for polymer property prediction. Our proposed polymer tokenizer with chemical awareness enables learning representations from polymer sequences.
Score: 9.04563945965023
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate and efficient prediction of polymer properties is of great significance in polymer design. Conventionally, expensive and time-consuming experiments or simulations are required to evaluate polymer functions. Recently, Transformer models, equipped with self-attention mechanisms, have exhibited superior performance in natural language processing. However, such methods have not been investigated in polymer sciences. Herein, we report TransPolymer, a Transformer-based language model for polymer property prediction. Our proposed polymer tokenizer with chemical awareness enables learning representations from polymer sequences. Rigorous experiments on ten polymer property prediction benchmarks demonstrate the superior performance of TransPolymer. Moreover, we show that TransPolymer benefits from pretraining on a large unlabeled dataset via Masked Language Modeling. Experimental results further demonstrate the important role of self-attention in modeling polymer sequences. We highlight this model as a promising computational tool for promoting rational polymer design and understanding structure-property relationships from a data science view.

Related papers

POINT$^{2}$: A Polymer Informatics Training and Testing Database [15.45788515943579]
POINT$^{2}$ (POlymer INformatics Training and Testing) is a benchmark database and protocol designed to address critical challenges in polymer informatics. We develop an ensemble of ML models, including Quantile Random Forests, Multilayer Perceptrons with dropout, Graph Neural Networks, and pretrained large language models. These models are coupled with diverse polymer representations such as Morgan, MACCS, RDKit, Topological, Atom Pair fingerprints, and graph-based descriptors.
arXiv Detail & Related papers (2025-03-30T15:46:01Z)
Multimodal machine learning with large language embedding model for polymer property prediction [2.525624865489335]
We propose a simple yet effective multimodal architecture, PolyLLMem, for polymer properties prediction tasks. PolyLLMem integrates text embeddings generated by Llama 3 with molecular structure embeddings derived from Uni-Mol. Its performance is comparable to, and in some cases exceeds, that of graph-based models, as well as transformer-based models.
arXiv Detail & Related papers (2025-03-29T03:48:11Z)
MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science [62.96434290874878]
Current Multi-Modal Large Language Models (MLLMs) have shown strong capabilities in general visual reasoning tasks. We develop a new framework, named Multi-Modal Scientific Reasoning with Physics Perception and Simulation (MAPS), based on an MLLM. MAPS decomposes the expert-level multi-modal reasoning task into physical diagram understanding via a Physical Perception Model (PPM) and reasoning with physical knowledge via a simulator.
arXiv Detail & Related papers (2025-01-18T13:54:00Z)
Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms. This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM). TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions. Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
arXiv Detail & Related papers (2024-10-17T14:30:27Z)
Molecular topological deep learning for polymer property prediction [18.602659324026934]
We develop molecular topological deep learning (Mol-TDL) for polymer property analysis. Mol-TDL incorporates both high-order interactions and multiscale properties into topological deep learning architecture.
arXiv Detail & Related papers (2024-10-07T05:44:02Z)
MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction [24.975491375575224]
MMPolymer is a novel multitask pretraining framework incorporating polymer 1D sequential and 3D structural information. MMPolymer achieves state-of-the-art performance in downstream property prediction tasks.
arXiv Detail & Related papers (2024-06-07T08:19:59Z)
Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student. To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers'. We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language.
arXiv Detail & Related papers (2024-03-20T02:15:55Z)
Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task. We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
Transferring a molecular foundation model for polymer property predictions [3.067983186439152]
Self-supervised pretraining of transformer models requires large-scale datasets. We show that transformers pretrained on small molecules and fine-tuned on polymer properties achieve accuracy comparable to those trained on augmented polymer datasets.
arXiv Detail & Related papers (2023-10-25T19:55:00Z)
Representing Polymers as Periodic Graphs with Learned Descriptors for Accurate Polymer Property Predictions [16.468017785818198]
We develop a periodic polymer graph representation that consistently outperforms hand-designed representations. We also demonstrate how combining polymer graph representations with message-passing neural network architectures can automatically extract meaningful polymer features.
arXiv Detail & Related papers (2022-05-27T04:14:12Z)
Copolymer Informatics with Multi-Task Deep Neural Networks [0.0]
We address the property prediction challenge for copolymers, extending the polymer informatics framework beyond homopolymers. A large data set containing over 18,000 data points of glass transition, melting, and degradation temperature of homopolymers and copolymers of up to two monomers is used. The developed models are accurate, fast, flexible, and scalable to more copolymer properties when suitable data become available.
arXiv Detail & Related papers (2021-03-25T23:28:20Z)
Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph. We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
Polymers for Extreme Conditions Designed Using Syntax-Directed Variational Autoencoders [53.34780987686359]
Machine learning tools are now commonly employed to virtually screen material candidates with desired properties. This approach is inefficient, and severely constrained by the candidates that human imagination can conceive. We utilize syntax-directed variational autoencoders (VAE) in tandem with Gaussian process regression (GPR) models to discover polymers expected to be robust under three extreme conditions.
arXiv Detail & Related papers (2020-11-04T21:36:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.