MIPS: a Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction
- URL: http://arxiv.org/abs/2507.20326v1
- Date: Sun, 27 Jul 2025 15:34:51 GMT
- Title: MIPS: a Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction
- Authors: Jiaxi Wang, Yaosen Min, Xun Zhu, Miao Li, Ji Wu
- Abstract summary: Existing modeling approaches, which typically represent polymers by their constituent monomers, struggle to capture the properties of the polymer as a whole. We propose a Multimodal Infinite Polymer Sequence (MIPS) pre-training framework, which represents polymers as infinite sequences of monomers. From the topological perspective, we generalize the message passing mechanism (MPM) and the graph attention mechanism (GAM) to infinite polymer sequences.
- Score: 18.637780346409308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Polymers, composed of repeating structural units called monomers, are fundamental materials in daily life and industry. Accurate property prediction for polymers is essential for their design, development, and application. However, existing modeling approaches, which typically represent polymers by their constituent monomers, struggle to capture the properties of the polymer as a whole, since the properties change during the polymerization process. In this study, we propose a Multimodal Infinite Polymer Sequence (MIPS) pre-training framework, which represents polymers as infinite sequences of monomers and integrates both topological and spatial information for comprehensive modeling. From the topological perspective, we generalize the message passing mechanism (MPM) and the graph attention mechanism (GAM) to infinite polymer sequences. For MPM, we demonstrate that applying MPM to infinite polymer sequences is equivalent to applying MPM on the induced star-linking graph of monomers. For GAM, we further propose replacing global graph attention with localized graph attention (LGA). Moreover, we show the robustness of the "star linking" strategy through the Repeat and Shift Invariance Test (RSIT). Despite its robustness, the "star linking" strategy exhibits limitations when monomer side chains contain ring structures, a common characteristic of polymers, as it fails the Weisfeiler-Lehman (WL) test. To overcome this issue, we propose backbone embedding to enhance the capability of MPM and LGA on infinite polymer sequences. From the spatial perspective, we extract 3D descriptors of repeating monomers to capture spatial information. Finally, we design a cross-modal fusion mechanism to unify the topological and spatial information. Experimental validation across eight diverse polymer property prediction tasks shows that MIPS achieves state-of-the-art performance.
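The claimed equivalence between message passing on an infinite chain and on the star-linking graph can be illustrated with a small numerical check. This is a sketch under stated assumptions, not the MIPS implementation: it assumes star linking bonds a repeat unit's tail atom back to its own head atom, and it uses a hypothetical repeat-unit encoding (atom labels, intra-unit bonds, and the head/tail atoms bonded to the two `*` attachment points of a polymer SMILES), hypothetical helpers (`adjacency`, `repeat_monomer`, `star_linked_graph`, `message_passing`), and a toy integer sum-aggregation update in place of a learned MPM layer. It compares atom states in the middle of a long finite chain with those on the star-linked repeat unit.

```python
from typing import Dict, List, Tuple

# Hypothetical repeat-unit encoding (assumption, not the MIPS data format):
# atom labels (e.g. atomic numbers), intra-unit bonds, and the head/tail atoms
# bonded to the two "*" attachment points of a polymer SMILES.
Monomer = Tuple[List[int], List[Tuple[int, int]], int, int]
Graph = Dict[int, List[int]]


def adjacency(n: int, edges: List[Tuple[int, int]]) -> Graph:
    """Undirected multigraph as neighbour lists; a self-loop (a, a) adds a twice."""
    adj: Graph = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def repeat_monomer(mon: Monomer, times: int) -> Monomer:
    """Concatenate `times` copies, bonding each copy's tail to the next copy's head."""
    labels, bonds, head, tail = mon
    n = len(labels)
    new_bonds: List[Tuple[int, int]] = []
    for c in range(times):
        off = c * n
        new_bonds += [(a + off, b + off) for a, b in bonds]
        if c + 1 < times:
            new_bonds.append((tail + off, head + off + n))
    return labels * times, new_bonds, head, tail + (times - 1) * n


def star_linked_graph(mon: Monomer) -> Graph:
    """Repeat unit plus an assumed tail-head edge standing in for neighbouring copies."""
    labels, bonds, head, tail = mon
    return adjacency(len(labels), bonds + [(tail, head)])


def message_passing(labels: List[int], adj: Graph, layers: int) -> List[int]:
    """Toy layer h'(v) = 2*h(v) + sum of neighbour states (stand-in for a learned MPM)."""
    h = list(labels)
    for _ in range(layers):
        h = [2 * h[v] + sum(h[u] for u in adj[v]) for v in range(len(h))]
    return h


# Poly(vinyl chloride), *CC(Cl)*: atom 0 = head C, atom 1 = tail C, atom 2 = Cl.
pvc: Monomer = ([6, 6, 17], [(0, 1), (1, 2)], 0, 1)
K = 3
n = len(pvc[0])

star = message_passing(pvc[0], star_linked_graph(pvc), K)

# A finite 9-unit chain approximates the infinite sequence; the chain-end atoms
# are more than K hops from the middle copy, so end effects cannot reach it.
chain_labels, chain_bonds, _, _ = repeat_monomer(pvc, 9)
chain = message_passing(chain_labels, adjacency(len(chain_labels), chain_bonds), K)
middle = chain[4 * n: 5 * n]

# Repeat check: star-linking a doubled repeat unit gives the same per-atom states.
dimer = repeat_monomer(pvc, 2)
dimer_states = message_passing(dimer[0], star_linked_graph(dimer), K)

print(star, middle == star, dimer_states == star * 2)  # prints: [600, 721, 412] True True
```

Because every copy in an infinite chain sees an identical neighbourhood, the tail atom's neighbour in the next copy carries the same state as the unit's own head atom, so one tail-head edge reproduces the infinite-chain states exactly. The dimer check loosely mirrors the "repeat" half of RSIT; the precise star-linking construction and test protocol used by MIPS may differ.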
Related papers
- CryoGS: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction [55.2480439325792]
Cryogenic electron microscopy (cryo-EM) facilitates the determination of macromolecular structures at near-atomic resolution. The core computational task in single-particle cryo-EM is to reconstruct the 3D electrostatic potential of a molecule. We introduce cryoGS, a GMM-based method that integrates Gaussian splatting with the physics of cryo-EM image formation.
arXiv Detail & Related papers (2025-08-06T23:24:43Z)
- Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties [55.2480439325792]
This work introduces AMPTCR, a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format. For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R² of 0.87. In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values.
arXiv Detail & Related papers (2025-07-22T04:35:50Z)
- DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models [66.41802970528133]
Molecular structure elucidation from spectra is a foundational problem in chemistry. Traditional methods rely heavily on expert interpretation and lack scalability. We present DiffSpectra, a generative framework that directly infers both 2D and 3D molecular structures from multi-modal spectral data.
arXiv Detail & Related papers (2025-07-09T13:57:20Z)
- Learning Repetition-Invariant Representations for Polymer Informatics [15.45788515943579]
We introduce Graph Repetition Invariance (GRIN), a novel method to learn polymer representations that are invariant to the number of repeating units in their graph representations. GRIN integrates a graph-based maximum spanning tree alignment with repeat-unit augmentation to ensure structural consistency. It outperforms state-of-the-art baselines on both homopolymer and copolymer benchmarks, learning stable, repetition-invariant representations that generalize effectively to polymer chains of unseen sizes.
arXiv Detail & Related papers (2025-05-15T22:05:40Z)
- POINT$^{2}$: A Polymer Informatics Training and Testing Database [15.45788515943579]
POINT$^{2}$ (POlymer INformatics Training and Testing) is a benchmark database and protocol designed to address critical challenges in polymer informatics. We develop an ensemble of ML models, including Quantile Random Forests, Multilayer Perceptrons with dropout, Graph Neural Networks, and pretrained large language models. These models are coupled with diverse polymer representations such as Morgan, MACCS, RDKit, Topological, Atom Pair fingerprints, and graph-based descriptors.
arXiv Detail & Related papers (2025-03-30T15:46:01Z)
- Multimodal machine learning with large language embedding model for polymer property prediction [2.525624865489335]
We propose a simple yet effective multimodal architecture, PolyLLMem, for polymer property prediction tasks. PolyLLMem integrates text embeddings generated by Llama 3 with molecular structure embeddings derived from Uni-Mol. Its performance is comparable to, and in some cases exceeds, that of graph-based and transformer-based models.
arXiv Detail & Related papers (2025-03-29T03:48:11Z)
- MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction [24.975491375575224]
MMPolymer is a novel multitask pretraining framework incorporating polymer 1D sequential and 3D structural information.
MMPolymer achieves state-of-the-art performance in downstream property prediction tasks.
arXiv Detail & Related papers (2024-06-07T08:19:59Z)
- E(3)-equivariant models cannot learn chirality: Field-based molecular generation [51.327048911864885]
Chirality plays a key role in determining drug safety and potency. We introduce a novel field-based representation, proposing reference rotations that replace rotational symmetry constraints. The proposed model captures all molecular geometries, including chirality, while still achieving highly competitive performance with E(3)-based methods across standard benchmarking metrics.
arXiv Detail & Related papers (2024-02-24T17:13:58Z)
- Gaussian Entanglement Measure: Applications to Multipartite Entanglement of Graph States and Bosonic Field Theory [50.24983453990065]
An entanglement measure based on the Fubini-Study metric has recently been introduced by Cocchiarella and co-workers.
We present the Gaussian Entanglement Measure (GEM), a generalization of geometric entanglement measure for multimode Gaussian states.
By providing a computable multipartite entanglement measure for systems with a large number of degrees of freedom, we show that our definition can be used to obtain insights into a free bosonic field theory.
arXiv Detail & Related papers (2024-01-31T15:50:50Z)
- Photonic Quantum Computing For Polymer Classification [62.997667081978825]
Two polymer classes, visible (VIS) and near-infrared (NIR), are defined based on the size of the polymer gaps.
We present a hybrid classical-quantum approach to the binary classification of polymer structures.
arXiv Detail & Related papers (2022-11-22T11:59:52Z)
- Representing Polymers as Periodic Graphs with Learned Descriptors for Accurate Polymer Property Predictions [16.468017785818198]
We develop a periodic polymer graph representation that consistently outperforms hand-designed representations.
We also demonstrate how combining polymer graph representations with message-passing neural network architectures can automatically extract meaningful polymer features.
arXiv Detail & Related papers (2022-05-27T04:14:12Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have achieved initial success but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)