Learning Repetition-Invariant Representations for Polymer Informatics
- URL: http://arxiv.org/abs/2505.10726v1
- Date: Thu, 15 May 2025 22:05:40 GMT
- Title: Learning Repetition-Invariant Representations for Polymer Informatics
- Authors: Yihan Zhu, Gang Liu, Eric Inae, Tengfei Luo, Meng Jiang
- Abstract summary: We introduce Graph Repetition Invariance (GRIN), a novel method to learn polymer representations that are invariant to the number of repeating units in their graph representations. GRIN integrates a graph-based maximum spanning tree alignment with repeat-unit augmentation to ensure structural consistency. It outperforms state-of-the-art baselines on both homopolymer and copolymer benchmarks, learning stable, repetition-invariant representations that generalize effectively to polymer chains of unseen sizes.
- Score: 15.45788515943579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Polymers are large macromolecules composed of repeating structural units known as monomers and are widely applied in fields such as energy storage, construction, medicine, and aerospace. However, existing graph neural network methods, though effective for small molecules, only model the single unit of polymers and fail to produce consistent vector representations for the true polymer structure with varying numbers of units. To address this challenge, we introduce Graph Repetition Invariance (GRIN), a novel method to learn polymer representations that are invariant to the number of repeating units in their graph representations. GRIN integrates a graph-based maximum spanning tree alignment with repeat-unit augmentation to ensure structural consistency. We provide theoretical guarantees for repetition-invariance from both model and data perspectives, demonstrating that three repeating units are the minimal augmentation required for optimal invariant representation learning. GRIN outperforms state-of-the-art baselines on both homopolymer and copolymer benchmarks, learning stable, repetition-invariant representations that generalize effectively to polymer chains of unseen sizes.
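The abstract's notions of repeat-unit augmentation, maximum spanning trees, and repetition invariance can be illustrated with a minimal sketch. This is not the GRIN implementation: the repeat unit, the bond weights, and the helper names (`repeat_unit_graph`, `max_spanning_tree`, `mean_readout`, `sum_readout`) are all hypothetical, and the readouts operate on raw atom labels rather than learned features. It only shows that a size-normalized summary stays the same as units are repeated, while a sum-based one does not.

```python
from collections import Counter
from fractions import Fraction


def repeat_unit_graph(atoms, bonds, head, tail, k, link_weight=1.0):
    """Chain k copies of one repeat unit into a single graph.

    atoms: atom labels of one unit.
    bonds: (weight, u, v) bonds inside one unit (weights are illustrative).
    head, tail: attachment atoms; the tail of copy i bonds to the head of copy i + 1.
    """
    n = len(atoms)
    nodes = [a for _ in range(k) for a in atoms]
    edges = []
    for i in range(k):
        off = i * n
        edges.extend((w, u + off, v + off) for w, u, v in bonds)
        if i + 1 < k:
            edges.append((link_weight, tail + off, head + off + n))
    return nodes, edges


def max_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm with edges visited in descending weight order."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # keep the edge only if it joins two components
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def mean_readout(nodes):
    """Fraction of each atom type; unchanged when the unit is repeated."""
    counts = Counter(nodes)
    return {a: Fraction(c, len(nodes)) for a, c in counts.items()}


def sum_readout(nodes):
    """Raw atom counts; these grow with the number of repeat units."""
    return dict(Counter(nodes))


# Hypothetical repeat unit: a three-membered carbon ring with an oxygen linker.
ATOMS = ["C", "C", "C", "O"]
BONDS = [(1.0, 0, 1), (1.0, 1, 2), (0.5, 2, 0), (1.0, 2, 3)]

for k in (1, 3):
    nodes, edges = repeat_unit_graph(ATOMS, BONDS, head=0, tail=3, k=k)
    tree = max_spanning_tree(len(nodes), edges)
    print(k, len(nodes), len(tree), mean_readout(nodes), sum_readout(nodes))
```

In this toy setting the spanning tree drops the lowest-weight ring bond in every copy of the unit, so the tree of the repeated graph is itself a repetition of the single-unit tree plus the linking bonds.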
Related papers
- PolySet: Restoring the Statistical Ensemble Nature of Polymers for Machine Learning [0.0]
We introduce PolySet, a framework that represents a polymer as a finite, weighted ensemble of chains sampled from an assumed molar-mass distribution. By explicitly acknowledging the statistical nature of polymer matter, PolySet establishes a physically grounded foundation for future polymer machine learning.
arXiv Detail & Related papers (2025-12-15T10:50:48Z) - Chemistry Integrated Language Model using Hierarchical Molecular Representation for Polymer Informatics [0.0]
Machine learning has transformed material discovery for inorganic compounds and small molecules, yet polymers remain largely inaccessible to these methods. We introduce CI-LLM, a framework combining HAPPY, which encodes chemical substructures as tokens, with numerical descriptors within transformer architectures. For property prediction, De$3$BERTa achieves 3.5x faster inference than SMILES-based models with improved accuracy. For inverse design, our GPT-based generator produces polymers with targeted properties, achieving 100 percent scaffold retention and successful multi-property optimization.
arXiv Detail & Related papers (2025-12-06T05:07:11Z) - Unifying Polymer Modeling and Design via a Conformation-Centric Generative Foundation Model [29.414571977709098]
PolyConFM is a polymer foundation model that unifies polymer modeling and design through conformation-centric pretraining. We construct the first high-quality polymer conformation dataset via molecular dynamics simulations to mitigate data sparsity. Experiments demonstrate that PolyConFM consistently outperforms representative task-specific methods on diverse downstream tasks.
arXiv Detail & Related papers (2025-10-15T17:11:44Z) - MIPS: a Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction [18.637780346409308]
Existing modeling approaches, which typically represent polymers by their constituent monomers, struggle to capture the properties of the whole polymer. We propose a Multimodal Infinite Polymer Sequence (MIPS) pre-training framework, which represents polymers as infinite sequences of monomers. From the topological perspective, we generalize the message passing mechanism (MPM) and the graph attention mechanism (GAM) to infinite polymer sequences.
arXiv Detail & Related papers (2025-07-27T15:34:51Z) - Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties [55.2480439325792]
This work introduces AMPTCR, a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format. For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R2 of 0.87. In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values.
arXiv Detail & Related papers (2025-07-22T04:35:50Z) - DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models [66.41802970528133]
Molecular structure elucidation from spectra is a foundational problem in chemistry. Traditional methods rely heavily on expert interpretation and lack scalability. We present DiffSpectra, a generative framework that directly infers both 2D and 3D molecular structures from multi-modal spectral data.
arXiv Detail & Related papers (2025-07-09T13:57:20Z) - Inverse Design of Copolymers Including Stoichiometry and Chain Architecture [0.0]
Machine learning-guided molecular design is a promising approach to accelerate polymer discovery.
We develop a novel variational autoencoder architecture encoding a graph and decoding a string.
Our model learns a continuous, well organized latent space that enables de-novo generation of copolymer structures.
arXiv Detail & Related papers (2024-09-30T15:37:39Z) - Pre-training of Molecular GNNs via Conditional Boltzmann Generator [0.0]
We propose a pre-training method for molecular GNNs using an existing dataset of molecular conformations.
We show that our model has a better prediction performance for molecular properties than existing pre-training methods.
arXiv Detail & Related papers (2023-12-20T15:30:15Z) - Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z) - Multiresolution Graph Transformers and Wavelet Positional Encoding for Learning Hierarchical Structures [6.875312133832078]
We propose Multiresolution Graph Transformers (MGT), the first graph transformer architecture that can learn to represent large molecules at multiple scales.
MGT can learn to produce representations for the atoms and group them into meaningful functional groups or repeating units.
Our proposed model achieves results on two macromolecule datasets consisting of polymers and peptides, and one drug-like molecule dataset.
arXiv Detail & Related papers (2023-02-17T01:32:44Z) - A graph representation of molecular ensembles for polymer property prediction [3.032184156362992]
In contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules.
We introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction.
arXiv Detail & Related papers (2022-05-17T20:31:43Z) - A deep learning driven pseudospectral PCE based FFT homogenization algorithm for complex microstructures [68.8204255655161]
It is shown that the proposed method is able to predict central moments of interest while being orders of magnitude faster to evaluate than traditional approaches.
arXiv Detail & Related papers (2021-10-26T07:02:14Z) - GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles [60.12186997181117]
Prediction of a molecule's 3D conformer ensemble from the molecular graph holds a key role in areas of cheminformatics and drug discovery.
Existing generative models have several drawbacks including lack of modeling important molecular geometry elements.
We propose GeoMol, an end-to-end, non-autoregressive and SE(3)-invariant machine learning approach to generate 3D conformers.
arXiv Detail & Related papers (2021-06-08T14:17:59Z) - Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z) - Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.