PolySet: Restoring the Statistical Ensemble Nature of Polymers for Machine Learning
- URL: http://arxiv.org/abs/2512.13186v1
- Date: Mon, 15 Dec 2025 10:50:48 GMT
- Title: PolySet: Restoring the Statistical Ensemble Nature of Polymers for Machine Learning
- Authors: Khalid Ferji
- Abstract summary: We introduce PolySet, a framework that represents a polymer as a finite, weighted ensemble of chains sampled from an assumed molar-mass distribution. By explicitly acknowledging the statistical nature of polymer matter, PolySet establishes a physically grounded foundation for future polymer machine learning.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine-learning (ML) models in polymer science typically treat a polymer as a single, perfectly defined molecular graph, even though real materials consist of stochastic ensembles of chains with distributed lengths. This mismatch between physical reality and digital representation limits the ability of current models to capture polymer behaviour. Here we introduce PolySet, a framework that represents a polymer as a finite, weighted ensemble of chains sampled from an assumed molar-mass distribution. This ensemble-based encoding is independent of chemical detail, compatible with any molecular representation and illustrated here in the homopolymer case using a minimal language model. We show that PolySet retains higher-order distributional moments (such as Mz, Mz+1), enabling ML models to learn tail-sensitive properties with greatly improved stability and accuracy. By explicitly acknowledging the statistical nature of polymer matter, PolySet establishes a physically grounded foundation for future polymer machine learning, naturally extensible to copolymers, block architectures, and other complex topologies.
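As a rough illustration of the ensemble idea in the abstract, the sketch below draws a finite, weighted set of homopolymer chains from an assumed molar-mass distribution and computes the averages Mn, Mw, Mz and Mz+1 from it. It is not the paper's implementation: the log-normal distribution, the function names `sample_ensemble` and `molar_mass_averages`, the repeat-unit mass of 104.15 g/mol (styrene) and all parameter values are assumptions chosen for the example.

```python
import numpy as np


def molar_mass_averages(masses, weights):
    """Return (Mn, Mw, Mz, Mz+1) for chains with molar masses `masses`
    and number (mole) fractions `weights`."""
    m = np.asarray(masses, dtype=float)
    w = np.asarray(weights, dtype=float)
    # k-th raw moment of the number distribution: sum_i n_i * M_i**k
    mu = [np.sum(w * m**k) for k in range(5)]
    # Mn = mu1/mu0, Mw = mu2/mu1, Mz = mu3/mu2, Mz+1 = mu4/mu3
    return tuple(mu[k + 1] / mu[k] for k in range(4))


def sample_ensemble(mn, dispersity, n_chains, monomer_mass, seed=0):
    """Sample a finite weighted ensemble of chains.

    Assumption: molar masses follow a log-normal number distribution,
    for which Mw/Mn = exp(sigma**2) and Mn = exp(mu + sigma**2 / 2).
    """
    rng = np.random.default_rng(seed)
    sigma2 = np.log(dispersity)
    mu = np.log(mn) - sigma2 / 2.0
    masses = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n_chains)
    # Round each chain to a whole number of repeat units (at least one).
    degrees = np.maximum(1, np.rint(masses / monomer_mass)).astype(int)
    # Collapse identical chain lengths into one entry with a number fraction.
    lengths, counts = np.unique(degrees, return_counts=True)
    return lengths * monomer_mass, counts / counts.sum()


# Hypothetical example: a polystyrene-like sample, Mn = 50 kg/mol, dispersity 1.5.
masses, weights = sample_ensemble(
    mn=50_000.0, dispersity=1.5, n_chains=20_000, monomer_mass=104.15
)
mn, mw, mz, mz1 = molar_mass_averages(masses, weights)
print(f"Mn={mn:.0f} Mw={mw:.0f} Mz={mz:.0f} Mz+1={mz1:.0f} "
      f"({len(masses)} distinct chain lengths)")
```

Because Mz and Mz+1 depend on the third and fourth moments of the distribution, they are dominated by the longest chains. A single "representative" chain at Mn carries none of that information, whereas the weighted ensemble retains it.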
Related papers
- Molecular Representations in Implicit Functional Space via Hyper-Networks [53.70982267248536]
We argue that molecular learning can instead be formulated as learning in function space.
We instantiate this formulation with MolField, a hyper-network-based framework that learns distributions over molecular fields.
Our results show that treating molecules as continuous functions fundamentally changes how molecular representations generalize across tasks.
arXiv Detail & Related papers (2026-01-29T21:13:37Z)
- Unifying Polymer Modeling and Design via a Conformation-Centric Generative Foundation Model [29.414571977709098]
PolyConFM is a polymer foundation model that unifies polymer modeling and design through conformation-centric pretraining.
We construct the first high-quality polymer conformation dataset via molecular dynamics simulations to mitigate data sparsity.
Experiments demonstrate that PolyConFM consistently outperforms representative task-specific methods on diverse downstream tasks.
arXiv Detail & Related papers (2025-10-15T17:11:44Z)
- MIPS: a Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction [18.637780346409308]
Existing modeling approaches, which typically represent polymers by their constituent monomers, struggle to capture the properties of the polymer as a whole.
We propose a Multimodal Infinite Polymer Sequence (MIPS) pre-training framework, which represents polymers as infinite sequences of monomers.
From the topological perspective, we generalize the message passing mechanism (MPM) and the graph attention mechanism (GAM) to infinite polymer sequences.
arXiv Detail & Related papers (2025-07-27T15:34:51Z)
- POINT$^{2}$: A Polymer Informatics Training and Testing Database [15.45788515943579]
POINT$^{2}$ (POlymer INformatics Training and Testing) is a benchmark database and protocol designed to address critical challenges in polymer informatics.
We develop an ensemble of ML models, including Quantile Random Forests, Multilayer Perceptrons with dropout, Graph Neural Networks, and pretrained large language models.
These models are coupled with diverse polymer representations such as Morgan, MACCS, RDKit, Topological, Atom Pair fingerprints, and graph-based descriptors.
arXiv Detail & Related papers (2025-03-30T15:46:01Z)
- Multimodal machine learning with large language embedding model for polymer property prediction [2.525624865489335]
We propose a simple yet effective multimodal architecture, PolyLLMem, for polymer property prediction tasks.
PolyLLMem integrates text embeddings generated by Llama 3 with molecular structure embeddings derived from Uni-Mol.
Its performance is comparable to, and in some cases exceeds, that of graph-based and transformer-based models.
arXiv Detail & Related papers (2025-03-29T03:48:11Z)
- Compositional Representation of Polymorphic Crystalline Materials [56.80318252233511]
We introduce PCRL, a novel approach that employs probabilistic modeling of composition to capture the diverse polymorphs from available structural information.
Extensive evaluations on sixteen datasets demonstrate the effectiveness of PCRL in learning compositional representation.
arXiv Detail & Related papers (2023-11-17T20:34:28Z)
- Automatically Predict Material Properties with Microscopic Image Example Polymer Compatibility [94.40113383292139]
Machine-learning-based image recognition can compensate for the shortcomings of manual judgment.
We achieve automatic miscibility recognition using a convolutional neural network and transfer learning.
The proposed method can be widely applied to the quantitative characterization of the microstructure and properties of various materials.
arXiv Detail & Related papers (2023-03-22T07:51:32Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant parameters of the system.
arXiv Detail & Related papers (2022-07-12T17:57:17Z)
- A graph representation of molecular ensembles for polymer property prediction [3.032184156362992]
In contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules.
We introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction.
arXiv Detail & Related papers (2022-05-17T20:31:43Z)
- Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule.
We modify the classical positional encoder by an initial encoding of the molecule geometry, as well as a learned gated self-attention mechanism.
arXiv Detail & Related papers (2021-10-26T14:14:40Z)