polyGen: A Learning Framework for Atomic-level Polymer Structure Generation
- URL: http://arxiv.org/abs/2504.17656v1
- Date: Thu, 24 Apr 2025 15:26:00 GMT
- Title: polyGen: A Learning Framework for Atomic-level Polymer Structure Generation
- Authors: Ayush Jain, Rampi Ramprasad,
- Abstract summary: We introduce polyGen, the first latent diffusion model designed specifically to generate realistic polymer structures from minimal inputs.<n> polyGen effectively generates diverse conformations of both linear chains and complex branched structures, though its performance decreases when handling repeat units with a high atom count.
- Score: 4.6516580885528835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthetic polymeric materials underpin fundamental technologies in the energy, electronics, consumer goods, and medical sectors, yet their development still suffers from prolonged design timelines. Although polymer informatics tools have supported speedup, polymer simulation protocols continue to face significant challenges: on-demand generation of realistic 3D atomic structures that respect the conformational diversity of polymer structures. Generative algorithms for 3D structures of inorganic crystals, bio-polymers, and small molecules exist, but have not addressed synthetic polymers. In this work, we introduce polyGen, the first latent diffusion model designed specifically to generate realistic polymer structures from minimal inputs such as the repeat unit chemistry alone, leveraging a molecular encoding that captures polymer connectivity throughout the architecture. Due to a scarce dataset of only 3855 DFT-optimized polymer structures, we augment our training with DFT-optimized molecular structures, showing improvement in joint learning between similar chemical structures. We also establish structure matching criteria to benchmark our approach on this novel problem. polyGen effectively generates diverse conformations of both linear chains and complex branched structures, though its performance decreases when handling repeat units with a high atom count. Given these initial results, polyGen represents a paradigm shift in atomic-level structure generation for polymer science-the first proof-of-concept for predicting realistic atomic-level polymer conformations while accounting for their intrinsic structural flexibility.
Related papers
- PolyConf: Unlocking Polymer Conformation Generation through Hierarchical Generative Models [28.480039088875635]
PolyConf is a pioneering tailored polymer conformation generation method.<n>We decompose the polymer conformation into a series of local conformations, generating these local conformations through an autoregressive model.<n>We then generate corresponding orientation transformations via a diffusion model to assemble these local conformations into the complete polymer conformation.
arXiv Detail & Related papers (2025-04-11T07:12:02Z) - Multimodal machine learning with large language embedding model for polymer property prediction [2.525624865489335]
We propose a simple yet effective multimodal architecture, PolyLLMem, for polymer properties prediction tasks.<n>PolyLLMem integrates text embeddings generated by Llama 3 with molecular structure embeddings derived from Uni-Mol.<n>Its performance is comparable to, and in some cases exceeds, that of graph-based models, as well as transformer-based models.
arXiv Detail & Related papers (2025-03-29T03:48:11Z) - GraphXForm: Graph transformer for computer-aided molecular design [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds.<n>We evaluate it on various drug design tasks, demonstrating superior objective scores compared to state-of-the-art molecular design approaches.
arXiv Detail & Related papers (2024-11-03T19:45:15Z) - DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z) - Inverse Design of Copolymers Including Stoichiometry and Chain Architecture [0.0]
Machine learning-guided molecular design is a promising approach to accelerate polymer discovery.
We develop a novel variational autoencoder architecture encoding a graph and decoding a string.
Our model learns a continuous, well organized latent space that enables de-novo generation of copolymer structures.
arXiv Detail & Related papers (2024-09-30T15:37:39Z) - MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction [24.975491375575224]
MMPolymer is a novel multitask pretraining framework incorporating polymer 1D sequential and 3D structural information.
MMPolymer achieves state-of-the-art performance in downstream property prediction tasks.
arXiv Detail & Related papers (2024-06-07T08:19:59Z) - UniIF: Unified Molecule Inverse Folding [67.60267592514381]
We propose a unified model UniIF for inverse folding of all molecules.
Our proposed method surpasses state-of-the-art methods on all tasks.
arXiv Detail & Related papers (2024-05-29T10:26:16Z) - Compositional Representation of Polymorphic Crystalline Materials [56.80318252233511]
We introduce PCRL, a novel approach that employs probabilistic modeling of composition to capture the diverse polymorphs from available structural information.<n>Extensive evaluations on sixteen datasets demonstrate the effectiveness of PCRL in learning compositional representation.
arXiv Detail & Related papers (2023-11-17T20:34:28Z) - An Equivariant Generative Framework for Molecular Graph-Structure
Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for underlineMolecular graph-structure underlineCo-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - A graph representation of molecular ensembles for polymer property
prediction [3.032184156362992]
In contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules.
We introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction.
arXiv Detail & Related papers (2022-05-17T20:31:43Z) - Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule.
We modify the classical positional encoder by an initial encoding of the molecule geometry, as well as a learned gated self-attention mechanism.
arXiv Detail & Related papers (2021-10-26T14:14:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.