Machine Learning Workflow for Analysis of High-Dimensional Order Parameter Space: A Case Study of Polymer Crystallization from Molecular Dynamics Simulations
- URL: http://arxiv.org/abs/2507.17980v1
- Date: Wed, 23 Jul 2025 23:02:10 GMT
- Title: Machine Learning Workflow for Analysis of High-Dimensional Order Parameter Space: A Case Study of Polymer Crystallization from Molecular Dynamics Simulations
- Authors: Elyar Tourani, Brian J. Edwards, Bamin Khomami
- Abstract summary: Identification of crystallization pathways in polymers is currently carried out using molecular simulation-based data. In this study, an integrated machine learning workflow is presented to accurately quantify crystallinity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, identification of crystallization pathways in polymers is being carried out using molecular simulation-based data on a preset cut-off point on a single order parameter (OP) to define nucleated or crystallized regions. Aside from sensitivity to cut-off, each of these OPs introduces its own systematic biases. In this study, an integrated machine learning workflow is presented to accurately quantify crystallinity in polymeric systems using atomistic molecular dynamics data. Each atom is represented by a high-dimensional feature vector that combines geometric, thermodynamic-like, and symmetry-based descriptors. Low-dimensional embeddings are employed to expose latent structural fingerprints within atomic environments. Subsequently, unsupervised clustering on the embeddings identifies crystalline and amorphous atoms with high fidelity. After generating high-quality labels with multidimensional data, we use supervised learning techniques to identify a minimal set of order parameters that can fully capture this label. Various tests were conducted to reduce the feature set, demonstrating that using only three order parameters is sufficient to recreate the crystallization labels. Based on these observed OPs, the crystallinity index (C-index) is defined as the logistic regression model's probability of crystallinity, remaining bimodal throughout the process and achieving over 0.98 classification performance (AUC). Notably, a model trained on one or a few snapshots enables efficient on-the-fly computation of crystallinity. Lastly, we demonstrate how the optimal C-index fit evolves during various stages of crystallization, supporting the hypothesis that entropy dominates early nucleation, while symmetry gains relevance later. This workflow provides a data-driven strategy for OP selection and a metric to monitor structural transformations in large-scale polymer simulations.
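The abstract defines the C-index as the probability of crystallinity returned by a logistic regression model fitted on three order parameters. The following is a minimal sketch of that idea only, not the authors' implementation: the three features are synthetic Gaussian stand-ins rather than real order parameters, the labels are generated directly instead of coming from clustering, the fit uses plain batch gradient descent, and every function name (`make_synthetic_atoms`, `fit_logistic`, `c_index`) is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_atoms(n_per_class, seed=0):
    # Assumption: three made-up order parameters per atom, drawn from
    # Gaussians centred at -1 (amorphous, label 0) or +1 (crystalline, label 1).
    rng = random.Random(seed)
    features, labels = [], []
    for label, centre in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            features.append([rng.gauss(centre, 0.5) for _ in range(3)])
            labels.append(label)
    return features, labels


def fit_logistic(features, labels, lr=0.5, epochs=200):
    # Batch gradient descent on the mean logistic (cross-entropy) loss.
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def c_index(x, w, b):
    # C-index of one atom: the model's probability that it is crystalline.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


X, y = make_synthetic_atoms(200)
w, b = fit_logistic(X, y)
scores = [c_index(x, w, b) for x in X]
accuracy = sum((s >= 0.5) == bool(t) for s, t in zip(scores, y)) / len(y)
print(f"weights={w}, bias={b:.3f}, training accuracy={accuracy:.3f}")
```

Once `w` and `b` are fitted on one labelled snapshot, `c_index` can be evaluated on atoms from later snapshots without refitting, which is the on-the-fly use the abstract mentions.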
Related papers
- Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties [55.2480439325792]
This work introduces AMPTCR, a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format. For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R2 of 0.87. In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values.
arXiv Detail & Related papers (2025-07-22T04:35:50Z) - Towards Space Group Determination from EBSD Patterns: The Role of Deep Learning and High-throughput Dynamical Simulations [0.7154115167845776]
Deep learning methods may be able to classify the space group symmetries using the patterns as input. Neural networks were trained to predict the space group type of background corrected EBSD patterns. We introduce a relabeling scheme, which enables our models to achieve accuracy scores higher than 90% on simulated and experimental data.
arXiv Detail & Related papers (2025-04-30T05:36:31Z) - Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Efficient Probabilistic Modeling of Crystallization at Mesoscopic Scale [4.271235935891555]
Crystallization processes at the mesoscopic scale are of particular interest in materials science and metallurgy.
We introduce the Crystal Growth Neural Emulator (CGNE), a probabilistic model for efficient crystal growth at the mesoscopic scale.
CGNE delivers a factor of 11 improvement in inference time and performance gains compared with recent state-of-the-art probabilistic models for dynamical systems.
arXiv Detail & Related papers (2024-05-26T15:37:19Z) - Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z) - Data-free Weight Compress and Denoise for Large Language Models [96.68582094536032]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices. We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z) - Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction [62.36797874900395]
In computational chemistry, crystal structure prediction is an optimization problem.
One approach to tackle this problem involves building simulators based on density functional theory (DFT) followed by running search in simulation.
We show that our approach, dubbed LCOMs (latent conservative objective models), performs comparably to the best current approaches in terms of success rate of structure prediction.
arXiv Detail & Related papers (2023-10-16T04:35:44Z) - Data-Driven Score-Based Models for Generating Stable Structures with Adaptive Crystal Cells [1.515687944002438]
This work aims at the generation of new crystal structures with desired properties, such as chemical stability and specified chemical composition.
The novelty of the presented approach resides in the fact that the lattice of the crystal cell is not fixed.
A multigraph crystal representation is introduced that respects symmetry constraints, yielding computational advantages.
arXiv Detail & Related papers (2023-10-16T02:53:24Z) - Crystal Structure Prediction by Joint Equivariant Diffusion [27.52168842448489]
Crystal Structure Prediction (CSP) is crucial in various scientific disciplines.
This paper proposes DiffCSP, a novel diffusion model to learn the structure distribution from stable crystals.
arXiv Detail & Related papers (2023-07-30T15:46:33Z) - Equivariant Parameter Sharing for Porous Crystalline Materials [4.271235935891555]
Existing methods for crystal property prediction either have constraints that are too restrictive or only incorporate symmetries between unit cells.
We develop a model which incorporates the symmetries of the unit cell of a crystal in its architecture and explicitly models the porous structure.
Our results confirm that our method performs better than existing methods for crystal property prediction and that the inclusion of symmetry results in a more efficient model.
arXiv Detail & Related papers (2023-04-04T08:33:13Z) - A Score-based Geometric Model for Molecular Dynamics Simulations [33.158796937777886]
We propose a novel model called ScoreMD to estimate the gradient of the log density of molecular conformations.
With multiple architectural improvements, we outperform state-of-the-art baselines on MD17 and isomers of C7O2H10.
This research provides new insights into the acceleration of new material and drug discovery.
arXiv Detail & Related papers (2022-04-19T05:13:46Z) - Disentangling multiple scattering with deep learning: application to strain mapping from electron diffraction patterns [48.53244254413104]
We implement a deep neural network called FCU-Net to invert highly nonlinear electron diffraction patterns into quantitative structure factor images.
We trained the FCU-Net using over 200,000 unique dynamical diffraction patterns which include many different combinations of crystal structures.
Our simulated diffraction pattern library, implementation of FCU-Net, and trained model weights are freely available in open source repositories.
arXiv Detail & Related papers (2022-02-01T03:53:39Z)