CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning
- URL: http://arxiv.org/abs/2506.17345v1
- Date: Thu, 19 Jun 2025 15:45:24 GMT
- Title: CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning
- Authors: Changwen Xu, Shang Zhu, Venkatasubramanian Viswanathan,
- Abstract summary: We introduce CLOUD (Crystal Language mOdel for Unified and Differentiable materials modeling), a transformer-based framework trained on a novel Symmetry-Consistent SCOPE (SCOPE)<n>CLOUD is pre-trained on over six million crystal structures and achieves competitive performance in predicting a wide range of material properties.<n>As proof of concept of differentiable materials modeling, CLOUD is applied to predict the phonon internal energy and heat capacity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prediction of crystal properties is essential for understanding structure-property relationships and accelerating the discovery of functional materials. However, conventional approaches relying on experimental measurements or density functional theory (DFT) calculations are often resource-intensive, limiting their scalability. Machine learning (ML) models offer a promising alternative by learning complex structure-property relationships from data, enabling faster predictions. Yet, existing ML models often rely on labeled data, adopt representations that poorly capture essential structural characteristics, and lack integration with physical principles--factors that limit their generalizability and interpretability. Here, we introduce CLOUD (Crystal Language mOdel for Unified and Differentiable materials modeling), a transformer-based framework trained on a novel Symmetry-Consistent Ordered Parameter Encoding (SCOPE) that encodes crystal symmetry, Wyckoff positions, and composition in a compact, coordinate-free string representation. Pre-trained on over six million crystal structures, CLOUD is fine-tuned on multiple downstream tasks and achieves competitive performance in predicting a wide range of material properties, demonstrating strong scaling performance. Furthermore, as proof of concept of differentiable materials modeling, CLOUD is applied to predict the phonon internal energy and heat capacity, which integrates the Debye model to preserve thermodynamic consistency. The CLOUD-DEBYE framework enforces thermodynamic consistency and enables temperature-dependent property prediction without requiring additional data. These results demonstrate the potential of CLOUD as a scalable and physics-informed foundation model for crystalline materials, unifying symmetry-consistent representations with physically grounded learning for property prediction and materials discovery.
Related papers
- Explainable Prediction of the Mechanical Properties of Composites with CNNs [11.662972025976192]
We show that convolutional neural networks (CNNs) equipped with methods from explainable AI (XAI) can be successfully deployed to solve this problem.<n>Our approach uses customised CNNs trained on a dataset we generate using transverse tension tests to predict composites' mechanical properties.<n>We then use SHAP and Integrated Gradients, two post-hoc XAI methods, to explain the predictions, showing that the CNNs use the critical geometrical features that influence the composites' behaviour.
arXiv Detail & Related papers (2025-05-20T08:54:06Z) - Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge driven discovery.<n>Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs)<n>By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO)
arXiv Detail & Related papers (2025-03-18T02:14:49Z) - UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion [61.690978792873196]
Existing approaches rely on either autoregressive sequence models or diffusion models.<n>We propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models.<n>We validate the effectiveness of UniGenX on material and small molecule generation tasks.
arXiv Detail & Related papers (2025-03-09T16:43:07Z) - CrysAtom: Distributed Representation of Atoms for Crystal Property Prediction [0.0]
In material science literature, it is well-known that crystalline materials exhibit topological structures.
In this paper, we propose an unsupervised framework namely, CrysAtom, using untagged crystal data to generate dense vector representation of atoms.
arXiv Detail & Related papers (2024-09-07T06:58:55Z) - Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [57.01994216693825]
Fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable.
We show that our strongest model can generate materials predicted to be metastable at about twice the rate of CDVAE.
Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material.
arXiv Detail & Related papers (2024-02-06T20:35:28Z) - Scalable Diffusion for Materials Generation [99.71001883652211]
We develop a unified crystal representation that can represent any crystal structure (UniMat)
UniMat can generate high fidelity crystal structures from larger and more complex chemical systems.
We propose additional metrics for evaluating generative models of materials.
arXiv Detail & Related papers (2023-10-18T15:49:39Z) - Latent Conservative Objective Models for Data-Driven Crystal Structure
Prediction [62.36797874900395]
In computational chemistry, crystal structure prediction is an optimization problem.
One approach to tackle this problem involves building simulators based on density functional theory (DFT) followed by running search in simulation.
We show that our approach, dubbed LCOMs (latent conservative objective models), performs comparably to the best current approaches in terms of success rate of structure prediction.
arXiv Detail & Related papers (2023-10-16T04:35:44Z) - Discovering Interpretable Physical Models using Symbolic Regression and
Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z) - Thermodynamics-based Artificial Neural Networks (TANN) for multiscale
modeling of materials with inelastic microstructure [0.0]
Multiscale, homogenization approaches are often used for performing reliable, accurate predictions of the macroscopic mechanical behavior of inelastic materials.
Data-driven approaches based on deep learning have risen as a promising alternative to replace ad-hoc laws and speed-up numerical methods.
Here, we propose Thermodynamics-based Artificial Neural Networks (TANN) for the modeling of mechanical materials with inelastic and complex microstructure.
arXiv Detail & Related papers (2021-08-30T11:50:38Z) - ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on
Nonlinear ICA [11.919315372249802]
We consider the identifiability theory of probabilistic models.
We show that our model can be used for the estimation of the components in the framework of Independently Modulated Component Analysis.
arXiv Detail & Related papers (2020-02-26T14:43:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.