OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
- URL: http://arxiv.org/abs/2512.06987v1
- Date: Sun, 07 Dec 2025 20:46:30 GMT
- Title: OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
- Authors: Emily Jin, Andrei Cristian Nica, Mikhail Galkin, Jarrid Rector-Brooks, Kin Long Kelvin Lee, Santiago Miret, Frances H. Arnold, Michael Bronstein, Avishek Joey Bose, Alexander Tong, Cheng-Hao Liu
- Abstract summary: We introduce OXtal, a large-scale 100M parameter all-atom diffusion model that learns the conditional joint distribution over intramolecular conformations and periodic packing. By leveraging a large dataset of 600K experimentally validated crystal structures, OXtal achieves orders-of-magnitude improvements over prior ab initio machine learning CSP methods. OXtal attains over 80% packing similarity rate, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
- Score: 63.318434943975255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurately predicting experimentally-realizable 3D molecular crystal structures from their 2D chemical graphs is a long-standing open challenge in computational chemistry called crystal structure prediction (CSP). Efficiently solving this problem has implications ranging from pharmaceuticals to organic semiconductors, as crystal packing directly governs the physical and chemical properties of organic solids. In this paper, we introduce OXtal, a large-scale 100M parameter all-atom diffusion model that directly learns the conditional joint distribution over intramolecular conformations and periodic packing. To efficiently scale OXtal, we abandon explicit equivariant architectures imposing inductive bias arising from crystal symmetries in favor of data augmentation strategies. We further propose a novel crystallization-inspired lattice-free training scheme, Stoichiometric Stochastic Shell Sampling ($S^4$), that efficiently captures long-range interactions while sidestepping explicit lattice parametrization -- thus enabling more scalable architectural choices at all-atom resolution. By leveraging a large dataset of 600K experimentally validated crystal structures (including rigid and flexible molecules, co-crystals, and solvates), OXtal achieves orders-of-magnitude improvements over prior ab initio machine learning CSP methods, while remaining orders of magnitude cheaper than traditional quantum-chemical approaches. Specifically, OXtal recovers experimental structures with conformer $\text{RMSD}_1<0.5$ Å and attains over 80\% packing similarity rate, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
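The abstract states that OXtal replaces explicit equivariant architectures with data augmentation. As a rough illustration of that general idea only, the sketch below applies a random rigid transformation (rotation plus translation) to a set of atomic coordinates, so that a non-equivariant model would see the same structure in many orientations. The function names, the `max_shift` parameter, and the choice of transformations are assumptions made for this example; the abstract does not specify which augmentations OXtal actually uses.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random proper rotation matrix (hypothetical helper, not from the paper)."""
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    # Remove the sign ambiguity of the QR factorization, column by column.
    q = q * np.sign(np.diag(r))
    # Flip one column if needed so the determinant is +1 (rotation, not reflection).
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def augment(coords, rng, max_shift=1.0):
    """Apply a random rotation and translation to an (N, 3) coordinate array.

    max_shift is an assumed illustrative parameter (in the same units as coords).
    """
    rot = random_rotation(rng)
    shift = rng.uniform(-max_shift, max_shift, size=3)
    return coords @ rot.T + shift


rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))   # toy "molecule" with 5 atoms
augmented = augment(coords, rng)
```

A rigid transformation leaves all interatomic distances unchanged, so the augmented copy describes the same physical structure; only its pose differs.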
Related papers
- OrgFlow: Generative Modeling of Organic Crystal Structures from Molecular Graphs [4.5375644408112565]
We introduce a flow-matching model for predicting organic crystal structures directly from molecular graphs. A bond-aware loss guides the model toward realistic local chemistry by enforcing distributions of bond lengths and connectivity. Experiments show that our method achieves a Match Rate more than 10 times higher than existing baselines.
arXiv Detail & Related papers (2026-02-22T04:01:06Z)
- Coherence Dispersion and Temperature Scales in a Quantum-Biology Toy Model [51.56484100374058]
We investigate how quantum coherence can scatter among the several off-diagonal elements of an arbitrary quantum state. By focusing on out-of-equilibrium systems, we use the developed framework to address a simplified model of cellular energetics.
arXiv Detail & Related papers (2025-12-13T14:21:34Z)
- Foundation Models for Discovery and Exploration in Chemical Space [57.97784111110166]
MIST is a family of molecular foundation models trained on large unlabeled datasets. We demonstrate the ability of these models to solve real-world problems across chemical space.
arXiv Detail & Related papers (2025-10-20T17:56:01Z)
- Machine Learning Workflow for Analysis of High-Dimensional Order Parameter Space: A Case Study of Polymer Crystallization from Molecular Dynamics Simulations [0.0]
Identification of crystallization pathways in polymers is currently carried out using molecular simulation-based data. In this study, an integrated machine learning workflow is presented to accurately quantify crystallinity.
arXiv Detail & Related papers (2025-07-23T23:02:10Z)
- End-to-End Crystal Structure Prediction from Powder X-Ray Diffraction [37.563382606039006]
This study introduces XtalNet, the first equivariant deep generative model for end-to-end crystal structure prediction from PXRD. XtalNet leverages PXRD as an additional condition, eliminating ambiguity and enabling the generation of complex organic structures with up to 400 atoms in the unit cell. XtalNet achieves top-10 Match Rates of 90.2% and 79% on hMOF-100 and hMOF-400, respectively, in the conditional crystal structure prediction task.
arXiv Detail & Related papers (2024-01-08T12:50:17Z)
- Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction [62.36797874900395]
In computational chemistry, crystal structure prediction is an optimization problem.
One approach to tackle this problem involves building simulators based on density functional theory (DFT) followed by running search in simulation.
We show that our approach, dubbed LCOMs (latent conservative objective models), performs comparably to the best current approaches in terms of success rate of structure prediction.
arXiv Detail & Related papers (2023-10-16T04:35:44Z)
- Data-Driven Score-Based Models for Generating Stable Structures with Adaptive Crystal Cells [1.515687944002438]
This work aims at the generation of new crystal structures with desired properties, such as chemical stability and specified chemical composition.
The novelty of the presented approach resides in the fact that the lattice of the crystal cell is not fixed.
A multigraph crystal representation is introduced that respects symmetry constraints, yielding computational advantages.
arXiv Detail & Related papers (2023-10-16T02:53:24Z)
- Crystal-GFN: sampling crystals with desirable properties and constraints [103.79058968784163]
We introduce Crystal-GFN, a generative model of crystal structures that sequentially samples structural properties of crystalline materials.
In this paper, we use as objective the formation energy per atom of a crystal structure predicted by a new proxy machine learning model trained on MatBench.
The results demonstrate that Crystal-GFN is able to sample highly diverse crystals with low (median -3.1 eV/atom) predicted formation energy.
arXiv Detail & Related papers (2023-10-07T21:36:55Z)
- Crystal Structure Prediction by Joint Equivariant Diffusion [27.52168842448489]
Crystal Structure Prediction (CSP) is crucial in various scientific disciplines.
This paper proposes DiffCSP, a novel diffusion model to learn the structure distribution from stable crystals.
arXiv Detail & Related papers (2023-07-30T15:46:33Z)
- BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 atoms.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.