Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation
- URL: http://arxiv.org/abs/2503.00152v1
- Date: Fri, 28 Feb 2025 20:02:53 GMT
- Title: Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation
- Authors: Keqiang Yan, Xiner Li, Hongyi Ling, Kenna Ashen, Carl Edwards, Raymundo Arróyave, Marinka Zitnik, Heng Ji, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji
- Abstract summary: A key step is to convert 3D crystal structures into 1D sequences to be processed by language models (LMs). Mat2Seq converts 3D crystal structures into 1D sequences and ensures that different mathematical descriptions of the same crystal are represented in a single unique sequence. Experimental results show that, with language models, Mat2Seq achieves promising performance in crystal structure generation as compared with prior methods.
- Score: 82.91073155506277
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We consider the problem of crystal materials generation using language models (LMs). A key step is to convert 3D crystal structures into 1D sequences to be processed by LMs. Prior studies used the crystallographic information framework (CIF) file stream, which fails to ensure SE(3) and periodic invariance and may not lead to unique sequence representations for a given crystal structure. Here, we propose a novel method, known as Mat2Seq, to tackle this challenge. Mat2Seq converts 3D crystal structures into 1D sequences and ensures that different mathematical descriptions of the same crystal are represented in a single unique sequence, thereby provably achieving SE(3) and periodic invariance. Experimental results show that, with language models, Mat2Seq achieves promising performance in crystal structure generation as compared with prior methods.
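To make the invariance requirement in the abstract concrete, here is a minimal sketch in plain Python. It is not the Mat2Seq algorithm, which this page does not describe; it only illustrates one standard fact: the lattice lengths and angles (a, b, c, alpha, beta, gamma) do not change when the whole crystal is rotated, whereas the raw Cartesian lattice vectors do. The function names, the example cell, and the three-decimal token format are all assumptions made for illustration.

```python
import math

def lattice_parameters(cell):
    """Return (a, b, c, alpha, beta, gamma) for a cell given as three row vectors.

    Lengths are in the units of the input; angles are in degrees.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def angle(u, v):
        cos = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
        # Clamp to [-1, 1] to guard against floating-point overshoot.
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

    a_vec, b_vec, c_vec = cell
    return (norm(a_vec), norm(b_vec), norm(c_vec),
            angle(b_vec, c_vec), angle(a_vec, c_vec), angle(a_vec, b_vec))

def rotate_z(cell, theta):
    """Rotate every lattice vector by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in cell]

def to_tokens(params, digits=3):
    """Hypothetical text encoding: fixed-precision numbers joined by spaces."""
    return " ".join(f"{p:.{digits}f}" for p in params)

# Hypothetical orthorhombic cell, chosen only for illustration.
cell = [(3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
rotated = rotate_z(cell, 0.7)

print(to_tokens(lattice_parameters(cell)))
print(to_tokens(lattice_parameters(rotated)))  # same tokens as the line above
print(rotated[0])                              # raw vector differs from cell[0]
```

This sketch covers rotations of the lattice only. Periodic invariance, meaning that different unit-cell choices and atom orderings for the same crystal map to one sequence, is the harder part that the abstract attributes to Mat2Seq and is not attempted here.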
Related papers
- CrystalGRW: Generative Modeling of Crystal Structures with Targeted Properties via Geodesic Random Walks [1.2141052067494946]
We introduce CrystalGRW, a diffusion-based generative model that can predict stable phases validated by density functional theory. CrystalGRW demonstrates the ability to generate realistic crystal structures that are close to their ground states with accuracy comparable to existing models. These features help accelerate materials discovery and inverse design by offering stable, symmetry-consistent crystal candidates for experimental validation.
arXiv Detail & Related papers (2025-01-15T18:26:35Z)
- Generative Hierarchical Materials Search [91.93125016916463]
We propose Generative Hierarchical Materials Search (GenMS) for controllable generation of crystal structures.
GenMS consists of (1) a language model that takes high-level natural language as input and generates intermediate textual information about a crystal.
GenMS additionally uses a graph neural network to predict properties (e.g., formation energy) from the generated crystal structures.
arXiv Detail & Related papers (2024-09-10T17:51:28Z)
- Complete and Efficient Graph Transformers for Crystal Material Property Prediction [53.32754046881189]
Crystal structures are characterized by atomic bases within a primitive unit cell that repeats along a regular lattice throughout 3D space.
We introduce a novel approach that utilizes the periodic patterns of unit cells to establish the lattice-based representation for each atom.
We propose ComFormer, a SE(3) transformer designed specifically for crystalline materials.
arXiv Detail & Related papers (2024-03-18T15:06:37Z)
- Scalable Diffusion for Materials Generation [99.71001883652211]
We develop a unified crystal representation that can represent any crystal structure (UniMat).
UniMat can generate high fidelity crystal structures from larger and more complex chemical systems.
We propose additional metrics for evaluating generative models of materials.
arXiv Detail & Related papers (2023-10-18T15:49:39Z)
- Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction [62.36797874900395]
In computational chemistry, crystal structure prediction is an optimization problem.
One approach to tackle this problem involves building simulators based on density functional theory (DFT) followed by running search in simulation.
We show that our approach, dubbed LCOMs (latent conservative objective models), performs comparably to the best current approaches in terms of success rate of structure prediction.
arXiv Detail & Related papers (2023-10-16T04:35:44Z)
- Data-Driven Score-Based Models for Generating Stable Structures with Adaptive Crystal Cells [1.515687944002438]
This work aims at the generation of new crystal structures with desired properties, such as chemical stability and specified chemical composition.
The novelty of the presented approach resides in the fact that the lattice of the crystal cell is not fixed.
A multigraph crystal representation is introduced that respects symmetry constraints, yielding computational advantages.
arXiv Detail & Related papers (2023-10-16T02:53:24Z)
- Crystal-GFN: sampling crystals with desirable properties and constraints [103.79058968784163]
We introduce Crystal-GFN, a generative model of crystal structures that sequentially samples structural properties of crystalline materials.
In this paper, we use as objective the formation energy per atom of a crystal structure predicted by a new proxy machine learning model trained on MatBench.
The results demonstrate that Crystal-GFN is able to sample highly diverse crystals with low (median -3.1 eV/atom) predicted formation energy.
arXiv Detail & Related papers (2023-10-07T21:36:55Z)
- Crystal Structure Prediction by Joint Equivariant Diffusion [27.52168842448489]
Crystal Structure Prediction (CSP) is crucial in various scientific disciplines.
This paper proposes DiffCSP, a novel diffusion model to learn the structure distribution from stable crystals.
arXiv Detail & Related papers (2023-07-30T15:46:33Z)
- Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design [70.27706384570723]
We propose Fold2Seq, a novel framework for designing protein sequences conditioned on a specific target fold.
We show improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design.
The unique advantages of fold-based Fold2Seq, in comparison to a structure-based deep model and RosettaDesign, become more evident on three additional real-world challenges.
arXiv Detail & Related papers (2021-06-24T14:34:24Z)
- Quantum metamorphism [0.0]
We propose a model for quantum metamorphism between two DTCs of different periodicity, a 2T and 4T-DTC.
Conditions for metamorphism come from the modulation of perturbative terms in the 4T-DTC Hamiltonian that gradually melt its structure and transform it into a 2T-DTC.
We also propose a protocol to experimentally observe quantum metamorphism using current quantum technology.
arXiv Detail & Related papers (2020-11-04T04:01:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.