Lang2Str: Two-Stage Crystal Structure Generation with LLMs and Continuous Flow Models
- URL: http://arxiv.org/abs/2603.03946v1
- Date: Wed, 04 Mar 2026 11:14:01 GMT
- Title: Lang2Str: Two-Stage Crystal Structure Generation with LLMs and Continuous Flow Models
- Authors: Cong Liu, Chengyue Gong, Zhenyu Liu, Jiale Zhao, Yuxuan Zhang,
- Abstract summary: We propose a two-stage generative framework, Lang2Str, for flexible and precise material generation. Our method frames the generative process as a conditional generative task. We show that our method achieves competitive performance on ab initio material generation and crystal structure prediction tasks.
- Score: 22.830348732529334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models hold great promise for accelerating material discovery but are often limited by their inflexible single-stage generative process in designing valid and diverse materials. To address this, we propose a two-stage generative framework, Lang2Str, that combines the strengths of large language models (LLMs) and flow-based models for flexible and precise material generation. Our method frames the generative process as a conditional generative task, where an LLM provides high-level conditions by generating descriptions of material unit cells' geometric layouts and properties. These descriptions, informed by the LLM's extensive background knowledge, ensure reasonable structure designs. A conditioned flow model then decodes these textual conditions into precise continuous coordinates and unit cell parameters. This staged approach combines the structured reasoning of LLMs and the distribution modeling capabilities of flow models. Experimental results show that our method achieves competitive performance on \textit{ab initio} material generation and crystal structure prediction tasks, with generated structures exhibiting closer alignment to ground truth in both geometry and energy levels, surpassing state-of-the-art models. The flexibility and modularity of our framework further enable fine-grained control over the generation process, potentially leading to more efficient and customizable material design.
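The two-stage pipeline the abstract describes can be sketched as below. This is a minimal illustrative stand-in, not the paper's implementation: the LLM call, the text embedding, and the conditional velocity field are all hypothetical placeholders, and the real system would use a trained language model and a learned flow network.

```python
# Hypothetical sketch of the Lang2Str-style two-stage pipeline:
# Stage 1: an LLM produces a textual description of the unit cell.
# Stage 2: a conditional flow model integrates noise toward coordinates.
import hashlib
import random

def llm_describe(composition):
    """Stage 1 stand-in: a real LLM would emit a description of the
    unit cell's geometric layout and properties for the composition."""
    return f"{composition}: cubic-like cell, 4 atom sites, ~5.6 A lattice"

def embed(text, dim=8):
    """Toy deterministic text embedding (real systems use a learned encoder)."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def flow_decode(cond, steps=10, seed=0):
    """Stage 2 stand-in: integrate a conditional 'velocity field' from a
    noise sample; here the field simply pulls the state toward the
    condition embedding, standing in for a learned flow."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    dt = 1.0 / steps
    for _ in range(steps):
        x = [xi + dt * (ci - xi) for xi, ci in zip(x, cond)]
    return x

description = llm_describe("NaCl")
coords = flow_decode(embed(description))
print(len(coords))  # one decoded value per embedding dimension
```

The point of the sketch is the interface between the stages: the only thing passed from the LLM to the flow model is the textual condition, which is why the two components can be trained and swapped independently.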
Related papers
- LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation [25.009649087291432]
CrysLLMGen is a hybrid framework that integrates an LLM with a diffusion model to leverage their complementary strengths for crystal material generation. Our framework outperforms state-of-the-art generative models across several benchmark tasks and datasets.
arXiv Detail & Related papers (2025-10-27T06:08:19Z) - Speed Always Wins: A Survey on Efficient Architectures for Large Language Models [51.817121227562964]
Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the capability boundary of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong baseline with excellent scaling properties. However, the traditional Transformer architecture requires substantial computation and poses significant obstacles for large-scale training and practical deployment.
arXiv Detail & Related papers (2025-08-13T14:13:46Z) - MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models [5.417632175667162]
Designing Metal-Organic Frameworks (MOFs) with application-specific properties remains a central challenge in materials chemistry. We present a reinforcement learning-enhanced, transformer-based framework for the de novo design of MOFs. By integrating property feedback into sequence generation, our method drives the model toward synthesizable, topologically valid MOFs.
arXiv Detail & Related papers (2025-05-30T20:09:11Z) - MatLLMSearch: Crystal Structure Discovery with Evolution-Guided Large Language Models [27.083255538087215]
We show that pre-trained Large Language Models (LLMs) can inherently generate novel and stable crystal structures without additional fine-tuning. Our framework employs LLMs as intelligent proposal agents within an evolutionary pipeline that guides them to perform implicit crossover and mutation operations. We demonstrate that MatLLMSearch achieves a 78.38% metastable rate validated by machine learning interatomic potentials and 31.7% DFT-verified stability.
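The evolution-guided loop the MatLLMSearch abstract describes can be sketched as follows. Everything here is a toy stand-in: `llm_propose` mimics the LLM's implicit crossover-and-mutation step by averaging and perturbing parents, and `fitness` is a placeholder for the machine-learning interatomic potential or DFT validation the paper actually uses.

```python
# Hedged sketch of an evolutionary pipeline with an LLM as proposal agent.
# llm_propose and fitness are hypothetical stand-ins, not the paper's method.
import random

def llm_propose(parents, rng):
    """Stand-in for prompting an LLM with parent structures; mimics
    'implicit crossover and mutation' by mixing two parents and perturbing."""
    a, b = rng.sample(parents, 2)
    return [(x + y) / 2.0 + rng.gauss(0.0, 0.05) for x, y in zip(a, b)]

def fitness(candidate):
    """Toy stability proxy: negative squared distance from a target vector.
    A real pipeline would score candidates with an ML potential or DFT."""
    target = [0.5, 0.5, 0.5]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

rng = random.Random(0)
population = [[rng.random() for _ in range(3)] for _ in range(8)]
for _ in range(20):  # generations: propose offspring, keep the fittest
    offspring = [llm_propose(population, rng) for _ in range(8)]
    population = sorted(population + offspring, key=fitness, reverse=True)[:8]

best = population[0]
print(round(-fitness(best), 4))  # squared distance to target, small after selection
```

The structure to note is that the LLM only proposes; selection pressure comes entirely from the external fitness evaluator, which is what lets the loop work without fine-tuning the model.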
arXiv Detail & Related papers (2025-02-28T10:41:16Z) - Open Materials Generation with Stochastic Interpolants [14.939468363546384]
Open Materials Generation (OMatG) is a unifying framework for the generative design and discovery of crystalline materials. OMatG employs stochastic interpolants to bridge an arbitrary base distribution to the target distribution of inorganic crystals. We benchmark OMatG's performance on two tasks: crystal structure prediction and de novo generation.
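A stochastic interpolant, the object the OMatG abstract builds on, linearly bridges a base sample x0 to a data sample x1, optionally with a latent noise term that vanishes at both endpoints. The sketch below uses one common schedule, gamma(t) = sqrt(t(1 - t)); this is a generic illustration, not necessarily the schedule OMatG uses.

```python
# Minimal stochastic interpolant: x_t = (1 - t) * x0 + t * x1 + gamma(t) * z,
# with gamma(0) = gamma(1) = 0 so the bridge hits both endpoints exactly.
import math

def interpolant(x0, x1, t, z=None, gamma=lambda t: math.sqrt(t * (1.0 - t))):
    """Elementwise interpolant between base sample x0 and data sample x1."""
    if z is None:
        z = [0.0] * len(x0)
    g = gamma(t)
    return [(1.0 - t) * a + t * b + g * n for a, b, n in zip(x0, x1, z)]

x0 = [0.0, 0.0]  # base (e.g. noise) sample
x1 = [1.0, 2.0]  # target (e.g. crystal coordinate) sample
print(interpolant(x0, x1, 0.0))  # [0.0, 0.0] -- endpoint: base
print(interpolant(x0, x1, 1.0))  # [1.0, 2.0] -- endpoint: target
```

Because gamma vanishes at t = 0 and t = 1, the interpolant matches the base and target distributions exactly at the endpoints, which is what makes it usable as a training bridge for a generative flow.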
arXiv Detail & Related papers (2025-02-04T18:56:47Z) - Efficient Symmetry-Aware Materials Generation via Hierarchical Generative Flow Networks [52.13486402193811]
Discovering new solid-state materials requires rapidly exploring the vast space of crystal structures and locating stable regions.
Existing methods struggle to explore large material spaces and generate diverse samples with desired properties and requirements.
We propose a novel generative model that employs a hierarchical exploration strategy to efficiently exploit the symmetry of the materials space and generate crystal structures with desired properties.
arXiv Detail & Related papers (2024-11-06T23:53:34Z) - Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows inference with a subset of modules and dynamic assembly of modules to tackle complex tasks.
We coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing.
We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
arXiv Detail & Related papers (2024-09-04T17:01:02Z) - DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method built on a controllable and decomposed diffusion model.
We show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z) - Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [53.81190146434045]
Fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable. We show that our strongest model can generate materials predicted to be metastable at about twice the rate of CDVAE. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable materials.
arXiv Detail & Related papers (2024-02-06T20:35:28Z) - Scalable Diffusion for Materials Generation [99.71001883652211]
We develop a unified crystal representation (UniMat) that can represent any crystal structure.
UniMat can generate high fidelity crystal structures from larger and more complex chemical systems.
We propose additional metrics for evaluating generative models of materials.
arXiv Detail & Related papers (2023-10-18T15:49:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.