MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models
- URL: http://arxiv.org/abs/2506.00198v1
- Date: Fri, 30 May 2025 20:09:11 GMT
- Title: MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models
- Authors: Srivathsan Badrinarayanan, Rishikesh Magar, Akshay Antony, Radheesh Sharma Meda, Amir Barati Farimani
- Abstract summary: The discovery of Metal-Organic Frameworks (MOFs) with application-specific properties remains a central challenge in materials chemistry. We present a reinforcement learning-enhanced, transformer-based framework for the de novo design of MOFs. By integrating property feedback into sequence generation, our method drives the model toward synthesizable, topologically valid MOFs.
- Score: 5.417632175667162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The discovery of Metal-Organic Frameworks (MOFs) with application-specific properties remains a central challenge in materials chemistry, owing to the immense size and complexity of their structural design space. Conventional computational screening techniques such as molecular simulations and density functional theory (DFT), while accurate, are computationally prohibitive at scale. Machine learning offers an exciting alternative by leveraging data-driven approaches to accelerate materials discovery. The complexity of MOFs, with their extended periodic structures and diverse topologies, creates both opportunities and challenges for generative modeling approaches. To address these challenges, we present a reinforcement learning-enhanced, transformer-based framework for the de novo design of MOFs. Central to our approach is MOFid, a chemically-informed string representation encoding both connectivity and topology, enabling scalable generative modeling. Our pipeline comprises three components: (1) a generative GPT model trained on MOFid sequences, (2) MOFormer, a transformer-based property predictor, and (3) a reinforcement learning (RL) module that optimizes generated candidates via property-guided reward functions. By integrating property feedback into sequence generation, our method drives the model toward synthesizable, topologically valid MOFs with desired functional attributes. This work demonstrates the potential of large language models, when coupled with reinforcement learning, to accelerate inverse design in reticular chemistry and unlock new frontiers in computational MOF discovery.
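The pipeline described in the abstract is a generate-score-update loop: the GPT model samples MOFid strings, a property predictor scores them, and a reward gradient nudges the generator toward better candidates. The sketch below illustrates that loop in PyTorch under heavy simplifying assumptions; `TinyMOFGPT`, the toy vocabulary, and the random `predicted_property` stub are placeholders and not the authors' released code or the actual MOFid tokenizer.

```python
import torch
import torch.nn as nn

VOCAB, MAXLEN, D = 64, 32, 128          # toy vocabulary / sequence length / width
BOS = 1                                  # assumed start-of-sequence token

class TinyMOFGPT(nn.Module):
    """Tiny causal Transformer standing in for the MOFid generator."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, D)
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, VOCAB)

    def forward(self, tok):                                   # tok: (B, T) token ids
        T = tok.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.head(self.body(self.emb(tok), mask=causal))

def predicted_property(tok):
    """Stand-in for a property predictor such as MOFormer (placeholder reward)."""
    return torch.rand(tok.size(0))

gpt = TinyMOFGPT()
opt = torch.optim.Adam(gpt.parameters(), lr=1e-4)

for step in range(3):                                         # RL fine-tuning loop
    tok = torch.full((8, 1), BOS, dtype=torch.long)
    logps = []
    for _ in range(MAXLEN - 1):                               # autoregressive sampling
        dist = torch.distributions.Categorical(logits=gpt(tok)[:, -1])
        nxt = dist.sample()
        logps.append(dist.log_prob(nxt))
        tok = torch.cat([tok, nxt[:, None]], dim=1)
    reward = predicted_property(tok)                          # property-guided reward
    loss = -(reward * torch.stack(logps, 1).sum(1)).mean()    # REINFORCE-style objective
    opt.zero_grad(); loss.backward(); opt.step()
```

Per the abstract, the real reward would come from property predictions on decoded MOFid strings combined with validity and topology criteria, rather than the random stub used here.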
Related papers
- MoDyGAN: Combining Molecular Dynamics With GANs to Investigate Protein Conformational Space [0.0]
MoDyGAN is a pipeline that exploits molecular dynamics simulations and generative adversarial networks (GANs) to explore protein conformational spaces. MoDyGAN contains a generator that maps Gaussian distributions into MD-derived protein trajectories, and a refinement module that combines ensemble learning with a dual-discriminator. Central to our approach is an innovative representation technique that reversibly transforms 3D protein structures into 2D matrices. Our results suggest that representing proteins as image-like data unlocks new possibilities for applying advanced deep learning techniques to biomolecular simulation.
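A minimal GAN sketch of the setup summarized in this entry, assuming PyTorch: a generator maps Gaussian noise to image-like 2D matrices standing in for MD-derived frames, and a discriminator learns to tell them from "real" frames. The shapes, random data, and single discriminator are placeholders; MoDyGAN's dual-discriminator and refinement module are omitted.

```python
import torch
import torch.nn as nn

Z, H, W = 32, 16, 16                             # noise dim and matrix size (toy values)
G = nn.Sequential(nn.Linear(Z, 128), nn.ReLU(), nn.Linear(128, H * W), nn.Tanh())
D = nn.Sequential(nn.Linear(H * W, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(64, H * W) * 2 - 1             # placeholder "MD frame" matrices

for step in range(5):
    fake = G(torch.randn(64, Z))
    # discriminator update: real -> 1, fake -> 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # generator update: try to fool the discriminator
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```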
arXiv Detail & Related papers (2025-07-18T14:18:28Z)
- Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate [0.0]
This paper explores an alternative, constructive approach to model development, built upon the foundation of non-trainable, deterministic input embeddings. We show that specialist models trained on disparate datasets can be merged into a single, more capable Mixture-of-Experts model. We introduce a layer-wise constructive training methodology, where a deep Transformer is "grown" by progressively stacking and training one layer at a time.
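A hedged sketch of the layer-wise growth idea summarized above, assuming PyTorch: the input embedding is frozen, previously trained layers are frozen, and only the newly stacked layer (plus an assumed trainable output head) is optimized at each growth step. The data, objective, and sizes are toy placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

d_model, vocab = 64, 100

frozen_embed = nn.Embedding(vocab, d_model)
frozen_embed.weight.requires_grad_(False)        # non-trainable, deterministic stand-in
layers = nn.ModuleList()                         # the model "grows" over time
head = nn.Linear(d_model, vocab)                 # assumed trainable output head

def grow_and_train_one_layer(batch_tokens, steps=10):
    """Append one Transformer layer, freeze everything older, train the new layer."""
    for lyr in layers:                           # freeze previously trained layers
        for p in lyr.parameters():
            p.requires_grad_(False)
    new_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    layers.append(new_layer)
    opt = torch.optim.Adam(
        list(new_layer.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(steps):
        h = frozen_embed(batch_tokens)
        for lyr in layers:
            h = lyr(h)
        logits = head(h)
        # toy next-token objective on the same batch (placeholder data and loss)
        loss = nn.functional.cross_entropy(
            logits[:, :-1].reshape(-1, vocab), batch_tokens[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()

tokens = torch.randint(0, vocab, (4, 16))
for _ in range(3):                               # "grow" a three-layer model
    grow_and_train_one_layer(tokens)
```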
arXiv Detail & Related papers (2025-07-08T20:01:15Z)
- Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows for inference with a subset of modules and dynamic assembly of modules to tackle complex tasks.
We coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing.
We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
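A toy sketch of the "retrieval and routing" brick operation named above, assuming PyTorch; the router, brick definitions, and top-k combination rule are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

d, n_bricks, k = 32, 4, 2
bricks = nn.ModuleList([nn.Sequential(nn.Linear(d, d), nn.ReLU()) for _ in range(n_bricks)])
router = nn.Linear(d, n_bricks)                  # scores brick relevance per input

def route(x):                                    # x: (B, d)
    scores = router(x).softmax(dim=-1)
    topv, topi = scores.topk(k, dim=-1)          # keep only the top-k bricks per row
    out = torch.zeros_like(x)
    for j in range(k):                           # weighted sum of the selected bricks
        idx, w = topi[:, j], topv[:, j:j + 1]
        for b in range(n_bricks):                # dispatch each row to its brick
            m = idx == b
            if m.any():
                out[m] += w[m] * bricks[b](x[m])
    return out

print(route(torch.randn(3, d)).shape)            # torch.Size([3, 32])
```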
arXiv Detail & Related papers (2024-09-04T17:01:02Z)
- Improving Molecular Modeling with Geometric GNNs: an Empirical Study [56.52346265722167]
This paper focuses on the impact of (1) canonicalization methods, (2) graph creation strategies, and (3) auxiliary tasks on performance, scalability, and symmetry enforcement.
Our findings and insights aim to guide researchers in selecting optimal modeling components for molecular modeling tasks.
arXiv Detail & Related papers (2024-07-11T09:04:12Z)
- GeoMFormer: A General Architecture for Geometric Molecular Representation Learning [84.02083170392764]
We introduce GeoMFormer, a novel Transformer-based molecular model designed to learn both invariant and equivariant molecular representations.
We show that GeoMFormer achieves strong performance on both invariant and equivariant tasks of different types and scales.
arXiv Detail & Related papers (2024-06-24T17:58:13Z)
- MOFDiff: Coarse-grained Diffusion for Metal-Organic Framework Design [4.819734936375677]
Metal-organic frameworks (MOFs) are of immense interest in applications such as gas storage and carbon capture.
We propose MOFDiff: a coarse-grained (CG) diffusion model that generates CG MOF structures.
We evaluate our model's capability to generate valid and novel MOF structures and its effectiveness in designing outstanding MOF materials.
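A minimal sketch of the coarse-grained diffusion idea behind MOFDiff, assuming PyTorch: placeholder building-block coordinates are noised with a standard DDPM schedule and a small network learns to predict the added noise. The real model operates on full coarse-grained MOF representations; everything here is a simplified stand-in.

```python
import torch
import torch.nn as nn

T = 100                                            # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative noise schedule

denoiser = nn.Sequential(nn.Linear(3 + 1, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

coords = torch.randn(32, 3)                        # placeholder CG building-block coords

for step in range(5):
    t = torch.randint(0, T, (coords.size(0),))
    a = alphas_bar[t].unsqueeze(-1)
    noise = torch.randn_like(coords)
    noisy = a.sqrt() * coords + (1 - a).sqrt() * noise   # forward (noising) process
    t_feat = (t.float() / T).unsqueeze(-1)               # crude timestep embedding
    pred = denoiser(torch.cat([noisy, t_feat], dim=-1))
    loss = ((pred - noise) ** 2).mean()                  # noise-prediction objective
    opt.zero_grad(); loss.backward(); opt.step()
```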
arXiv Detail & Related papers (2023-10-16T18:00:15Z)
- Molecular De Novo Design through Transformer-based Reinforcement Learning [38.803770968809225]
We introduce a method to fine-tune a Transformer-based generative model for molecular de novo design.
Our proposed method exhibits superior performance in generating compounds predicted to be active against various biological targets.
Our approach can be used for scaffold hopping, library expansion starting from a single molecule, and generating compounds with high predicted activity against biological targets.
arXiv Detail & Related papers (2023-10-09T02:51:01Z)
- MOFormer: Self-Supervised Transformer model for Metal-Organic Framework Property Prediction [7.367477168940467]
Metal-Organic Frameworks (MOFs) are materials with a high degree of porosity that can be used for applications in energy storage, water desalination, gas storage, and gas separation.
Finding the optimal MOFs for specific applications requires an efficient and accurate search over an enormous number of potential candidates.
We propose a structure-agnostic deep learning method based on the Transformer model, named MOFormer, for property prediction of MOFs.
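A hedged sketch of a structure-agnostic property predictor in the spirit of MOFormer, assuming PyTorch: a MOFid-like string is tokenized at the character level, encoded by a Transformer encoder, pooled, and regressed to a scalar property. The tokenizer, example string, and model sizes are illustrative assumptions, not the released MOFormer.

```python
import torch
import torch.nn as nn

class StringPropertyRegressor(nn.Module):
    def __init__(self, vocab=128, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d, 1)

    def forward(self, tok):                          # tok: (B, T) integer tokens
        h = self.enc(self.emb(tok))
        return self.out(h.mean(dim=1)).squeeze(-1)   # pooled scalar property

def char_tokenize(mofid, maxlen=64):
    """Toy character-level tokenizer for MOFid-like strings (placeholder)."""
    ids = [min(ord(c), 127) for c in mofid[:maxlen]]
    return torch.tensor(ids + [0] * (maxlen - len(ids)))

model = StringPropertyRegressor()
# illustrative MOFid-like string (format simplified for the example)
batch = torch.stack([char_tokenize("[Zn][Zn].O=C(O)c1ccc(cc1)C(=O)O MOFid-v1.pcu")])
print(model(batch))                                  # untrained prediction, illustration only
```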
arXiv Detail & Related papers (2022-10-25T17:29:42Z)
- How to See Hidden Patterns in Metamaterials with Interpretable Machine Learning [82.67551367327634]
We develop a new interpretable, multi-resolution machine learning framework for finding patterns in the unit-cells of materials.
Specifically, we propose two new interpretable representations of metamaterials, called shape-frequency features and unit-cell templates.
arXiv Detail & Related papers (2021-11-10T21:19:02Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
- Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics [21.95240820041655]
Variational Autoencoders (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions.
We propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features.
arXiv Detail & Related papers (2020-10-18T13:33:02Z)
- S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards dynamic structures that are capable of simultaneously exploiting both modular and spatio-temporal structure.
We find our models to be robust to the number of available views and better capable of generalization to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.