Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction
- URL: http://arxiv.org/abs/2601.09285v1
- Date: Wed, 14 Jan 2026 08:45:07 GMT
- Title: Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction
- Authors: Mianzhi Pan, JianFei Li, Peishuo Liu, Botian Wang, Yawen Ouyang, Yiming Rong, Hao Zhou, Jianbing Zhang,
- Abstract summary: Large Language Models (LLMs) have shown promise in generating crystals, but their application to MOFs is hindered by MOFs' high atomic complexity. We introduce MOF-LLM, the first LLM framework specifically adapted for block-level MOF structure prediction. Our approach substantially enhances the spatial reasoning capability of a Qwen-3 8B model for accurate MOF structure prediction.
- Score: 12.040331998543069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Metal-organic frameworks (MOFs) are porous crystalline materials with broad applications such as carbon capture and drug delivery, yet accurately predicting their 3D structures remains a significant challenge. While Large Language Models (LLMs) have shown promise in generating crystals, their application to MOFs is hindered by MOFs' high atomic complexity. Inspired by the success of block-wise paradigms in deep generative models, we pioneer the use of LLMs in this domain by introducing MOF-LLM, the first LLM framework specifically adapted for block-level MOF structure prediction. To effectively harness LLMs for this modular assembly task, our training paradigm integrates spatial-aware continual pre-training (CPT), structural supervised fine-tuning (SFT), and matching-driven reinforcement learning (RL). By incorporating explicit spatial priors and optimizing structural stability via Soft Adaptive Policy Optimization (SAPO), our approach substantially enhances the spatial reasoning capability of a Qwen-3 8B model for accurate MOF structure prediction. Comprehensive experiments demonstrate that MOF-LLM outperforms state-of-the-art denoising-based and LLM-based methods while exhibiting superior sampling efficiency.
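The abstract does not include an implementation, but a minimal sketch makes "block-level structure prediction" concrete: a MOF is represented as building blocks plus lattice parameters and per-block placements, serialized as text for the LLM to complete. The serialization format, prompt wording, and parser below are illustrative assumptions, not MOF-LLM's actual interface or training code.

```python
# Minimal sketch of block-level MOF structure prediction with an LLM.
# All formats and names here are illustrative assumptions, not MOF-LLM's
# actual serialization, prompts, or training pipeline (CPT/SFT/SAPO).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class BuildingBlock:
    smiles: str                                     # metal node or organic linker
    position: Tuple[float, float, float]            # fractional coords of block centroid
    quaternion: Tuple[float, float, float, float]   # block orientation

@dataclass
class MOFStructure:
    lattice: Tuple[float, ...]                      # a, b, c, alpha, beta, gamma
    blocks: List[BuildingBlock]

def serialize_prompt(blocks: List[str]) -> str:
    """Turn a set of building blocks into a structure-prediction prompt."""
    listing = "\n".join(f"- {b}" for b in blocks)
    return (
        "Assemble the following building blocks into a periodic MOF.\n"
        f"{listing}\n"
        "Answer with lattice parameters, then one line per block: "
        "SMILES | x y z | qw qx qy qz\n"
    )

def parse_completion(text: str) -> MOFStructure:
    """Parse a completion written in the illustrative format above."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    lattice = tuple(float(v) for v in lines[0].split())
    blocks = []
    for line in lines[1:]:
        smiles, pos, quat = (part.strip() for part in line.split("|"))
        blocks.append(BuildingBlock(
            smiles=smiles,
            position=tuple(float(v) for v in pos.split()),
            quaternion=tuple(float(v) for v in quat.split()),
        ))
    return MOFStructure(lattice=lattice, blocks=blocks)

if __name__ == "__main__":
    print(serialize_prompt(["[Zn]", "O=C(O)c1ccc(C(=O)O)cc1"]))  # Zn node + BDC linker
    # A hypothetical completion; in MOF-LLM this would come from the fine-tuned Qwen-3 8B.
    completion = (
        "25.9 25.9 25.9 90.0 90.0 90.0\n"
        "[Zn] | 0.25 0.25 0.25 | 1 0 0 0\n"
        "O=C(O)c1ccc(C(=O)O)cc1 | 0.50 0.25 0.25 | 0.71 0 0.71 0"
    )
    mof = parse_completion(completion)
    print(mof.lattice, len(mof.blocks))
```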
Related papers
- ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns [68.61814799047956]
Mixture-of-Experts (MoE) effectively scales model capacity while preserving computational efficiency through sparse expert activation. We introduce ExpertWeaver, a training-free framework that partitions neurons according to their activation patterns and constructs shared experts and specialized routed experts with layer-adaptive configurations.
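As a rough illustration of carving experts out of a dense FFN by activation statistics, here is a toy partitioning routine. The calibration data, firing-rate rule for the shared expert, and k-means grouping are assumptions for illustration, not ExpertWeaver's actual procedure.

```python
# Toy neuron partitioning by activation pattern (in the spirit of ExpertWeaver).
# Synthetic data stands in for post-GLU activations collected on a calibration set.
import numpy as np

def partition_neurons(activations, num_experts=8, shared_frac=0.1):
    """activations: [num_tokens, num_neurons] for one FFN layer.
    Returns (shared_idx, routed_groups): indices of always-on neurons kept as a
    shared expert, and index groups forming specialized routed experts."""
    # Neurons that fire on almost every token form the shared expert.
    fire_rate = (np.abs(activations) > 1e-3).mean(axis=0)
    num_shared = int(shared_frac * activations.shape[1])
    shared_idx = np.argsort(-fire_rate)[:num_shared]

    # Remaining neurons are grouped by similarity of their activation profiles
    # (a few iterations of cosine k-means).
    rest = np.setdiff1d(np.arange(activations.shape[1]), shared_idx)
    profiles = activations[:, rest].T
    profiles = profiles / (np.linalg.norm(profiles, axis=1, keepdims=True) + 1e-8)
    rng = np.random.default_rng(0)
    centers = profiles[rng.choice(len(profiles), num_experts, replace=False)]
    for _ in range(10):
        assign = np.argmax(profiles @ centers.T, axis=1)
        for k in range(num_experts):
            if np.any(assign == k):
                centers[k] = profiles[assign == k].mean(axis=0)
    routed_groups = [rest[assign == k] for k in range(num_experts)]
    return shared_idx, routed_groups

acts = np.random.randn(512, 2048).astype(np.float32)
shared, experts = partition_neurons(acts)
print(len(shared), [len(e) for e in experts])
```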
arXiv Detail & Related papers (2026-02-17T11:50:58Z) - Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference [5.052013621974765]
Large Language Models (LLMs) possess strong representation and reasoning capabilities, but their application to structure-based drug design (SBDD) is limited by insufficient understanding of protein structures and unpredictable molecular generation. We propose Exploration-Augmented Latent Inference for LLMs (ELILLM), a framework that reinterprets the LLM generation process as an encoding, latent space exploration, and decoding workflow. ELILLM explicitly explores portions of the design problem beyond the model's current knowledge while using a decoding module to handle familiar regions, generating chemically valid and synthetically reasonable molecules.
arXiv Detail & Related papers (2026-01-20T08:10:48Z) - Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models [48.83701310501069]
Large Language Models (LLMs) offer a transformative approach to Neural Architecture Search (NAS). We formulate the search as a sequence of conditional code generation tasks, where an LLM refines architectural specifications based on performance telemetry. We generate a vast corpus of valid, shape-consistent architectures via Abstract Syntax Tree (AST) mutations. Experimental results on CIFAR-100 validate the efficacy of this approach, demonstrating that the model yields statistically significant improvements in accuracy.
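A small example of producing architecture variants through AST mutation, one ingredient the summary mentions. The template network and the mutation rule are illustrative assumptions, not the paper's actual search space or repair logic.

```python
# Toy AST mutation of an architecture definition (illustrative only).
import ast

ARCH_SOURCE = """
import torch.nn as nn

def build_model(num_classes=100):
    return nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(64, num_classes),
    )
"""

class WidenFirstConv(ast.NodeTransformer):
    """Doubles the output channels of the first Conv2d it finds. A real NAS
    loop would also repair downstream input channels so the mutated program
    stays shape-consistent before it is trained and scored."""
    def __init__(self):
        self.done = False
    def visit_Call(self, node):
        self.generic_visit(node)
        if (not self.done and isinstance(node.func, ast.Attribute)
                and node.func.attr == "Conv2d"
                and isinstance(node.args[1], ast.Constant)):
            node.args[1] = ast.Constant(node.args[1].value * 2)
            self.done = True
        return node

tree = ast.parse(ARCH_SOURCE)
mutated = ast.fix_missing_locations(WidenFirstConv().visit(tree))
print(ast.unparse(mutated))  # candidate the LLM-driven loop would evaluate next
```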
arXiv Detail & Related papers (2026-01-13T13:00:30Z) - From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning [10.98910502098502]
We propose a two-stage approach that decomposes spatial reasoning into atomic building blocks and their composition. First, we apply supervised fine-tuning on elementary spatial transformations, such as rotation, translation, and scaling, to equip the model with basic spatial physics. We then freeze this physics-aware model and train lightweight LoRA adapters within the GRPO framework to learn policies that compose these building blocks for multi-step planning.
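For intuition, here is the group-relative advantage computation at the core of GRPO-style training, assuming one scalar reward per sampled multi-step plan; in the setup described above, only the LoRA adapters would receive gradients weighted by these advantages while the physics-aware base model stays frozen. The rewards and group size below are illustrative.

```python
# GRPO-style group-relative advantages (illustrative rewards and group size).
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize each completion's reward against its own group:
    A_i = (r_i - mean(group)) / (std(group) + eps)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, eight sampled plans, binary reward = "did the composed
# rotations/translations/scalings reach the target configuration?"
group_rewards = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0])
print(group_relative_advantages(group_rewards))
```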
arXiv Detail & Related papers (2025-12-31T00:36:03Z) - Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe [51.26601054313749]
Recent efforts on Diffusion MoE models have primarily focused on developing more sophisticated routing mechanisms. Inspired by the MoE design paradigms established in large language models (LLMs), we identify a set of crucial architectural factors for building effective Diffusion MoE models. We present novel architectures that can be efficiently applied to both latent and pixel-space diffusion frameworks.
arXiv Detail & Related papers (2025-12-01T03:52:31Z) - Speed Always Wins: A Survey on Efficient Architectures for Large Language Models [51.817121227562964]
Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the capability boundary of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong baseline with excellent scaling properties. However, the traditional Transformer architecture requires substantial computation and poses significant obstacles for large-scale training and practical deployment.
arXiv Detail & Related papers (2025-08-13T14:13:46Z) - MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models [5.417632175667162]
Designing Metal-Organic Frameworks (MOFs) with application-specific properties remains a central challenge in materials chemistry. We present a reinforcement learning-enhanced, transformer-based framework for the de novo design of MOFs. By integrating property feedback into sequence generation, our method drives the model toward synthesizable, topologically valid MOFs.
arXiv Detail & Related papers (2025-05-30T20:09:11Z) - Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models [48.15777554876988]
Traditional alignment methods often require retraining large pretrained models. We propose a novel Residual Alignment Model (RAM) that formalizes the alignment process as a type of importance sampling. We develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods.
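A toy view of alignment as importance sampling: candidates drawn from a frozen base model are reweighted by the ratio of aligned (target) to base (proposal) probabilities and then resampled. The numbers are synthetic, and this is a generic importance-sampling construction rather than RAM's exact formulation.

```python
# Importance-weighted resampling of base-model candidates (illustrative numbers).
import numpy as np

rng = np.random.default_rng(0)

# Log-probabilities of five candidate continuations under the frozen base
# (proposal) model and under the desired aligned (target) distribution.
base_logp = np.array([-2.1, -2.3, -3.0, -1.9, -2.7])
aligned_logp = np.array([-2.8, -1.7, -2.9, -2.4, -2.0])

# Importance weights w_k proportional to p_aligned / p_base, normalized over candidates.
log_w = aligned_logp - base_logp
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()

# Resample one continuation according to the corrected distribution.
chosen = rng.choice(len(weights), p=weights)
print("weights:", np.round(weights, 3), "chosen:", int(chosen))
```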
arXiv Detail & Related papers (2025-05-26T08:53:02Z) - Elucidating the Design Space of Multimodal Protein Language Models [69.3650883370033]
Multimodal protein language models (PLMs) integrate sequence and token-based structural information. This paper systematically elucidates the design space of multimodal PLMs to overcome their limitations. Our advances move toward finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling.
arXiv Detail & Related papers (2025-04-15T17:59:43Z) - X2-DFD: A framework for eXplainable and eXtendable Deepfake Detection [55.77552681618732]
X2-DFD is an eXplainable and eXtendable framework based on multimodal large language models (MLLMs) for deepfake detection. The first stage, Model Feature Assessment, systematically evaluates the detectability of forgery-related features for the MLLM. The second stage, Explainable Dataset Construction, consists of two key modules: Strong Feature Strengthening and Weak Feature Supplementing. The third stage, Fine-tuning and Inference, involves fine-tuning the MLLM on the constructed dataset and deploying it for final detection and explanation.
arXiv Detail & Related papers (2024-10-08T15:28:33Z) - MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks [42.61784133509237]
Metal-organic frameworks (MOFs) are a class of crystalline materials with promising applications in many areas such as carbon capture and drug delivery. Existing approaches, including ab initio calculations and even deep generative models, struggle with the complexity of MOF structures due to the large number of atoms in the unit cells. We introduce MOFFlow, the first deep generative model tailored for MOF structure prediction.
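For intuition, a generic conditional flow-matching training step on toy 3D coordinates is shown below; MOFFlow's actual objective operates over building-block rotations, translations, and lattice parameters, so the network and data here are illustrative simplifications.

```python
# Generic conditional flow-matching step on toy coordinates (illustrative only).
import torch
import torch.nn as nn

velocity_net = nn.Sequential(nn.Linear(4, 64), nn.SiLU(), nn.Linear(64, 3))

def flow_matching_loss(x1: torch.Tensor) -> torch.Tensor:
    """x1: [batch, 3] target coordinates (e.g., block centroids)."""
    x0 = torch.randn_like(x1)              # noise sample
    t = torch.rand(x1.shape[0], 1)         # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1             # point on the linear interpolation path
    target_velocity = x1 - x0              # d x_t / d t along that path
    pred = velocity_net(torch.cat([xt, t], dim=-1))
    return ((pred - target_velocity) ** 2).mean()

opt = torch.optim.Adam(velocity_net.parameters(), lr=1e-3)
for _ in range(3):                         # a few illustrative optimization steps
    loss = flow_matching_loss(torch.randn(32, 3))
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```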
arXiv Detail & Related papers (2024-10-07T13:51:58Z) - Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows for inference with a subset of modules and dynamic assembly of modules to tackle complex tasks.
We coin the term brick to represent each functional module, designating the modularized structure as customizable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing.
We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
arXiv Detail & Related papers (2024-09-04T17:01:02Z) - MOFormer: Self-Supervised Transformer model for Metal-Organic Framework Property Prediction [7.367477168940467]
Metal-Organic Frameworks (MOFs) are materials with a high degree of porosity that can be used for applications in energy storage, water desalination, gas storage, and gas separation.
Finding the optimal MOFs for specific applications requires an efficient and accurate search over an enormous number of potential candidates.
We propose a structure-agnostic deep learning method based on the Transformer model, named MOFormer, for property prediction of MOFs.
arXiv Detail & Related papers (2022-10-25T17:29:42Z)
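To close the list, a toy sketch in the spirit of the MOFormer entry above: a Transformer encoder over a tokenized, structure-agnostic MOF identifier, mean-pooled into a scalar property. The vocabulary, tokenization, and model sizes are assumptions for illustration only.

```python
# Toy structure-agnostic property regressor over tokenized MOF identifiers.
import torch
import torch.nn as nn

class TinyMOFRegressor(nn.Module):
    def __init__(self, vocab_size=64, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)            # e.g., band gap or gas uptake

    def forward(self, token_ids):                    # [batch, seq_len] integer tokens
        h = self.encoder(self.embed(token_ids))
        return self.head(h.mean(dim=1)).squeeze(-1)  # mean-pool, then regress

model = TinyMOFRegressor()
fake_tokens = torch.randint(0, 64, (8, 32))          # stand-in for tokenized MOF strings
print(model(fake_tokens).shape)                      # torch.Size([8])
```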