Related papers: Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

URL: http://arxiv.org/abs/2505.21318v2
Date: Mon, 16 Jun 2025 07:52:25 GMT
Title: Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Authors: Hao Li, He Cao, Bin Feng, Yanjun Shao, Xiangru Tang, Zhiyuan Yan, Li Yuan, Yonghong Tian, Yu Li,
Abstract summary: We introduce ChemCoTBench, a reasoning framework that bridges molecular structure understanding with arithmetic-inspired operations.<n>ChemCoTBench formalizes chemical problem-solving into transparent, step-by-step reasoning.<n>We evaluate models on two high-impact tasks: Molecular Property Optimization and Chemical Reaction Prediction.
Score: 43.623140005091535
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: While large language models (LLMs) with Chain-of-Thought (CoT) reasoning excel in mathematics and coding, their potential for systematic reasoning in chemistry, a domain demanding rigorous structural analysis for real-world tasks like drug design and reaction engineering, remains untapped. Current benchmarks focus on simple knowledge retrieval, neglecting step-by-step reasoning required for complex tasks such as molecular optimization and reaction prediction. To address this, we introduce ChemCoTBench, a reasoning framework that bridges molecular structure understanding with arithmetic-inspired operations, including addition, deletion, and substitution, to formalize chemical problem-solving into transparent, step-by-step workflows. By treating molecular transformations as modular "chemical operations", the framework enables slow-thinking reasoning, mirroring the logic of mathematical proofs while grounding solutions in real-world chemical constraints. We evaluate models on two high-impact tasks: Molecular Property Optimization and Chemical Reaction Prediction. These tasks mirror real-world challenges while providing structured evaluability. By providing annotated datasets, a reasoning taxonomy, and baseline evaluations, ChemCoTBench bridges the gap between abstract reasoning methods and practical chemical discovery, establishing a foundation for advancing LLMs as tools for AI-driven scientific innovation.

Related papers

QCBench: Evaluating Large Language Models on Domain-Specific Quantitative Chemistry [12.18966912295507]
QCBench is a benchmark comprising 350 computational chemistry problems across 7 chemistry subfields.<n>Each problem focuses on pure calculations rooted in real-world chemical vertical fields.<n> Evaluations on 19 LLMs demonstrate a consistent performance degradation with increasing task complexity.
arXiv Detail & Related papers (2025-08-03T08:55:42Z)
ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge [14.6026550444088]
This work focuses on the specific field of chemistry and develop a Chemical Reasoner LLM, ChemDFM-R.<n>We first construct a comprehensive dataset of atomized knowledge points to enhance the model's understanding of the fundamental principles and logical structure of chemistry.<n> Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves cutting-edge performance while providing interpretable, rationale-driven outputs.
arXiv Detail & Related papers (2025-07-29T16:40:49Z)
ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z)
Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification? [19.700175505235876]
ToxiMol is the first benchmark task for general-purpose Multimodal Large Language Models (MLLMs) focused on molecular toxicity repair.<n>We construct a standardized dataset covering 11 primary tasks and 560 representative toxic molecules spanning diverse mechanisms and granularities.
arXiv Detail & Related papers (2025-06-12T17:25:53Z)
Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z)
Chemical reasoning in LLMs unlocks steerable synthesis planning and reaction mechanism elucidation [0.3065062372337749]
Large language models (LLMs) can serve as powerful chemical reasoning engines when integrated with traditional search algorithms.<n>We demonstrate this paradigm through two fundamental challenges: strategy-aware retrosynthetic planning and mechanism elucidation.<n>Our approach establishes a new paradigm for computer-aided chemistry that combines the strategic understanding of LLMs with the precision of traditional chemical tools.
arXiv Detail & Related papers (2025-03-11T15:27:17Z)
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning [64.2106664137118]
ChemAgent is a novel framework designed to improve the performance of large language models (LLMs)<n>It is developed by decomposing chemical tasks into sub-tasks and compiling these sub-tasks into a structured collection that can be referenced for future queries.<n>When presented with a new problem, ChemAgent retrieves and refines pertinent information from the library, which we call memory.
arXiv Detail & Related papers (2025-01-11T17:10:30Z)
GraphXForm: Graph transformer for computer-aided molecular design [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds.<n>We evaluate it on various drug design tasks, demonstrating superior objective scores compared to state-of-the-art molecular design approaches.
arXiv Detail & Related papers (2024-11-03T19:45:15Z)
ChemLLM: A Chemical Large Language Model [49.308528569982805]
Large language models (LLMs) have made impressive progress in chemistry applications. However, the community lacks an LLM specifically designed for chemistry. Here, we introduce ChemLLM, a comprehensive framework that features the first LLM dedicated to chemistry.
arXiv Detail & Related papers (2024-02-10T01:11:59Z)
Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis.<n>The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions.<n>Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the lope, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z)
Advancing Drug Discovery with Enhanced Chemical Understanding via Asymmetric Contrastive Multimodal Learning [23.85388398199658]
We introduce Asymmetric Contrastive Multimodal Learning (ACML) to enhance molecular understanding and accelerate advancements in drug discovery.<n>ACML harnesses the power of effective asymmetric contrastive learning to seamlessly transfer information from various chemical modalities to molecular graph representations.<n>We demonstrate the effectiveness of this framework through large-scale cross-modality retrieval and isomer discrimination tasks.
arXiv Detail & Related papers (2023-11-11T01:58:45Z)
ChemAlgebra: Algebraic Reasoning on Chemical Reactions [16.93639996082923]
It is unclear whether deep learning models have the ability to tackle reasoning tasks. ChemAlgebra is a benchmark for measuring the reasoning capabilities of deep learning models.
arXiv Detail & Related papers (2022-10-05T08:34:44Z)
Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
arXiv Detail & Related papers (2022-02-13T04:56:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.