ChemDFM-R: A Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge
- URL: http://arxiv.org/abs/2507.21990v2
- Date: Wed, 30 Jul 2025 07:23:58 GMT
- Title: ChemDFM-R: A Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge
- Authors: Zihan Zhao, Bo Chen, Ziping Wan, Lu Chen, Xuanze Lin, Shiyang Yu, Situo Zhang, Da Ma, Zichen Zhu, Danyang Zhang, Huayang Wang, Zhongyang Dai, Liyang Wen, Xin Chen, Kai Yu
- Abstract summary: This work focuses on the specific field of chemistry and develops a Chemical Reasoner LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized knowledge points to enhance the model's understanding of the fundamental principles and logical structure of chemistry. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves cutting-edge performance while providing interpretable, rationale-driven outputs.
- Score: 14.6026550444088
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While large language models (LLMs) have achieved impressive progress, their application in scientific domains such as chemistry remains hindered by shallow domain understanding and limited reasoning capabilities. In this work, we focus on the specific field of chemistry and develop a Chemical Reasoner LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized knowledge points to enhance the model's understanding of the fundamental principles and logical structure of chemistry. Then, we propose a mix-sourced distillation strategy that integrates expert-curated knowledge with general-domain reasoning skills, followed by domain-specific reinforcement learning to enhance chemical reasoning. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves cutting-edge performance while providing interpretable, rationale-driven outputs. Further case studies illustrate how explicit reasoning chains significantly improve the reliability, transparency, and practical utility of the model in real-world human-AI collaboration scenarios.
Related papers
- MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs [30.030008221150407]
MolReasoner is a two-stage framework designed to transition Large Language Models from memorization towards chemical reasoning. First, we propose Mol-SFT, which initializes the model's reasoning abilities via synthetic Chain-of-Thought (CoT) samples generated by GPT-4o and verified for chemical accuracy. Subsequently, Mol-RL applies reinforcement learning with specialized reward functions designed explicitly to align chemical structures with linguistic descriptions.
arXiv Detail & Related papers (2025-08-04T05:10:11Z) - Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery [0.0]
Large Language Models often generate scientifically plausible but factually invalid information. This paper presents a systematic methodology to bridge this gap by developing a specialized scientific assistant.
arXiv Detail & Related papers (2025-07-09T23:05:23Z) - ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences. This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input. Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation [21.30938446415292]
Chemistry problems typically involve long and complex reasoning steps, which contain specific terminology. ChemAU identifies gaps in chemistry knowledge and precisely supplements chemical expertise with the specialized domain model.
arXiv Detail & Related papers (2025-06-01T18:45:49Z) - Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations [43.623140005091535]
We introduce ChemCoTBench, a reasoning framework that bridges molecular structure understanding with arithmetic-inspired operations. ChemCoTBench formalizes chemical problem-solving into transparent, step-by-step reasoning. We evaluate models on two high-impact tasks: Molecular Property Optimization and Chemical Reaction Prediction.
arXiv Detail & Related papers (2025-05-27T15:15:44Z) - Chemical reasoning in LLMs unlocks strategy-aware synthesis planning and reaction mechanism elucidation [0.3065062372337749]
Large language models (LLMs) can serve as powerful tools enabling chemical analysis. We leverage their ability to evaluate chemical strategies and guide search algorithms toward chemically meaningful solutions. Our approach establishes a new paradigm for computer-aided chemistry.
arXiv Detail & Related papers (2025-03-11T15:27:17Z) - ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models [62.37850540570268]
Existing benchmarks in this domain fail to adequately meet the specific requirements of chemical research professionals.
ChemEval identifies 4 crucial progressive levels in chemistry, assessing 12 dimensions of LLMs across 42 distinct chemical tasks.
Results show that while general LLMs excel in literature understanding and instruction following, they fall short in tasks demanding advanced chemical knowledge.
arXiv Detail & Related papers (2024-09-21T02:50:43Z) - ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area [50.15254966969718]
We introduce ChemVLM, an open-source chemical multimodal large language model for chemical applications. ChemVLM is trained on a carefully curated bilingual dataset that enhances its ability to understand both textual and visual chemical information. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks.
arXiv Detail & Related papers (2024-08-14T01:16:40Z) - Are large language models superhuman chemists? [4.87961182129702]
Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained.
Here, we introduce "ChemBench," an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs.
We curated more than 2,700 question-answer pairs, evaluated leading open- and closed-source LLMs, and found that the best models outperformed the best human chemists.
arXiv Detail & Related papers (2024-04-01T20:56:25Z) - ChemLLM: A Chemical Large Language Model [49.308528569982805]
Large language models (LLMs) have made impressive progress in chemistry applications.
However, the community lacks an LLM specifically designed for chemistry.
Here, we introduce ChemLLM, a comprehensive framework that features the first LLM dedicated to chemistry.
arXiv Detail & Related papers (2024-02-10T01:11:59Z) - Structured Chemistry Reasoning with Large Language Models [70.13959639460015]
Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific reasoning, especially in chemistry.
We introduce StructChem, a simple yet effective prompting strategy that offers the desired guidance and substantially boosts the LLMs' chemical reasoning capability.
In tests across four chemistry areas -- quantum chemistry, mechanics, physical chemistry, and kinetics -- StructChem substantially enhances GPT-4's performance, with up to a 30% peak improvement.
arXiv Detail & Related papers (2023-11-16T08:20:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.