MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
- URL: http://arxiv.org/abs/2601.02075v3
- Date: Wed, 07 Jan 2026 10:06:36 GMT
- Title: MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
- Authors: Zhuofan Shi, Hubao A, Yufei Shao, Dongliang Huang, Hongxu An, Chunxiao Xin, Haiyang Shen, Zhenyu Wang, Yunshan Na, Gang Huang, Xiang Jing
- Abstract summary: We present MDAgent2, the first end-to-end framework capable of performing both knowledge Q&A and code generation within the molecular dynamics domain. We adopt a three-stage post-training strategy (continued pre-training, supervised fine-tuning, and reinforcement learning) to train two domain-adapted models, MD-Instruct and MD-Code. We further build MDAgent2-RUNTIME, a deployable multi-agent system that integrates code generation, execution, evaluation, and self-correction.
- Score: 8.68945222655668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecular dynamics (MD) simulations are essential for understanding atomic-scale behavior in materials science, yet writing LAMMPS scripts remains a highly specialized and time-consuming task. Although LLMs show promise in code generation and domain-specific question answering, their performance in MD scenarios is limited by scarce domain data, the high deployment cost of state-of-the-art LLMs, and low code executability. Building upon our prior MDAgent, we present MDAgent2, the first end-to-end framework capable of performing both knowledge Q&A and code generation within the MD domain. We construct a domain-specific data-construction pipeline that yields three high-quality datasets spanning MD knowledge, question answering, and code generation. Based on these datasets, we adopt a three-stage post-training strategy of continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL) to train two domain-adapted models, MD-Instruct and MD-Code. Furthermore, we introduce MD-GRPO, a closed-loop RL method that leverages simulation outcomes as reward signals and recycles low-reward trajectories for continual refinement. We further build MDAgent2-RUNTIME, a deployable multi-agent system that integrates code generation, execution, evaluation, and self-correction. Together with MD-EvalBench, the first benchmark for LAMMPS code generation and question answering, which is proposed in this work, our models and system achieve performance surpassing several strong baselines. This work systematically demonstrates the adaptability and generalization capability of large language models in industrial simulation tasks, laying a methodological foundation for automatic code generation in AI for Science and industrial-scale simulations. URL: https://github.com/FredericVAN/PKU_MDAgent2
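The abstract describes a runtime that integrates code generation, execution, evaluation, and self-correction. As a rough illustration of that kind of loop, and not the authors' implementation, the sketch below retries generation with executor feedback until a script is accepted. Every name here (`Attempt`, `self_correct`, `toy_generate`, `toy_execute`) is hypothetical, and the toy executor only checks for a `run` command rather than invoking LAMMPS.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    script: str    # candidate simulation script text
    ok: bool       # whether the (hypothetical) executor accepted it
    feedback: str  # executor message fed back to the generator


def self_correct(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Run generate -> execute -> evaluate, retrying with feedback on failure."""
    history: List[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        script = generate(task, feedback)
        ok, feedback = execute(script)
        history.append(Attempt(script, ok, feedback))
        if ok:
            break  # stop as soon as a script passes evaluation
    return history


# Toy stand-ins (assumptions for illustration only): a "model" that omits
# the run command until it receives feedback, and a trivial "executor".
def toy_generate(task: str, feedback: Optional[str]) -> str:
    base = "units metal\n"
    return base + "run 100\n" if feedback else base


def toy_execute(script: str) -> Tuple[bool, str]:
    if "run" in script:
        return True, "ok"
    return False, "missing run command"


history = self_correct("equilibrate a copper block", toy_generate, toy_execute)
for attempt in history:
    print(attempt.ok, attempt.feedback)
```

In a real system the `generate` callable would wrap a model call and `execute` would launch the simulator and parse its log; both are left abstract here because the listing does not specify them.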
Related papers
- Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback [51.22403664895878]
Agent2World is a tool-augmented multi-agent framework that achieves strong inference-time world-model generation. It also serves as a data engine for supervised fine-tuning, by grounding generation in multi-agent feedback.
arXiv Detail & Related papers (2025-12-26T18:54:14Z) - ReX-MLE: The Autonomous Agent Benchmark for Medical Imaging Challenges [5.886200278450183]
We introduce ReX-MLE, a benchmark of 20 challenges derived from high-impact medical imaging competitions. Unlike prior benchmarks, ReX-MLE evaluates agents fully end to end, requiring them to independently manage data preprocessing, model training, and submission. We observe a severe performance gap: most submissions rank in the 0th percentile compared to human experts.
arXiv Detail & Related papers (2025-12-19T17:44:40Z) - Automating High Energy Physics Data Analysis with LLM-Powered Agents [6.8676809101926075]
We present a proof-of-principle study demonstrating the use of large language model (LLM) agents to automate a representative high energy physics (HEP) analysis. Using the Higgs boson diphoton cross-section measurement as a case study with ATLAS Open Data, we design a hybrid system that combines an LLM-based supervisor-coder agent with the Snakemake workflow manager. In this architecture, the workflow manager enforces determinism, while the agent autonomously generates, executes, and iteratively corrects analysis code in response to user instructions.
arXiv Detail & Related papers (2025-12-08T18:13:13Z) - From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code. We provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs. We analyze the code capability of general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder).
arXiv Detail & Related papers (2025-11-23T17:09:34Z) - Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents [3.344730946122235]
We propose an innovative pipeline utilizing Large Language Model (LLM) agents to automate data-driven modeling and analysis. We evaluate two LLM-agent frameworks: a multi-agent system featuring specialized collaborative agents, and a single-agent system based on the Reasoning and Acting (ReAct) paradigm.
arXiv Detail & Related papers (2025-10-01T19:28:35Z) - A Survey on Code Generation with LLM-based Agents [61.474191493322415]
Code generation agents powered by large language models (LLMs) are revolutionizing the software development paradigm. LLMs are characterized by three core features. This paper presents a systematic survey of the field of LLM-based code generation agents.
arXiv Detail & Related papers (2025-07-31T18:17:36Z) - R&D-Agent: An LLM-Agent Framework Towards Autonomous Data Science [70.1638335489284]
High-level machine learning engineering (MLE) tasks remain labor-intensive and iterative. We introduce R&D-Agent, a comprehensive, decoupled framework that formalizes the machine learning process. R&D-Agent divides MLE into two phases and six components, turning agent design for MLE into a principled, testable process.
arXiv Detail & Related papers (2025-05-20T06:07:00Z) - MDCure: A Scalable Pipeline for Multi-Document Instruction-Following [40.201087646516335]
We introduce MDCure, a scalable and effective instruction data generation framework for multi-document processing. MDCure generates high-quality synthetic MD instruction data over sets of articles via targeted prompts. MDCure consistently improves performance over pre-trained baselines and base models by up to 75.1%.
arXiv Detail & Related papers (2024-10-30T21:08:07Z) - Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation [51.20656279478878]
MATRIX is a multi-agent simulator that automatically generates diverse text-based scenarios. We introduce MATRIX-Gen for controllable and highly realistic data synthesis. On AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base, post-trained on datasets synthesized by MATRIX-Gen with just 20K instruction-response pairs, outperforms Meta's Llama-3-8B-Instruct model.
arXiv Detail & Related papers (2024-10-18T08:01:39Z) - DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning [56.887047551101574]
We present DS-Agent, a novel framework that harnesses large language model (LLM) agents and case-based reasoning (CBR).
In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle.
In the deployment stage, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm, significantly reducing the demand on foundational capabilities of LLMs.
arXiv Detail & Related papers (2024-02-27T12:26:07Z) - Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations [5.138982355658199]
Molecular dynamics (MD) simulation techniques are widely used for various natural science applications.
We benchmark a collection of state-of-the-art (SOTA) ML FF models and illustrate, in particular, how the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics.
arXiv Detail & Related papers (2022-10-13T17:59:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.