MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
- URL: http://arxiv.org/abs/2505.14148v1
- Date: Tue, 20 May 2025 09:55:31 GMT
- Title: MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
- Authors: Fan Liu, Zherui Yang, Cancheng Liu, Tianrui Song, Xiaofeng Gao, Hao Liu
- Abstract summary: We formalize the task of Large Language Model (LLM)-powered real-world mathematical modeling. We propose MM-Agent, an expert-inspired framework that decomposes modeling into four stages: open-ended problem analysis, structured model formulation, computational problem solving, and report generation. MM-Agent significantly outperforms baseline agents, achieving an 11.88% improvement over human expert solutions.
- Score: 11.81434494801394
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Mathematical modeling is a cornerstone of scientific discovery and engineering practice, enabling the translation of real-world problems into formal systems across domains such as physics, biology, and economics. Unlike mathematical reasoning, which assumes a predefined formulation, modeling requires open-ended problem analysis, abstraction, and principled formalization. While Large Language Models (LLMs) have shown strong reasoning capabilities, they fall short in rigorous model construction, limiting their utility in real-world problem-solving. To this end, we formalize the task of LLM-powered real-world mathematical modeling, where agents must analyze problems, construct domain-appropriate formulations, and generate complete end-to-end solutions. We introduce MM-Bench, a curated benchmark of 111 problems from the Mathematical Contest in Modeling (MCM/ICM), spanning the years 2000 to 2025 and across ten diverse domains such as physics, biology, and economics. To tackle this task, we propose MM-Agent, an expert-inspired framework that decomposes mathematical modeling into four stages: open-ended problem analysis, structured model formulation, computational problem solving, and report generation. Experiments on MM-Bench show that MM-Agent significantly outperforms baseline agents, achieving an 11.88% improvement over human expert solutions while requiring only 15 minutes and $0.88 per task using GPT-4o. Furthermore, under official MCM/ICM protocols, MM-Agent assisted two undergraduate teams in winning the Finalist Award (top 2.0% among 27,456 teams) in MCM/ICM 2025, demonstrating its practical effectiveness as a modeling copilot. Our code is available at https://github.com/usail-hkust/LLM-MM-Agent
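To make the four-stage pipeline concrete, below is a minimal Python sketch of the workflow described in the abstract (analysis → formulation → solving → report). The stage names follow the paper, but the function names, prompts, and the call_llm helper are illustrative assumptions rather than the released implementation; see https://github.com/usail-hkust/LLM-MM-Agent for the actual code.

```python
# Minimal sketch of the four-stage MM-Agent pipeline described in the abstract.
# The stage names follow the paper; the function names, prompts, and `call_llm`
# are illustrative assumptions, not the released implementation.

from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to an LLM such as GPT-4o."""
    raise NotImplementedError("Plug in your preferred LLM client here.")


@dataclass
class ModelingSolution:
    analysis: str     # stage 1: open-ended problem analysis
    formulation: str  # stage 2: structured model formulation
    results: str      # stage 3: computational problem solving
    report: str       # stage 4: report generation


def analyze_problem(problem: str) -> str:
    return call_llm(
        "Analyze this real-world modeling problem; identify assumptions, "
        f"variables, and objectives:\n{problem}"
    )


def formulate_model(problem: str, analysis: str) -> str:
    return call_llm(
        "Given the problem and analysis below, formulate a domain-appropriate "
        f"mathematical model (equations, constraints):\nProblem: {problem}\nAnalysis: {analysis}"
    )


def solve_model(formulation: str) -> str:
    # In practice this stage would generate and execute code; here it is a single call.
    return call_llm(f"Carry out the computations needed to solve this model:\n{formulation}")


def write_report(problem: str, analysis: str, formulation: str, results: str) -> str:
    return call_llm(
        "Write a complete modeling report covering analysis, model, results, and limitations.\n"
        f"Problem: {problem}\nModel: {formulation}\nResults: {results}"
    )


def mm_agent(problem: str) -> ModelingSolution:
    analysis = analyze_problem(problem)
    formulation = formulate_model(problem, analysis)
    results = solve_model(formulation)
    report = write_report(problem, analysis, formulation, results)
    return ModelingSolution(analysis, formulation, results, report)
```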
Related papers
- ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges [72.19809898215857]
We introduce ModelingBench, a novel benchmark featuring real-world-inspired, open-ended problems from math modeling competitions across diverse domains. These tasks require translating natural language into formal mathematical formulations, applying appropriate tools, and producing structured, defensible reports. We also present ModelingAgent, a multi-agent framework that coordinates tool use and generates structured, well-grounded, creative solutions.
arXiv Detail & Related papers (2025-05-21T03:33:23Z) - MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection [53.325457460187046]
We introduce MathAgent, a novel Mixture-of-Math-Agent framework designed specifically to address these challenges. MathAgent decomposes error detection into three phases, each handled by a specialized agent. We evaluate MathAgent on real-world educational data, demonstrating approximately 5% higher accuracy in error step identification.
arXiv Detail & Related papers (2025-03-23T16:25:08Z) - MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task [49.355810887265925]
We introduce MathFimer, a novel framework for mathematical reasoning step expansion. We develop a specialized model, MathFimer-7B, on our carefully curated NuminaMath-FIM dataset. We then apply these models to enhance existing mathematical reasoning datasets by inserting detailed intermediate steps into their solution chains.
arXiv Detail & Related papers (2025-02-17T11:22:24Z) - ProcessBench: Identifying Process Errors in Mathematical Reasoning [62.80402845414901]
We introduce ProcessBench for measuring the ability to identify erroneous steps in mathematical reasoning. ProcessBench consists of 3,400 test cases, primarily focused on competition- and Olympiad-level math problems. We conduct extensive evaluation on ProcessBench, involving two types of models: process reward models (PRMs) and critic models.
arXiv Detail & Related papers (2024-12-09T15:11:40Z) - LLMs for Mathematical Modeling: Towards Bridging the Gap between Natural and Mathematical Languages [14.04286044600141]
Large Language Models (LLMs) have demonstrated strong performance across various natural language processing tasks. However, their proficiency in mathematical reasoning remains a key challenge. We propose a process-oriented framework to evaluate LLMs' ability to construct mathematical models.
arXiv Detail & Related papers (2024-05-21T18:29:54Z) - Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent [15.81048994298046]
Large language models (LLMs) face challenges in solving complex mathematical problems.
We propose a formal description of the mathematical solving and extend LLMs with an agent-based zero-shot framework.
Experiments on miniF2F and MATH have demonstrated the effectiveness of PRER and the proposed MathAgents.
arXiv Detail & Related papers (2023-12-14T13:33:50Z) - JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving [77.51817534090789]
We propose JiuZhang 2.0, a unified Chinese PLM specially designed for multi-task mathematical problem solving.
Our idea is to maintain a moderate-sized model and employ cross-task knowledge sharing to improve the model capacity in a multi-task setting.
arXiv Detail & Related papers (2023-06-19T15:45:36Z) - UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression [127.68780714438103]
Two main geometry problems, calculation and proving, are usually treated as two specific tasks.
We construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems.
We also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously.
arXiv Detail & Related papers (2022-12-06T04:37:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.