ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges
- URL: http://arxiv.org/abs/2505.15068v1
- Date: Wed, 21 May 2025 03:33:23 GMT
- Title: ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges
- Authors: Cheng Qian, Hongyi Du, Hongru Wang, Xiusi Chen, Yuji Zhang, Avirup Sil, Chengxiang Zhai, Kathleen McKeown, Heng Ji,
- Abstract summary: We introduce ModelingBench, a novel benchmark featuring real-world-inspired, open-ended problems from math modeling competitions across diverse domains.<n>These tasks require translating natural language into formal mathematical formulations, applying appropriate tools, and producing structured, defensible reports.<n>We also present ModelingAgent, a multi-agent framework that coordinates tool use, supports structured, creative solutions, and generates well-grounded, creative solutions.
- Score: 72.19809898215857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in large language models (LLMs) has enabled substantial advances in solving mathematical problems. However, existing benchmarks often fail to reflect the complexity of real-world problems, which demand open-ended, interdisciplinary reasoning and integration of computational tools. To address this gap, we introduce ModelingBench, a novel benchmark featuring real-world-inspired, open-ended problems from math modeling competitions across diverse domains, ranging from urban traffic optimization to ecosystem resource planning. These tasks require translating natural language into formal mathematical formulations, applying appropriate tools, and producing structured, defensible reports. ModelingBench also supports multiple valid solutions, capturing the ambiguity and creativity of practical modeling. We also present ModelingAgent, a multi-agent framework that coordinates tool use, supports structured workflows, and enables iterative self-refinement to generate well-grounded, creative solutions. To evaluate outputs, we further propose ModelingJudge, an expert-in-the-loop system leveraging LLMs as domain-specialized judges assessing solutions from multiple expert perspectives. Empirical results show that ModelingAgent substantially outperforms strong baselines and often produces solutions indistinguishable from those of human experts. Together, our work provides a comprehensive framework for evaluating and advancing real-world problem-solving in open-ended, interdisciplinary modeling challenges.
Related papers
- Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice [18.040849771712093]
Large language models (LLMs) have exhibited expert-level capabilities across various domains.<n>However, their abilities to solve problems in Operations Research (OR) remain underexplored.
arXiv Detail & Related papers (2025-06-30T14:54:15Z) - MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation [64.85885900375483]
MEXA is a training-free framework that performs modality- and task-aware aggregation of expert models.<n>We evaluate our approach on diverse multimodal benchmarks, including Video Reasoning, Audio Reasoning, 3D Understanding, and Medical QA.
arXiv Detail & Related papers (2025-06-20T16:14:13Z) - ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research [53.736407871322314]
We introduce ORMind, a cognitive-inspired framework that enhances optimization through counterfactual reasoning.<n>Our approach emulates human cognition, implementing an end-to-end workflow that transforms requirements into mathematical models and executable code.<n>It is currently being tested internally in Lenovo's AI Assistant, with plans to enhance optimization capabilities for both business and consumer customers.
arXiv Detail & Related papers (2025-06-02T05:11:21Z) - MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents.<n>MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios.<n>Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z) - Large Language Models for Water Distribution Systems Modeling and Decision-Making [5.962279205972996]
The design, operations, and management of water distribution systems (WDS) involve complex mathematical models.<n>The recent advancements in Large Language Models (LLMs) open doors for a new stage in human-model interaction.<n>This study proposes a framework of plain language interactions with hydraulic and water quality models based on LLM-EPANET architecture.
arXiv Detail & Related papers (2025-03-20T14:39:11Z) - On the Utility of Domain Modeling Assistance with Large Language Models [2.874893537471256]
This paper presents a study to evaluate the usefulness of a novel approach utilizing large language models (LLMs) and few-shot prompt learning to assist in domain modeling.
The aim of this approach is to overcome the need for extensive training of AI-based completion models on scarce domain-specific datasets.
arXiv Detail & Related papers (2024-10-16T13:55:34Z) - Generative AI Agents with Large Language Model for Satellite Networks via a Mixture of Experts Transmission [74.10928850232717]
This paper develops generative artificial intelligence (AI) agents for model formulation and then applies a mixture of experts (MoE) to design transmission strategies.
Specifically, we leverage large language models (LLMs) to build an interactive modeling paradigm.
We propose an MoE-proximal policy optimization (PPO) approach to solve the formulated problem.
arXiv Detail & Related papers (2024-04-14T03:44:54Z) - Solution-oriented Agent-based Models Generation with Verifier-assisted
Iterative In-context Learning [10.67134969207797]
Agent-based models (ABMs) stand as an essential paradigm for proposing and validating hypothetical solutions or policies.
Large language models (LLMs) encapsulating cross-domain knowledge and programming proficiency could potentially alleviate the difficulty of this process.
We present SAGE, a general solution-oriented ABM generation framework designed for automatic modeling and generating solutions for targeted problems.
arXiv Detail & Related papers (2024-02-04T07:59:06Z) - Machine Learning Insides OptVerse AI Solver: Design Principles and
Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z) - GLUECons: A Generic Benchmark for Learning Under Constraints [102.78051169725455]
In this work, we create a benchmark that is a collection of nine tasks in the domains of natural language processing and computer vision.
We model external knowledge as constraints, specify the sources of the constraints for each task, and implement various models that use these constraints.
arXiv Detail & Related papers (2023-02-16T16:45:36Z) - DIME: Fine-grained Interpretations of Multimodal Models via Disentangled
Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z) - Bayesian Stress Testing of Models in a Classification Hierarchy [0.0]
Building a machine learning solution in real-life applications often involves the decomposition of the problem into multiple models of various complexity.
We propose a Bayesian framework to model the interaction amongst models in such a hierarchy.
We show that the framework can facilitate stress testing of the overall solution, giving more confidence in its expected performance prior to active deployment.
arXiv Detail & Related papers (2020-05-25T18:22:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.