An Agent-Based Framework for the Automatic Validation of Mathematical Optimization Models
- URL: http://arxiv.org/abs/2511.16383v1
- Date: Thu, 20 Nov 2025 14:03:07 GMT
- Title: An Agent-Based Framework for the Automatic Validation of Mathematical Optimization Models
- Authors: Alexander Zadorojniy, Segev Wasserkrug, Eitan Farchi
- Abstract summary: We propose a novel agent-based method for automatic validation of optimization models. We show, through experiments, the high quality of validation provided by this agent ensemble in terms of the well-known software testing measure called mutation coverage.
- Score: 46.028340941489006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, using Large Language Models (LLMs) to generate optimization models from natural language descriptions has become increasingly popular. However, a major open question is how to validate that the generated models are correct and satisfy the requirements defined in the natural language description. In this work, we propose a novel agent-based method for automatic validation of optimization models that builds upon and extends methods from software testing to address optimization modeling. This method consists of several agents that initially generate a problem-level testing API, then generate tests utilizing this API, and, lastly, generate mutations specific to the optimization model (mutation testing is a well-known software testing technique for assessing the fault-detection power of a test suite). In this work, we detail this validation framework and show, through experiments, the high quality of validation provided by this agent ensemble in terms of the well-known software testing measure called mutation coverage.
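To make the mutation-coverage measure concrete, here is a minimal, self-contained sketch of the idea (hypothetical names and data, no particular solver, and not the paper's agents or testing API): a toy 0/1 selection model, one problem-level test, and two model-specific mutants, with coverage reported as the fraction of mutants the test detects.

```python
# Hypothetical sketch of mutation coverage for an optimization model.
# Not the paper's framework or API; a toy knapsack-style model solved
# by brute force so the example stays fully self-contained.
from itertools import product

def solve(model):
    """Enumerate all 0/1 selections, keep the feasible one with the best objective."""
    best, best_val = None, float("-inf")
    for x in product([0, 1], repeat=len(model["values"])):
        weight = sum(w * xi for w, xi in zip(model["weights"], x))
        if weight > model["budget"]:
            continue  # violates the capacity constraint
        val = model["objective"](x, model["values"])
        if val > best_val:
            best, best_val = x, val
    return best, best_val

# Correct model: maximize total value within the weight budget.
base_model = {
    "values": [10, 6, 4],
    "weights": [5, 4, 3],
    "budget": 8,
    "objective": lambda x, v: sum(vi * xi for vi, xi in zip(v, x)),
}

# A problem-level test of the kind a test-generation agent might emit:
# with these data, selecting items 0 and 2 is optimal with value 14.
def test_optimal_selection(model):
    x, val = solve(model)
    return x == (1, 0, 1) and val == 14

# Mutations specific to the optimization model: a flipped objective
# sense and a loosened capacity constraint.
mutants = [
    {**base_model, "objective": lambda x, v: -sum(vi * xi for vi, xi in zip(v, x))},
    {**base_model, "budget": 100},
]

# Mutation coverage = killed mutants / total mutants.
killed = sum(not test_optimal_selection(m) for m in mutants)
print(f"mutation coverage: {killed}/{len(mutants)}")
```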
Related papers
- Self-Improving LLM Agents at Test-Time [49.9396634315896]
One paradigm of language model (LM) fine-tuning relies on creating large training datasets. In practice, gathering large sets of data is inefficient, and training on them is prohibitively expensive. We study two variants of this approach: Test-Time Self-Improvement (TT-SI) and Test-Time Distillation (TT-D).
arXiv Detail & Related papers (2025-10-09T06:37:35Z) - Toward a Trustworthy Optimization Modeling Agent via Verifiable Synthetic Data Generation [11.988926173584154]
We present a framework for training trustworthy large language model (LLM) agents via a synthetic data generation pipeline. OptiTrust is a modular LLM agent that performs multi-language translation from natural language to solver-ready code. Our agent achieves state-of-the-art performance on standard benchmarks.
arXiv Detail & Related papers (2025-08-05T05:54:20Z) - Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling [3.253908111652627]
Large Language Models (LLMs) often struggle to generate formally correct and usable models because of hallucinations. We present a novel framework that significantly improves the authenticity of LLMs for optimization modeling using Reinforcement Learning with Verifiable Reward.
arXiv Detail & Related papers (2025-05-17T02:32:03Z) - Formal Analysis of the Contract Automata Runtime Environment with Uppaal: Modelling, Verification and Testing [0.11844977816228043]
A distributed runtime application called contract automata environment (CARE) has been introduced to realise service applications specified using a dialect of finite-state automata. We detail the formal modelling, verification and testing of CARE.
arXiv Detail & Related papers (2025-01-22T15:03:25Z) - Towards an Automatic Optimisation Model Generator Assisted with Generative Pre-trained Transformer [0.0]
This article presents a framework for generating optimisation models using a pre-trained generative transformer.
The framework involves specifying the features that the optimisation model should have and using a language model to generate an initial version of the model.
arXiv Detail & Related papers (2023-05-09T23:51:14Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Exploring and Evaluating Personalized Models for Code Generation [9.25440316608194]
We evaluate transformer model fine-tuning for personalization.
We consider three key approaches: (i) custom fine-tuning, which allows all the model parameters to be tuned.
We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.
arXiv Detail & Related papers (2022-08-29T23:28:46Z) - On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets [101.28658250723804]
This paper experiments with augmenting a transformer model with modules that effectively utilize a wider field of view and learn to choose whether the next step requires a navigation or manipulation action.
We observe that the proposed modules yield improved, and in fact state-of-the-art, performance on an unseen validation set of a popular benchmark dataset, ALFRED.
We highlight this result because we believe it may reflect a wider phenomenon in machine learning tasks, one that is primarily noticeable in benchmarks that limit evaluation on test splits.
arXiv Detail & Related papers (2022-05-18T23:52:21Z) - Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)