Related papers: Gala: Global LLM Agents for Text-to-Model Translation

URL: http://arxiv.org/abs/2509.08970v2
Date: Thu, 02 Oct 2025 19:55:18 GMT
Title: Gala: Global LLM Agents for Text-to-Model Translation
Authors: Junyang Cai, Serdar Kadioglu, Bistra Dilkina
Abstract summary: We introduce Gala, a framework that addresses the challenge of translating natural language problem descriptions into MiniZinc models with a global agentic approach. Multiple specialized large language model (LLM) agents decompose the modeling task by global constraint type. By dividing the problem into smaller, well-defined sub-tasks, each LLM handles a simpler reasoning challenge.
Score: 12.20235137210144
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Natural language descriptions of optimization or satisfaction problems are challenging to translate into correct MiniZinc models, as this process demands both logical reasoning and constraint programming expertise. We introduce Gala, a framework that addresses this challenge with a global agentic approach: multiple specialized large language model (LLM) agents decompose the modeling task by global constraint type. Each agent is dedicated to detecting and generating code for a specific class of global constraint, while a final assembler agent integrates these constraint snippets into a complete MiniZinc model. By dividing the problem into smaller, well-defined sub-tasks, each LLM handles a simpler reasoning challenge, potentially reducing overall complexity. We conduct initial experiments with several LLMs and show better performance against baselines such as one-shot prompting and chain-of-thought prompting. Finally, we outline a comprehensive roadmap for future work, highlighting potential enhancements and directions for improvement.
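
The decomposition described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names ConstraintAgent, assemble, translate, and stub_llm are hypothetical, the prompts are invented for the example, and the LLM is modeled as a plain function from prompt text to reply text so the sketch runs without model access.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: an LLM is modeled as a function from prompt text to reply text.
LLM = Callable[[str], str]


@dataclass
class ConstraintAgent:
    """Hypothetical agent dedicated to one global constraint type."""
    constraint_type: str
    llm: LLM

    def run(self, problem: str) -> str:
        # Ask only about this agent's constraint type (invented prompt).
        prompt = (
            f"Problem:\n{problem}\n"
            f"If the global constraint '{self.constraint_type}' applies, "
            f"output its MiniZinc code; otherwise output NONE."
        )
        reply = self.llm(prompt).strip()
        return "" if reply == "NONE" else reply


def assemble(problem: str, snippets: List[str], llm: LLM) -> str:
    """Hypothetical assembler agent: merges snippets into one model."""
    joined = "\n".join(snippets)
    prompt = (
        "Assemble a complete MiniZinc model.\n"
        f"Problem:\n{problem}\n"
        f"Snippets:\n{joined}"
    )
    return llm(prompt)


def translate(problem: str, agents: List[ConstraintAgent], assembler: LLM) -> str:
    # Each agent handles one smaller sub-task; empty replies are dropped.
    snippets = [s for s in (agent.run(problem) for agent in agents) if s]
    return assemble(problem, snippets, assembler)


def stub_llm(prompt: str) -> str:
    """Toy stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Assemble"):
        # Echo the collected snippets back as the "model".
        return prompt.split("Snippets:\n", 1)[1]
    if "'all_different'" in prompt and "distinct" in prompt:
        return "constraint all_different(x);"
    return "NONE"


agents = [
    ConstraintAgent("all_different", stub_llm),
    ConstraintAgent("cumulative", stub_llm),
]
model = translate("Assign distinct values to x.", agents, stub_llm)
```

In a real setting each agent and the assembler would call an actual model, and the assembler would also need to produce variable declarations and a solve item, which the stub omits.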

Related papers

Reasoning in a Combinatorial and Constrained World: Benchmarking LLMs on Natural-Language Combinatorial Optimization [28.52469449694436]
Large language models (LLMs) have shown strong performance in math and logic reasoning. But their ability to handle combinatorial optimization (CO) remains underexplored. We introduce NLCO, a benchmark that evaluates LLMs on end-to-end CO reasoning.
arXiv Detail & Related papers (2026-02-02T14:55:48Z)
OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling [13.57588221678224]
Large Language Models (LLMs) have demonstrated impressive progress in optimization modeling. The boundaries of their capabilities in automated formulation and problem solving remain poorly understood. We propose OPT-ENGINE, a benchmark framework designed to evaluate LLMs on optimization modeling with controllable and scalable difficulty levels.
arXiv Detail & Related papers (2026-01-09T09:22:33Z)
EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models [65.48902212293903]
We present the Extremely Complex Instruction Following Benchmark (EIFBENCH) for evaluating large language models (LLMs). EIFBENCH includes multi-task scenarios that enable comprehensive assessment across diverse task types concurrently. We also propose the Segment Policy Optimization (SegPO) algorithm to enhance the LLM's ability to accurately fulfill multi-task workflows.
arXiv Detail & Related papers (2025-06-10T02:39:55Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback [52.763620660061115]
ONI is a distributed architecture that simultaneously learns an RL policy and an intrinsic reward function. We explore a range of algorithmic choices for reward modeling with varying complexity. Our approach achieves state-of-the-art performance across a range of challenging tasks from the NetHack Learning Environment.
arXiv Detail & Related papers (2024-10-30T13:52:43Z)
Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective [6.911107705494142]
A common practice in large language model (LLM) usage is to sample a solution for the entire task within the model's context window. Previous works have shown that subtask decomposition within the model's context is beneficial for solving such tasks.
arXiv Detail & Related papers (2024-09-26T16:34:35Z)
Chain of Agents: Large Language Models Collaborating on Long-Context Tasks [39.27648679819897]
Chain-of-Agents (CoA) is a novel framework that harnesses multi-agent collaboration through natural language to enable information aggregation and context reasoning. CoA processes the entire input by interleaving reading and reasoning, and it mitigates long context focus issues by assigning each agent a short context.
arXiv Detail & Related papers (2024-06-04T23:36:08Z)
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models [79.62191017182518]
FollowBench is a multi-level, fine-grained constraint-following benchmark for large language models. We introduce a multi-level mechanism that incrementally adds a single constraint to the initial instruction at each increased level. By evaluating 13 popular LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work.
arXiv Detail & Related papers (2023-10-31T12:32:38Z)
Can Large Language Models Understand Real-World Complex Instructions? [54.86632921036983]
Large language models (LLMs) can understand human instructions, but struggle with complex instructions. Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions. We propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
arXiv Detail & Related papers (2023-09-17T04:18:39Z)
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world. The Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks. The PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.