AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library
- URL: http://arxiv.org/abs/2510.18428v1
- Date: Tue, 21 Oct 2025 09:03:26 GMT
- Title: AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library
- Authors: Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Hai Wang, Cathy Wu, Jinhua Zhao
- Abstract summary: AlphaOPT is a self-improving experience library for optimization modeling. It learns efficiently from limited demonstrations without rationales. It expands continually without costly retraining by updating the library rather than model weights.
- Score: 47.82769337589924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optimization modeling enables critical decisions across industries but remains difficult to automate: informal language must be mapped to precise mathematical formulations and executable solver code. Prior LLM approaches either rely on brittle prompting or costly retraining with limited generalization. We present AlphaOPT, a self-improving experience library that enables an LLM to learn from limited demonstrations (even answers alone, without gold-standard programs) and solver feedback - without annotated reasoning traces or parameter updates. AlphaOPT operates in a continual two-phase cycle: (i) a Library Learning phase that reflects on failed attempts, extracting solver-verified, structured insights as {taxonomy, condition, explanation, example}; and (ii) a Library Evolution phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining by updating the library rather than model weights, and (3) makes knowledge explicit and interpretable for human inspection and intervention. Experiments show that AlphaOPT steadily improves with more data (65% to 72% from 100 to 300 training items) and surpasses the strongest baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only on answers. Code and data are available at: https://github.com/Minw913/AlphaOPT.
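The two-phase cycle described above is easy to picture as a small data structure plus two operations. Below is a minimal Python sketch under our own assumptions (class and method names such as `ExperienceLibrary.evolve` are illustrative, not the authors' API); each stored entry is the {taxonomy, condition, explanation, example} insight tuple from the abstract.

```python
# Illustrative sketch only; names and retrieval logic are our assumptions,
# not AlphaOPT's actual implementation.
from dataclasses import dataclass, field

@dataclass
class Insight:
    taxonomy: str     # e.g. "constraints/big-M linearization"
    condition: str    # when the insight should be retrieved
    explanation: str  # why the fix works
    example: str      # a minimal corrected formulation snippet

@dataclass
class ExperienceLibrary:
    insights: list[Insight] = field(default_factory=list)

    def retrieve(self, task: str) -> list[Insight]:
        # Stand-in retrieval: keyword match against the applicability condition.
        return [i for i in self.insights if i.condition.lower() in task.lower()]

    def learn(self, insight: Insight) -> None:
        # Library Learning phase: store a solver-verified insight
        # reflected out of a failed attempt.
        self.insights.append(insight)

    def evolve(self, insight: Insight, refined_condition: str) -> None:
        # Library Evolution phase: refine the applicability condition
        # after diagnosing a retrieval misalignment.
        insight.condition = refined_condition
```

Because knowledge lives in entries like these rather than in model weights, the library stays inspectable and can be extended without retraining.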
Related papers
- MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance [18.215893951726166]
Large language models (LLMs) can provide subgoal decompositions, plausible trajectories, and abstract priors that facilitate early learning. We propose MIRA (Memory-Integrated Reinforcement Learning Agent), which incorporates a structured, evolving memory graph to guide early training.
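A "structured, evolving memory graph" can be pictured as subgoal nodes whose edges accumulate transition statistics that steer early exploration; the sketch below is our own guess at such a structure (node and edge semantics are assumptions, not taken from the paper).

```python
# Hypothetical memory graph; MIRA's actual structure may differ.
from collections import defaultdict

class MemoryGraph:
    def __init__(self):
        # edge (subgoal -> next_subgoal) -> empirical statistics
        self.edges = defaultdict(lambda: {"successes": 0, "visits": 0})

    def record(self, subgoal: str, next_subgoal: str, success: bool) -> None:
        e = self.edges[(subgoal, next_subgoal)]
        e["visits"] += 1
        e["successes"] += int(success)

    def best_next(self, subgoal: str) -> str | None:
        # Guide the agent toward the transition with the highest success rate.
        cand = {k: v for k, v in self.edges.items() if k[0] == subgoal}
        if not cand:
            return None
        best = max(cand, key=lambda k: cand[k]["successes"] / cand[k]["visits"])
        return best[1]
```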
arXiv Detail & Related papers (2026-02-20T01:43:30Z)
- SciML Agents: Write the Solver, Not the Solution [69.5021018644143]
We introduce two new datasets: a diagnostic dataset of adversarial "misleading" problems; and a large-scale benchmark of 1,000 diverse ODE tasks. We evaluate open- and closed-source LLMs along two axes: (i) unguided versus guided prompting with domain-specific knowledge; and (ii) off-the-shelf versus fine-tuned variants. Preliminary results indicate that careful prompting and fine-tuning can yield a specialized LLM agent capable of reliably solving simple ODE problems.
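The title's slogan amounts to having the model emit numerical solver code rather than trajectory values. A hypothetical example of such emitted code (the ODE dy/dt = -2y and the use of SciPy are our illustration, not taken from the paper):

```python
# "Write the solver, not the solution": generated code calls a numerical
# integrator instead of guessing the answer directly.
import numpy as np
from scipy.integrate import solve_ivp

sol = solve_ivp(lambda t, y: -2.0 * y, t_span=(0.0, 1.0), y0=[1.0],
                t_eval=np.linspace(0.0, 1.0, 5))
print(sol.y[0])  # closely matches the exact solution exp(-2 t)
```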
arXiv Detail & Related papers (2025-09-12T02:53:57Z)
- AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science [5.064778712920176]
Large language models (LLMs) are increasingly used to automate data analysis through executable code generation. We present AIRepr, an Analyst-Inspector framework for automatically evaluating and improving the reproducibility of LLM-generated data analyses.
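One plausible reading of the analyst-inspector loop is that an inspector re-executes the analyst's generated script and flags unstable output; the sketch below illustrates only that idea (the file name `analysis.py` and the hash comparison are our assumptions, not AIRepr's protocol).

```python
# Toy inspector: re-run a generated analysis and hash its stdout to detect
# non-determinism. Purely illustrative; not AIRepr's implementation.
import hashlib
import subprocess

def run_and_hash(script: str) -> str:
    out = subprocess.run(["python", script], capture_output=True, text=True).stdout
    return hashlib.sha256(out.encode()).hexdigest()

if __name__ == "__main__":
    reproducible = run_and_hash("analysis.py") == run_and_hash("analysis.py")
    print("stable across reruns:", reproducible)
```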
arXiv Detail & Related papers (2025-02-23T01:15:50Z)
- Agents Are All You Need for LLM Unlearning [9.934258340998047]
ALU is a multi-agent, retrain-free, model-agnostic approach to LLM unlearning. ALU consistently stands out as the most robust inference-time LLM unlearning framework. ALU is assessed on up to 1000 unlearning targets, exceeding the evaluation scope of all previously proposed LLM unlearning methods.
arXiv Detail & Related papers (2025-02-01T11:45:44Z)
- Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation [55.21013307734612]
AoPS-Instruct is a dataset of more than 600,000 high-quality QA pairs. LiveAoPSBench is an evolving evaluation set with timestamps, derived from the latest forum data. Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning.
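The timestamps suggest a simple contamination-resistance recipe: evaluate a model only on problems posted after its training cutoff, so they cannot have leaked into its training data. A minimal sketch (field names are our assumptions):

```python
# Keep only evaluation items newer than the model's training cutoff.
from datetime import datetime, timezone

training_cutoff = datetime(2024, 6, 1, tzinfo=timezone.utc)
qa_pairs = [
    {"problem": "Find all primes p with ...", "posted_at": datetime(2024, 9, 3, tzinfo=timezone.utc)},
    {"problem": "Evaluate the sum ...", "posted_at": datetime(2023, 1, 15, tzinfo=timezone.utc)},
]
live_eval = [q for q in qa_pairs if q["posted_at"] > training_cutoff]
print(len(live_eval))  # 1: only the post-cutoff problem survives
```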
arXiv Detail & Related papers (2025-01-24T06:39:38Z)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [76.59316249991657]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks, and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited. We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models [11.453585039783901]
LEAF (Learning and Evaluation Augmented by Fact-Checking) is a novel approach designed to enhance the factual reliability of large language models (LLMs).
The first strategy, Fact-Check-Then-RAG, improves Retrieval-Augmented Generation (RAG) by incorporating fact-checking results to guide the retrieval process without updating model parameters.
The second strategy, Learning from Fact-Checks via Self-Training, involves supervised fine-tuning (SFT) on fact-checked responses or applying Simple Preference Optimization (SimPO) with fact-checking as a ranking mechanism.
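The first strategy can be sketched as a loop that fact-checks a draft answer and retrieves evidence only for the claims that fail, leaving model parameters untouched; everything below (the claim splitter, function names, the toy fact set) is our hypothetical illustration, not LEAF's implementation.

```python
# Hypothetical Fact-Check-Then-RAG pass; illustrative only.
KNOWN_FACTS = {"Paris is the capital of France"}

def split_into_claims(answer: str) -> list[str]:
    # Stand-in claim splitter: one claim per sentence.
    return [s.strip() for s in answer.split(".") if s.strip()]

def fact_check(claim: str) -> bool:
    # Stand-in checker: membership in a toy fact set; a real system
    # would call an external fact-checking model or service.
    return claim in KNOWN_FACTS

def fact_check_then_rag(answer, retrieve, regenerate):
    failed = [c for c in split_into_claims(answer) if not fact_check(c)]
    if not failed:
        return answer
    evidence = [doc for claim in failed for doc in retrieve(claim)]
    return regenerate(answer, evidence)  # guided retrieval, no weight updates
```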
arXiv Detail & Related papers (2024-10-31T00:18:05Z)
- Divide-Verify-Refine: Can LLMs Self-Align with Complex Instructions? [33.18076221854853]
We propose a framework to divide complex instructions into single constraints and prepare appropriate tools. We then verify responses using tools that provide rigorous checks and textual guidance. To maximize refinement effectiveness, we propose dynamic few-shot prompting, where a refinement repository collects successful refinements.
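Concretely, a divide-verify-refine pass might look like the sketch below, where each single constraint gets its own checker and failures come back as textual guidance (the constraints and checker functions are our toy examples, not the paper's tools).

```python
# Toy divide-verify-refine loop; illustrative only.
checks = {
    "at most 50 words": lambda t: len(t.split()) <= 50,
    "mentions 'budget'": lambda t: "budget" in t.lower(),
}

def verify(response: str) -> list[str]:
    # One rigorous check per constraint, reported as textual guidance.
    return [f"Constraint violated: {name}"
            for name, ok in checks.items() if not ok(response)]

def refine(response, revise, max_rounds=3):
    for _ in range(max_rounds):
        feedback = verify(response)
        if not feedback:
            return response
        response = revise(response, feedback)  # the LLM rewrites using the guidance
    return response
```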
arXiv Detail & Related papers (2024-10-16T04:01:55Z)
- Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models [54.14602121129874]
We introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data.
AutoIF transforms the validation of instruction-following data quality into code verification.
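In other words, each instruction ships with an executable checker, and a candidate response enters the training set only if the checker passes. A hand-written stand-in (AutoIF's contribution is generating such verifiers, and test cases for them, automatically):

```python
# Hand-written stand-in verifier; AutoIF generates these checkers at scale.
def verify_all_lowercase(response: str) -> bool:
    return response == response.lower()

pairs = [
    ("Write the reply in lowercase only.", "sure, here it is."),
    ("Write the reply in lowercase only.", "Sure, Here It Is."),
]
training_data = [(inst, resp) for inst, resp in pairs if verify_all_lowercase(resp)]
print(training_data)  # keeps only the compliant pair
```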
arXiv Detail & Related papers (2024-06-19T13:29:53Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
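One way to read "LLM guidance as a regularization factor" in value-based RL is a KL-regularized Bellman backup toward an LLM-suggested policy; the tabular sketch below is our interpretation, not LINVIT's published pseudocode.

```python
import numpy as np

def regularized_backup(P, R, V, pi_llm, gamma=0.99, lam=1.0):
    """One KL-regularized Bellman backup toward an LLM prior policy.
    P: [S, A, S] transition probabilities, R: [S, A] rewards, V: [S] values,
    pi_llm: [S, A] LLM-suggested policy; lam sets the regularization strength."""
    Q = R + gamma * P @ V  # one-step Bellman Q-values, shape [S, A]
    # Soft value with prior: V(s) = lam * log sum_a pi_llm(a|s) * exp(Q(s,a) / lam)
    return lam * np.log((pi_llm * np.exp(Q / lam)).sum(axis=1))
```

As lam shrinks, the backup approaches a greedy max over the actions the LLM endorses, which is how guidance can reduce the amount of data needed for learning.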
arXiv Detail & Related papers (2024-02-25T20:07:13Z)