Related papers: Step-Opt: Boosting Optimization Modeling in LLMs through Iterative Data Synthesis and Structured Validation

URL: http://arxiv.org/abs/2506.17637v1
Date: Sat, 21 Jun 2025 08:42:27 GMT
Title: Step-Opt: Boosting Optimization Modeling in LLMs through Iterative Data Synthesis and Structured Validation
Authors: Yang Wu, Yifan Zhang, Yurong Wu, Yuran Wang, Junkai Zhang, Jian Cheng,
Abstract summary: Step-Opt-Instruct is a framework that augments existing datasets and generates high-quality fine-tuning data tailored to optimization modeling. We fine-tune open-source LLMs, including LLaMA-3-8B and Mistral-7B, to develop Step-Opt, a model that achieves state-of-the-art performance on benchmarks such as NL4OPT, MAMO, and IndustryOR.
Score: 18.18239596347168
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have revolutionized various domains but encounter substantial challenges in tackling optimization modeling tasks for Operations Research (OR), particularly when dealing with complex problems. In this work, we propose Step-Opt-Instruct, a framework that augments existing datasets and generates high-quality fine-tuning data tailored to optimization modeling. Step-Opt-Instruct employs iterative problem generation to systematically increase problem complexity and stepwise validation to rigorously verify data, preventing error propagation and ensuring the quality of the generated dataset. Leveraging this framework, we fine-tune open-source LLMs, including LLaMA-3-8B and Mistral-7B, to develop Step-Opt, a model that achieves state-of-the-art performance on benchmarks such as NL4OPT, MAMO, and IndustryOR. Extensive experiments demonstrate the superior performance of Step-Opt, especially in addressing complex OR tasks, with a notable 17.01% improvement in micro average accuracy on difficult problems. These findings highlight the effectiveness of combining structured validation with gradual problem refinement to advance the automation of decision-making processes using LLMs. The code and dataset are available at https://github.com/samwu-learn/Step.
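The abstract reports its headline gain as micro average accuracy. As a minimal sketch of how that metric is commonly computed, and how it differs from a macro average, the snippet below pools problems across benchmarks. The function names, benchmark labels, and counts are hypothetical illustrations, not the paper's evaluation code or results.

```python
from typing import Dict, Tuple

# Each entry maps a benchmark name to (correct, total).
Results = Dict[str, Tuple[int, int]]


def micro_average(results: Results) -> float:
    """Pool all problems: total correct divided by total attempted."""
    correct = sum(c for c, _ in results.values())
    total = sum(n for _, n in results.values())
    return correct / total


def macro_average(results: Results) -> float:
    """Average the per-benchmark accuracies, weighting each benchmark equally."""
    return sum(c / n for c, n in results.values()) / len(results)


# Hypothetical counts for illustration only (not taken from the paper).
example: Results = {
    "benchmark_a": (80, 100),
    "benchmark_b": (10, 50),
}

print(f"micro: {micro_average(example):.3f}")
print(f"macro: {macro_average(example):.3f}")
```

With these made-up counts the micro average weights the larger benchmark more heavily, so the two figures differ; which one a paper reports changes how a gain on a small, hard subset shows up in the summary number.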

Related papers

PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving [66.42260489147617]
We introduce PLAN-TUNING, a framework that distills synthetic task decompositions from large-scale language models. PLAN-TUNING fine-tunes smaller models via supervised and reinforcement-learning objectives to improve complex reasoning. Our analysis demonstrates how planning trajectories improve complex reasoning capabilities.
arXiv Detail & Related papers (2025-07-10T07:30:44Z)
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems [19.586884180343038]
OPT-BENCH is a benchmark designed to evaluate Large Language Models (LLMs) on large-scale search space optimization problems. OPT-Agent emulates human reasoning when tackling complex problems by generating, validating, and iteratively improving solutions through historical feedback.
arXiv Detail & Related papers (2025-06-12T14:46:41Z)
Synthesis by Design: Controlled Data Generation via Structural Guidance [7.938713951512933]
We propose to extract structural information with generated problem-solving code from mathematical reasoning. Our approach produces 39K problems with labeled intermediate steps and a 6.1K-problem benchmark of higher difficulty. Results on our benchmark show that model performance declines as reasoning length increases.
arXiv Detail & Related papers (2025-06-09T11:38:23Z)
Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval [49.669503570350166]
Generative information retrieval (GenIR) is a promising neural retrieval paradigm that formulates document retrieval as a document identifier (docid) generation task. Existing GenIR models suffer from token-level misalignment, where models trained to predict the next token often fail to capture document-level relevance effectively. We propose direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through direct optimization via pairwise ranking.
arXiv Detail & Related papers (2025-04-07T15:27:37Z)
OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling [9.617742955894247]
Lack of high-quality optimization modeling datasets hampers large language models. We propose a scalable framework for synthesizing a high-quality dataset, named OptMATH. We demonstrate that models of various sizes trained on OptMATH achieve superior results on multiple modeling benchmarks.
arXiv Detail & Related papers (2025-02-16T12:38:37Z)
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [61.99353167168545]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation. This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
Evaluating Language Models as Synthetic Data Generators [74.80905172696366]
AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities. Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
arXiv Detail & Related papers (2024-12-04T19:20:32Z)
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets. The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method. The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs). Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws. Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
Enhancing Large Language Model Performance To Answer Questions and Extract Information More Accurately [2.1715455600756646]
Large Language Models (LLMs) generate responses to questions. Their effectiveness is often hindered by sub-optimal quality of answers and occasional failures to provide accurate responses to questions. To address these challenges, a fine-tuning process is employed, involving feedback and examples to refine models.
arXiv Detail & Related papers (2024-01-27T00:18:07Z)
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets. Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.