Related papers: Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving

Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving

URL: http://arxiv.org/abs/2509.23470v1
Date: Sat, 27 Sep 2025 19:47:15 GMT
Title: Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving
Authors: Rui Ai, Hugo De Oliveira Barbalho, Sirui Li, Alexei Robsky, David Simchi-Levi, Ishai Menache,
Abstract summary: A common challenge in real-time operations is deciding whether to re-solve an optimization problem or continue using an existing solution.<n>We propose a framework called Proximal Policy Optimization with Change Point Detection.
Score: 18.62245790631018
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A common challenge in real-time operations is deciding whether to re-solve an optimization problem or continue using an existing solution. While modern data platforms may collect information at high frequencies, many real-time operations require repeatedly solving computationally intensive optimization problems formulated as Mixed-Integer Linear Programs (MILPs). Determining when to re-solve is, therefore, an economically important question. This problem poses several challenges: 1) How to characterize solution optimality and solving cost; 2) How to detect environmental changes and select beneficial samples for solving the MILP; 3) Given the large time horizon and non-MDP structure, vanilla reinforcement learning (RL) methods are not directly applicable and tend to suffer from value function explosion. Existing literature largely focuses on heuristics, low-data settings, and smooth objectives, with little focus on common NP-hard MILPs. We propose a framework called Proximal Policy Optimization with Change Point Detection (POC), which systematically offers a solution for balancing performance and cost when deciding appropriate re-solving times. Theoretically, we establish the relationship between the number of re-solves and the re-solving cost. To test our framework, we assemble eight synthetic and real-world datasets, and show that POC consistently outperforms existing baselines by 2%-17%. As a side benefit, our work fills the gap in the literature by introducing real-time MILP benchmarks and evaluation criteria.

Related papers

Applying a Random-Key Optimizer on Mixed Integer Programs [0.36700088931938835]
Mixed-Integer Programs (MIPs) are NP-hard optimization models that arise in a broad range of decision-making applications.<n>This paper explores the use of the Random-Key integer (RKO) framework as a flexible, metaheuristic alternative for computing high-quality solutions to MIPs.
arXiv Detail & Related papers (2026-02-25T18:20:03Z)
SolverLLM: Leveraging Test-Time Scaling for Optimization Problem via LLM-Guided Search [58.116954449750544]
We introduce a training-free framework that leverages test-time scaling to solve diverse optimization problems.<n>Rather than solving directly, it generates mathematical formulations and translates them into solver-ready code, guided by a novel Monte Carlo Tree Search strategy.
arXiv Detail & Related papers (2025-10-19T16:21:19Z)
Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs [102.48588475875749]
We introduce Generative Self-Refinement (GSR), a novel parallel test-time scaling framework.<n>GSR generates a set of candidate responses in parallel and then performs self-refinement to synthesize a new superior solution.<n>We show that our method achieves state-of-the-art performance across five mathematical benchmarks.
arXiv Detail & Related papers (2025-08-27T06:51:48Z)
A Heuristic Algorithm Based on Beam Search and Iterated Local Search for the Maritime Inventory Routing Problem [0.45152963243489175]
Maritime Inventory Problem (MIRP) plays a crucial role in the integration of global maritime commerce levels.<n>MIRP plays a crucial role in the integration of global maritime commerce levels.<n>There are still no well-established methodologies capable of efficiently solving large MIRP instances or their variants.
arXiv Detail & Related papers (2025-05-17T22:40:36Z)
Autoformulation of Mathematical Optimization Models Using LLMs [50.030647274271516]
This paper approaches the problem of $textitautoformulation$: the automated creation of solver-ready optimization models from natural language problem descriptions.<n>We identify three core challenges of autoformulation: $textit(1)$ the vast, problem-dependent hypothesis space, and $textit(2)$ efficient and diverse exploration of this space under uncertainty.<n>We present a novel method leveraging $textitLarge Language Models$ with $textitMonte-Carlo Tree Search$, exploiting the hierarchical nature of optimization modeling to generate and systematically explore possible formulations
arXiv Detail & Related papers (2024-11-03T20:41:38Z)
Automatic MILP Solver Configuration By Learning Problem Similarities [1.1585113506994469]
Mixed Linear Programs (MILP) solvers expose numerous configuration parameters to control their internal algorithms. We aim to predict configuration parameters for unseen problem instances that yield lower-cost solutions without the time overhead of searching-and-evaluating configurations. We show that instances that have similar costs using one solver configuration also have similar costs using another solver configuration in the same runtime environment.
arXiv Detail & Related papers (2023-07-02T21:31:47Z)
Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across non-family problems. We show that for weakly-ized objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z)
Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaSO (adaptive least absolute shrinkage and selection operator) Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP. For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z)
Learning to Schedule Heuristics in Branch-and-Bound [25.79025327341732]
Real-world applications typically require finding good solutions early in the search to enable fast decision-making. We propose the first data-driven framework for schedulings in an exact MIP solver. Compared to the default settings of a state-of-the-art academic MIP solver, we are able to reduce the average primal integral by up to 49% on a class of challenging instances.
arXiv Detail & Related papers (2021-03-18T14:49:52Z)
Contrastive Losses and Solution Caching for Predict-and-Optimize [19.31153168397003]
We use a Noise Contrastive approach to motivate a family of surrogate loss functions. We address a major bottleneck of all predict-and-optimize approaches. We show that even a very slow growth rate is enough to match the quality of state-of-the-art methods.
arXiv Detail & Related papers (2020-11-10T19:09:12Z)
Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems. Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs. This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.